A double SIMEX approach for bivariate random-effects meta-analysis of diagnostic accuracy studies
Annamaria Guolo^{1}
DOI: 10.1186/s12874-016-0284-2
© The Author(s) 2017
Received: 18 August 2016
Accepted: 22 December 2016
Published: 11 January 2017
Abstract
Background
Bivariate random-effects models represent a widely accepted and recommended approach for meta-analysis of test accuracy studies. Standard likelihood methods routinely used for inference are prone to several drawbacks. Small sample size can give rise to unreliable inferential conclusions and convergence issues make the approach unappealing. This paper suggests a different methodology to address such difficulties.
Methods
A SIMEX methodology is proposed. The method is a simulation-based technique originally developed as a correction strategy within the measurement error literature. It suits the meta-analysis framework as the diagnostic accuracy measures provided by each study are prone to measurement error. SIMEX can be straightforwardly adapted to cover different measurement error structures and to deal with covariates. The effortless implementation with standard software is an interesting feature of the method.
Results
Extensive simulation studies highlight the improvement provided by SIMEX over the likelihood approach in terms of empirical coverage probabilities of confidence intervals under different scenarios, independently of the sample size and the values of the correlation between sensitivity and specificity. A remarkable improvement is obtained in case of deviations from the normality assumption for the random-effects distribution. From a computational point of view, the application of SIMEX is shown to be neither involved nor subject to the convergence issues affecting likelihood-based alternatives. Application of the method to a diagnostic review of the performance of transesophageal echocardiography for assessing ascending aorta atherosclerosis enables overcoming limitations of the likelihood procedure.
Conclusions
The SIMEX methodology represents an interesting alternative to likelihood-based procedures for inference in meta-analysis of diagnostic accuracy studies. The approach can provide more accurate inferential conclusions, while avoiding convergence failure and numerical instabilities. The method can be applied in the R programming language through the code that is made available and illustrated using the real data example.
Keywords
Bivariate meta-analysis, Diagnostic test, Likelihood inference, Measurement error, SIMEX
Background
Meta-analysis of diagnostic studies is a widely accepted approach for the assessment of the accuracy of a diagnostic test in distinguishing between diseased and non-diseased patients. A diagnostic study is commonly evaluated in terms of sensitivity, i.e., the conditional probability of testing positive in diseased subjects, and specificity, i.e., the conditional probability of testing negative in non-diseased subjects. Alternatively, the information about a diagnostic test is available as a two-by-two table of agreement between the test results and the reference standard test results [1].
The interest in meta-analysis of diagnostic accuracy studies has increased over recent years. Preliminary approaches based on separate univariate meta-analyses for sensitivity and specificity of diagnostic tests, although still widespread in medical investigations, have been successfully improved by more sophisticated solutions accounting for the correlation between the diagnostic test measures [2–4]. The literature, initially based on least squares regressions [5, 6], now spans hierarchical models [4, 7–9], bivariate copula distributions [10–12], bivariate mixture models [13, 14] and non-parametric solutions [15]. In this paper we focus on the bivariate random-effects model [7, 8], as it is currently a well-established and recommended method for meta-analysis of diagnostic accuracy studies. The bivariate random-effects approach has a hierarchical structure accounting for the within-study sampling variability and for the between-study variability arising from differences derived, for example, from patients' characteristics. Moreover, it considers the presence of measurement error affecting the sample estimation of sensitivity and specificity. These characteristics represent a substantial step ahead with respect to the original approach of Littenberg and Moses [5, 6] to construct a summary receiver operating characteristic (SROC) curve based on the regression of the difference between sensitivity and specificity on their sum, a solution criticised as a source of unreliable inferential conclusions [8]. Likelihood inference has to deal with issues of considerable interest [16–19]: small sample size is known to affect the accuracy of the inferential results; non-convergence of the optimisation algorithms can occur, with a non-positive definite variance/covariance matrix or unreliable parameter estimates typically on the boundary of the parameter space; computational issues, such as numerical integration, may represent further complications to deal with.
This paper investigates the applicability of SIMEX (simulation extrapolation) as an alternative approach to meta-analysis of diagnostic accuracy studies. SIMEX is a simulation-based technique developed within the measurement error literature [20, 21] that has found wide applicability in many areas of research, given the simplicity of the idea underlying the approach and the straightforward implementation with standard software. The performance of SIMEX for inference on the bivariate random-effects model components as well as on the diagnostic accuracy measures is compared to the likelihood approach through an extensive simulation study covering different scenarios, with varying sample size and between-study correlation. Attention is paid to the robustness of the competing methods against model misspecification, in particular deviations from the typical assumption of joint normal distribution for the random effects [9], as well as to non-convergence problems and numerical instabilities. In addition, SIMEX is applied to the meta-analysis of transesophageal echocardiography recently used in the literature [15] to highlight the limitations of the likelihood-based inference.
Methods
Bivariate random-effects formulation for meta-analysis
Consider a meta-analysis of n diagnostic accuracy studies, each of them providing information as a two-by-two table reporting the number of true positives, true negatives, false positives and false negatives, denoted by n_{11i}, n_{00i}, n_{10i} and n_{01i}, respectively. Let n_{1i} = n_{11i} + n_{01i} be the total number of diseased subjects and n_{0i} = n_{00i} + n_{10i} the total number of non-diseased subjects. Consider the sensitivity (SE_{i}) and the specificity (SP_{i}) as diagnostic accuracy measures of study i, i=1,…,n. In keeping with much of the literature, the accuracy can be expressed using the logit transformation, η_{i}=logit(SE_{i}) and ξ_{i}=logit(1−SP_{i}). Given the two-by-two table information, the estimates of SE_{i} and SP_{i} in study i are n_{11i}/n_{1i} and n_{00i}/n_{0i}, respectively. Hereafter, the estimates of η_{i} and ξ_{i} will be denoted by \(\hat {\eta }_{i}\) and \(\hat {\xi }_{i}\), respectively.
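The logit-scale estimates defined above can be computed directly from the cells of a two-by-two table. The following is a minimal sketch; the function name `logit_estimates` and the example counts are hypothetical and chosen only for illustration, and the sketch assumes that no cell is zero.

```python
import math

def logit(p):
    """Logit transformation log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def logit_estimates(tp, fn, fp, tn):
    """Return (eta_hat, xi_hat) for one study.

    tp, fn, fp, tn play the roles of n_11i, n_01i, n_10i, n_00i.
    Assumes every cell is strictly positive.
    """
    n1 = tp + fn              # diseased subjects, n_1i
    n0 = fp + tn              # non-diseased subjects, n_0i
    se_hat = tp / n1          # estimated sensitivity
    sp_hat = tn / n0          # estimated specificity
    return logit(se_hat), logit(1.0 - sp_hat)

# Hypothetical study: 45 true positives, 5 false negatives,
# 10 false positives, 90 true negatives.
eta_hat, xi_hat = logit_estimates(tp=45, fn=5, fp=10, tn=90)
```

Because ξ is defined as logit(1−SP), a test with specificity above 0.5 yields a negative value of the estimate of ξ.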
Models
where \(\overline {\eta }\) and \(\overline {\xi }\) are the means over the studies, \(\sigma ^{2}_{\eta }\) and \(\sigma ^{2}_{\xi }\) denote the between-study variances and ρ is the correlation coefficient. As sensitivity SE_{i} and specificity SP_{i} tend to be negatively correlated, η_{i} and ξ_{i} tend to be positively correlated, so that ρ>0.
We refer to (1) and (3) as the Binomial-Normal approach [16]. The resulting model is a generalised linear model, with no closed-form expression for the associated likelihood function. More computational effort is required with respect to the approximate model, as numerical integration is needed. Convergence problems represent a further drawback of the approach, with the risk of a non-positive definite variance/covariance matrix and unreliable estimates of the parameters of the variance/covariance matrix truncated on the boundary of the parameter space [9, 16, 19]. Both practical issues become more severe as the number of studies decreases. The Normal-Normal approach is prone to some criticism as well, despite its feasible application. Inferential conclusions can be biased as a consequence of small sample size or values of sensitivity and specificity close to 1 [4, 24]. When the sample size is large, instead, there are no substantial differences between the two approaches.
Parameter estimation is typically performed via maximum likelihood or restricted maximum likelihood [8]. The estimates of sensitivity and specificity are obtained by back-transforming the estimates of \(\overline {\eta }\) and \(\overline {\xi }\), with standard errors derived using the delta method. Alternative measures of test accuracy are the positive likelihood ratio LR+=SE/(1−SP), the negative likelihood ratio LR−=(1−SE)/SP and the diagnostic odds ratio dOR={SE/(1−SE)}×{SP/(1−SP)}. A description of the diagnostic test can also be provided by the SROC curve, through i) the characterisation of the bivariate normal model via an appropriate line and ii) the transformation of the line to the SROC space. See Arends et al. [8] for alternative specifications for the SROC curves. Discussion about the interpretation of the resulting SROC curve can be found in Hamza et al. [4] and references therein.
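For orientation, the hierarchical structure referred to in this section can be sketched in the notation just introduced. The displays below are a reconstruction from the surrounding definitions rather than a verbatim copy of the numbered equations: the between-study model corresponds to what the text calls (1), the approximate normal within-study model to (2) and the exact binomial within-study model to (3). The symbols \(s^{2}_{\eta ,i}\) and \(s^{2}_{\xi ,i}\) are introduced here only for illustration and stand for within-study variances assumed to be known.

```latex
% Between-study model: bivariate normal random effects
\[
\begin{pmatrix} \eta_i \\ \xi_i \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \overline{\eta} \\ \overline{\xi} \end{pmatrix},
\begin{pmatrix}
\sigma^{2}_{\eta} & \rho\,\sigma_{\eta}\sigma_{\xi} \\
\rho\,\sigma_{\eta}\sigma_{\xi} & \sigma^{2}_{\xi}
\end{pmatrix}
\right)
\]

% Approximate (normal) within-study model
\[
\begin{pmatrix} \hat{\eta}_i \\ \hat{\xi}_i \end{pmatrix}
\Big|\, \eta_i, \xi_i
\sim N\!\left(
\begin{pmatrix} \eta_i \\ \xi_i \end{pmatrix},
\begin{pmatrix} s^{2}_{\eta,i} & 0 \\ 0 & s^{2}_{\xi,i} \end{pmatrix}
\right)
\]

% Exact (binomial) within-study model
\[
n_{11i} \mid \eta_i \sim \mathrm{Binomial}(n_{1i}, SE_i),
\qquad
n_{10i} \mid \xi_i \sim \mathrm{Binomial}(n_{0i}, 1 - SP_i)
\]
```

Combining the between-study model with the normal within-study model gives the Normal-Normal formulation, while combining it with the binomial within-study model gives the Binomial-Normal formulation.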
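The back-transformation and the derived accuracy measures can be illustrated with a short sketch. The function names are hypothetical, standard errors via the delta method are not included, and the input values are illustrative logit-scale means corresponding to a sensitivity of about 0.95 and a specificity of about 0.90.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def accuracy_measures(eta_bar, xi_bar):
    """Back-transform logit-scale means into summary accuracy measures.

    eta_bar is the mean of logit(SE); xi_bar is the mean of logit(1 - SP).
    """
    se = expit(eta_bar)
    sp = 1.0 - expit(xi_bar)
    return {
        "SE": se,
        "SP": sp,
        "LR+": se / (1.0 - sp),                       # positive likelihood ratio
        "LR-": (1.0 - se) / sp,                       # negative likelihood ratio
        "dOR": (se / (1.0 - se)) * (sp / (1.0 - sp)), # diagnostic odds ratio
    }

measures = accuracy_measures(eta_bar=2.94, xi_bar=-2.20)
```

On the logit scale the diagnostic odds ratio reduces to exp(η̄ − ξ̄), which offers a quick consistency check on the computation.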
The measurement error problem
The hierarchical model defined for meta-analysis of diagnostic accuracy studies is an instance of the more general bivariate meta-analysis investigated by van Houwelingen et al. [25], among others. Control rate regression [26, 27], defined as the relationship between the treatment effect and the baseline risk in meta-analysis of clinical trials, perfectly fits the scenario we focus on in this paper. In control rate regression, attention is paid to the risk of inaccurate inferential conclusions due to the presence of measurement error [28, 29] affecting the treatment effect and the baseline risk measures. Different proposals have been suggested to face the measurement error problem [30–32]. Similarly, in meta-analysis of diagnostic accuracy studies the observed \(\hat {\eta }_{i}\) and \(\hat {\xi }_{i}\) are estimates of the true unknown η_{i} and ξ_{i} and thus they are prone to measurement error. Not accounting for measurement error can result in misleading inference, the most frequent consequence being a biased estimate of the slope of the regression line used to define the SROC curve, an effect known as attenuation. See, for example, the discussion in Arends et al. [8]. The likelihood approach based on the hierarchical model given by (1)–(2) or (1)–(3) properly accounts for measurement errors [8, 26]. The within-study model (2) or (3), in fact, defines a relationship between the observed error-prone \(\hat {\eta }_{i}\) and \(\hat {\xi }_{i}\) and the unobserved corresponding η_{i} and ξ_{i}, in this way including the uncertainty related to the measurement process.
Despite the above-mentioned analogies, control rate regression and diagnostic accuracy studies differ with respect to the role played by the variables in the regression model. In control rate regression, the baseline risk information is the covariate for the regression model using the treatment effect as response variable. In diagnostic accuracy studies, the roles of η_{i} and ξ_{i} in terms of response variable and covariate are not unambiguously defined, as specificity and sensitivity can act as response or covariate according to the regression model chosen to define a particular SROC curve. Only when a specific regression line used for drawing the SROC curve is defined [8] is the role of response or covariate clearly stated.
Double SIMEX approach
SIMEX is a simulation-based technique for measurement error correction [20, 21, 33]. The method, originally developed to deal with classical additive errors affecting continuous variables, can be easily extended to all the scenarios where measurement error structures can be simulated. This requires the measurement error variance to be known or at least accurately approximated. SIMEX consists of a simulation step followed by an extrapolation step. In the first step, a resampling-like strategy simulates B datasets of additional errors with increasing variance, each of them used to estimate the parameters of interest. In the second step, the relationship between the estimates and the amount of the added measurement error is determined and used to extrapolate the corrected estimate back to the no measurement error case. The simulation step typically requires the generation of independent random variables, while the estimation can be carried out using standard simple procedures, such as least squares estimation or the method of moments. The extrapolation step is a straightforward procedure. The feasibility of SIMEX application with standard software is the most attractive feature explaining its wide diffusion in many areas of research. Although the approach typically considers one or more mismeasured covariates, SIMEX can easily handle situations with measurement errors on both the response and covariates, see Holcomb [34], who termed the resulting method double SIMEX. This case perfectly fits the bivariate meta-analysis problem we focus on in this paper, as \(\hat {\eta }_{i}\) and \(\hat {\xi }_{i}\) are both affected by measurement error. In this way, SIMEX for diagnostic test accuracy can be thought of as an extension of Guolo [32], who investigated the methodology in control rate regression.
Let \( {s}^{2}_{b}(\lambda)\) be the estimated variance/covariance matrix of \({\hat {\theta }}_{b}(\lambda)\) and let \( {s}^{2}(\lambda)=B^{-1}\sum ^{B}_{b=1} {s}^{2}_{b}(\lambda)\). Given the sample variance/covariance matrix \( {s}^{2}_{\Delta }(\lambda)\) of \( {\hat {\theta }}_{b}(\lambda)\), the variance/covariance matrix of the SIMEX estimator is obtained by extrapolating back the relationship between \( {s}^{2}(\lambda) - {s}^{2}_{\Delta }(\lambda)\) and λ to the case λ=−1, see Stefanski and Cook [21] and Appendix B.4 in Carroll et al. [28].
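The simulation and extrapolation steps described above can be sketched as follows. This is an illustrative simplification and not the code accompanying the paper: the naive estimator is the sample variance/covariance of the observed logits, the additional errors are drawn as independent normal variables rather than through any orthogonalisation scheme, the within-study standard deviations `s_eta` and `s_xi` are assumed known, and all function names are hypothetical. Extra noise is added to both coordinates, in the spirit of double SIMEX.

```python
import math
import random
import statistics

def quad_extrapolate(lams, ys, target=-1.0):
    """Least-squares quadratic fit y = a + b*l + c*l^2, evaluated at target."""
    rows = [[1.0, l, l * l] for l in lams]
    # Normal equations (X'X) beta = X'y, stored as an augmented 3x4 matrix
    M = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(3)]
    for c in range(3):  # Gaussian elimination with partial pivoting
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        beta[i] = (M[i][3] - sum(M[i][j] * beta[j] for j in range(i + 1, 3))) / M[i][i]
    return beta[0] + beta[1] * target + beta[2] * target ** 2

def naive_estimate(eta, xi):
    """Naive (var_eta, var_xi, cov) ignoring measurement error."""
    m_eta, m_xi = statistics.fmean(eta), statistics.fmean(xi)
    cov = sum((a - m_eta) * (b - m_xi) for a, b in zip(eta, xi)) / (len(eta) - 1)
    return statistics.variance(eta), statistics.variance(xi), cov

def double_simex(eta_hat, xi_hat, s_eta, s_xi,
                 lams=(0.0, 0.5, 1.0, 1.5, 2.0), B=100, seed=1):
    """Remeasure both coordinates, average the naive estimates, extrapolate to -1."""
    rng = random.Random(seed)
    curves = []
    for lam in lams:
        total = [0.0, 0.0, 0.0]
        for _ in range(B):
            # Simulation step: add errors with variance lam * s_i^2
            eta_b = [e + math.sqrt(lam) * s * rng.gauss(0.0, 1.0)
                     for e, s in zip(eta_hat, s_eta)]
            xi_b = [x + math.sqrt(lam) * s * rng.gauss(0.0, 1.0)
                    for x, s in zip(xi_hat, s_xi)]
            total = [t + v for t, v in zip(total, naive_estimate(eta_b, xi_b))]
        curves.append([t / B for t in total])
    # Extrapolation step: one quadratic per parameter, evaluated at lambda = -1
    return tuple(quad_extrapolate(list(lams), [c[k] for c in curves])
                 for k in range(3))
```

For the variance components the naive estimate grows roughly linearly in λ, so extrapolating to λ=−1 approximately subtracts the average within-study variance, in line with a method of moments correction.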
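The ingredients of the variance computation can be written out explicitly. The display below follows the standard SIMEX variance construction of Stefanski and Cook and is offered as a sketch: it assumes that \(\hat{\theta}(\lambda)\) denotes the average of the B estimates \(\hat{\theta}_{b}(\lambda)\), and the symbol \(\mathcal{G}\) is introduced here only to denote the fitted extrapolation function.

```latex
\[
\hat{\theta}(\lambda) = B^{-1}\sum_{b=1}^{B} \hat{\theta}_{b}(\lambda),
\qquad
s^{2}_{\Delta}(\lambda) = (B-1)^{-1}\sum_{b=1}^{B}
\bigl\{\hat{\theta}_{b}(\lambda) - \hat{\theta}(\lambda)\bigr\}
\bigl\{\hat{\theta}_{b}(\lambda) - \hat{\theta}(\lambda)\bigr\}^{\top}
\]

\[
\widehat{\mathrm{var}}\bigl(\hat{\theta}_{\mathrm{SIMEX}}\bigr)
= \mathcal{G}(-1),
\qquad
\mathcal{G}(\lambda) \approx s^{2}(\lambda) - s^{2}_{\Delta}(\lambda)
\ \text{fitted over the grid of } \lambda \text{ values}
\]
```

Each component of the difference matrix is extrapolated separately, in the same way as the parameter estimates themselves.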
Simulation studies
Several simulation studies have been conducted to investigate the performance of SIMEX and compare it to the Normal-Normal and the Binomial-Normal likelihood approaches. Data simulation follows a two-stage procedure. In the first stage, values for η_{i} and ξ_{i} are generated according to relationship (1), or by substituting the normal distribution with a t distribution with four degrees of freedom, e.g. [9], or a skew-normal [35] distribution. In the last two cases, the robustness of the results is investigated with respect to departures from the common normality assumption for the random effects, which may sometimes not be appropriate [9]. The chosen skew-normal distribution is such that the mean and the variance correspond to those for the normal case, but the skewness parameter for (η_{i},ξ_{i})^{⊤} is increased from (0,0)^{⊤} (the normal case) to (−1.0,0.5)^{⊤} and to (−2,2)^{⊤}. In the second stage, the setup is inspired by the studies in Hamza et al. [16] and Diaz [18]. The within-study numbers of true positives and false positives are simulated using relationship (3). The numbers of diseased subjects n_{1i} and non-diseased subjects n_{0i} are generated from a uniform variable on [40, 200]. The number of studies n varies in {10; 25} in order to evaluate the methods in case of small to moderate sample sizes. Scenarios with decreasing accuracy are considered, namely, high accuracy \((\overline {\eta },\overline {\xi })^{\top }=(2.94, -2.20)^{\top }\), medium accuracy \((\overline {\eta },\overline {\xi })^{\top }=(1.39, -1.50)^{\top }\) and low accuracy \((\overline {\eta },\overline {\xi })^{\top }=(0.62, -0.85)^{\top }\). Accordingly, (SE_{i},SP_{i})^{⊤}=(0.95,0.90)^{⊤}, (SE_{i},SP_{i})^{⊤}=(0.80,0.82)^{⊤} and (SE_{i},SP_{i})^{⊤}=(0.65,0.70)^{⊤}, i=1,…,n. Increasing correlation between η_{i} and ξ_{i} is considered, ρ∈{0.2; 0.6; 0.8}.
Between-study variances \(\sigma ^{2}_{\eta }\) and \(\sigma ^{2}_{\xi }\) are fixed equal to 1.2 and 0.5, respectively. One thousand datasets are generated for each combination of sample size, correlation and values of \((\overline {\eta },\overline {\xi })^{\top }\).
The integrals in the Binomial-Normal approach are approximated via a Gauss-Hermite procedure with 100 quadrature points. Inference in the Normal-Normal model is carried out using restricted maximum likelihood, while inference in the Binomial-Normal model uses maximum likelihood estimation. Likelihood maximisation, based on the Nelder and Mead algorithm [36], employs the method of moments estimates as starting values. SIMEX considers B=100 remeasured datasets generated using the Gram-Schmidt process, λ assuming values in Λ={0.0,0.5,1.0,1.5,2.0} and the quadratic extrapolation function. Parameter estimation within the simulation step is based on model (2). All the methods are implemented in the R programming language [37].
Methods are compared with respect to bias and estimate of standard error of the estimators of the parameters \(\overline {\eta }, \overline {\xi }, \sigma ^{2}_{\eta }, \sigma ^{2}_{\xi }, \rho \) and in terms of the 95% confidence interval for the estimators of the measures of diagnostic accuracy given by the diagnostic odds ratio dOR, the positive likelihood ratio LR+ and the negative likelihood ratio LR−. The performance of the methods in terms of convergence problems is investigated as well. Successful convergence is defined as meeting the convergence criterion (e.g., difference between current and updated estimates less than 0.0001) together with a positive definite variance/covariance matrix. The results under non-convergence are excluded when summarising the simulation results.
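The two-stage data generation for the normal random-effects case can be sketched as below. This is a minimal illustration and not the simulation code used in the paper: the function name is hypothetical, the uniform variable on [40, 200] is assumed to be integer-valued, and binomial counts are obtained by summing Bernoulli draws to stay within the Python standard library.

```python
import math
import random

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_meta_analysis(n, eta_bar, xi_bar, var_eta, var_xi, rho, rng):
    """Generate n two-by-two tables under normal random effects."""
    sd_eta, sd_xi = math.sqrt(var_eta), math.sqrt(var_xi)
    studies = []
    for _ in range(n):
        # Stage 1: correlated random effects (eta_i, xi_i) from a bivariate normal
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eta = eta_bar + sd_eta * z1
        xi = xi_bar + sd_xi * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        # Stage 2: group sizes, then binomial counts given the random effects
        n1 = rng.randint(40, 200)   # diseased subjects
        n0 = rng.randint(40, 200)   # non-diseased subjects
        tp = sum(rng.random() < expit(eta) for _ in range(n1))  # SE_i = expit(eta_i)
        fp = sum(rng.random() < expit(xi) for _ in range(n0))   # 1 - SP_i = expit(xi_i)
        studies.append({"TP": tp, "FN": n1 - tp, "FP": fp, "TN": n0 - fp})
    return studies

# High accuracy scenario with ten studies and correlation 0.6
data = simulate_meta_analysis(10, 2.94, -2.20, 1.2, 0.5, 0.6, random.Random(2016))
```

Replacing the two standard normal draws with draws from a t or skew-normal distribution would give the misspecified scenarios, which are not covered by this sketch.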
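The convergence rule stated above can be expressed as a small check. The sketch below is an assumption about how such a rule might be coded, with hypothetical function and argument names; for a two-by-two variance/covariance matrix, positive definiteness is equivalent to both variances being positive and the correlation lying strictly between −1 and 1.

```python
def successful_convergence(previous, current, var_eta, var_xi, rho, tol=1e-4):
    """Return True when the fit meets both conditions for successful convergence.

    previous, current: parameter vectors from two consecutive iterations.
    var_eta, var_xi, rho: estimated between-study variances and correlation.
    """
    # Convergence criterion: every parameter changed by less than tol
    small_step = all(abs(a - b) < tol for a, b in zip(previous, current))
    # Positive definite 2x2 matrix: positive variances and |rho| < 1
    positive_definite = var_eta > 0 and var_xi > 0 and abs(rho) < 1
    return small_step and positive_definite

ok = successful_convergence([1.0, 2.0], [1.00001, 2.00001], 1.2, 0.5, 0.8)
boundary = successful_convergence([1.0, 2.0], [1.00001, 2.00001], 1.2, 0.5, 1.0)
```

A fit whose correlation estimate sits on the boundary, as in the second call, would be counted among the failures.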
Results
Simulation results
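Table 1 Simulation results for the high accuracy scenario

| Random-effects distribution | Method | ρ | \(\overline {\eta }\) bias (s.e.) | \(\overline {\xi }\) bias (s.e.) | \(\sigma ^{2}_{\eta }\) bias (s.e.) | \(\sigma ^{2}_{\xi }\) bias (s.e.) | ρ bias (s.e.) | Failure rate % |
|---|---|---|---|---|---|---|---|---|
| Normal | Normal-Normal | 0.2 | 0.19 (0.33) | 0.06 (0.23) | 0.47 (0.46) | 0.13 (0.23) | 0.04 (0.27) | 2.5 |
| Normal | Normal-Normal | 0.6 | 0.16 (0.33) | 0.03 (0.23) | 0.42 (0.47) | 0.12 (0.23) | 0.15 (0.22) | 1.8 |
| Normal | Normal-Normal | 0.8 | 0.17 (0.34) | 0.02 (0.23) | 0.38 (0.48) | 0.10 (0.24) | 0.19 (0.18) | 1.8 |
| Normal | Binomial-Normal | 0.2 | 0.02 (0.37) | 0.01 (0.24) | 0.10 (0.67) | 0.03 (0.28) | 0.02 (0.36) | 6.2 |
| Normal | Binomial-Normal | 0.6 | 0.02 (0.38) | 0.01 (0.24) | 0.03 (0.71) | 0.02 (0.28) | 0.01 (0.28) | 11.1 |
| Normal | Binomial-Normal | 0.8 | 0.01 (0.37) | 0.00 (0.24) | 0.03 (0.69) | 0.01 (0.29) | 0.02 (0.20) | 19.5 |
| Normal | SIMEX | 0.2 | 0.08 (0.35) | 0.07 (0.25) | 0.05 (0.59) | 0.14 (0.30) | 0.03 (0.28) | 0.0 |
| Normal | SIMEX | 0.6 | 0.08 (0.35) | 0.07 (0.25) | 0.08 (0.59) | 0.13 (0.29) | 0.13 (0.23) | 0.0 |
| Normal | SIMEX | 0.8 | 0.05 (0.35) | 0.06 (0.25) | 0.07 (0.59) | 0.14 (0.30) | 0.18 (0.19) | 0.0 |
| t | Normal-Normal | 0.2 | 0.22 (0.39) | 0.09 (0.28) | 0.07 (0.72) | 0.16 (0.38) | 0.06 (0.26) | 1.6 |
| t | Normal-Normal | 0.6 | 0.23 (0.41) | 0.04 (0.28) | 0.23 (0.80) | 0.18 (0.38) | 0.13 (0.22) | 0.7 |
| t | Normal-Normal | 0.8 | 0.20 (0.41) | 0.03 (0.29) | 0.21 (0.78) | 0.22 (0.41) | 0.17 (0.17) | 0.8 |
| t | Binomial-Normal | 0.2 | 0.02 (0.46) | 0.01 (0.30) | 0.74 (1.13) | 0.34 (0.47) | 0.01 (0.33) | 3.6 |
| t | Binomial-Normal | 0.6 | 0.02 (0.48) | 0.01 (0.30) | 0.89 (1.20) | 0.36 (0.48) | 0.01 (0.24) | 7.9 |
| t | Binomial-Normal | 0.8 | 0.00 (0.47) | 0.00 (0.31) | 0.82 (1.20) | 0.40 (0.53) | 0.03 (0.18) | 14.0 |
| t | SIMEX | 0.2 | 0.05 (0.42) | 0.05 (0.31) | 0.64 (0.85) | 0.52 (0.47) | 0.05 (0.28) | 0.0 |
| t | SIMEX | 0.6 | 0.01 (0.43) | 0.07 (0.31) | 0.77 (0.90) | 0.51 (0.47) | 0.10 (0.22) | 0.0 |
| t | SIMEX | 0.8 | 0.00 (0.42) | 0.05 (0.31) | 0.66 (0.86) | 0.52 (0.47) | 0.15 (0.17) | 0.0 |
| Skew-normal (low skewness) | Normal-Normal | 0.2 | 0.64 (0.29) | 0.17 (0.22) | 0.54 (0.38) | 0.13 (0.22) | 0.03 (0.26) | 1.4 |
| Skew-normal (low skewness) | Normal-Normal | 0.6 | 0.57 (0.30) | 0.00 (0.23) | 0.49 (0.40) | 0.13 (0.23) | 0.11 (0.21) | 1.8 |
| Skew-normal (low skewness) | Normal-Normal | 0.8 | 0.52 (0.31) | 0.10 (0.23) | 0.44 (0.43) | 0.13 (0.22) | 0.17 (0.17) | 2.1 |
| Skew-normal (low skewness) | Binomial-Normal | 0.2 | 0.54 (0.32) | 0.12 (0.23) | 0.35 (0.48) | 0.05 (0.26) | 0.10 (0.33) | 5.4 |
| Skew-normal (low skewness) | Binomial-Normal | 0.6 | 0.49 (0.32) | 0.05 (0.24) | 0.28 (0.53) | 0.03 (0.28) | 0.03 (0.26) | 10.6 |
| Skew-normal (low skewness) | Binomial-Normal | 0.8 | 0.44 (0.34) | 0.16 (0.24) | 0.18 (0.59) | 0.02 (0.29) | 0.00 (0.19) | 20.0 |
| Skew-normal (low skewness) | SIMEX | 0.2 | 0.48 (0.32) | 0.06 (0.24) | 0.16 (0.48) | 0.11 (0.28) | 0.05 (0.27) | 0.0 |
| Skew-normal (low skewness) | SIMEX | 0.6 | 0.42 (0.33) | 0.11 (0.25) | 0.10 (0.51) | 0.13 (0.29) | 0.09 (0.22) | 0.0 |
| Skew-normal (low skewness) | SIMEX | 0.8 | 0.38 (0.33) | 0.22 (0.25) | 0.04 (0.54) | 0.13 (0.30) | 0.16 (0.18) | 0.0 |
| Skew-normal (high skewness) | Normal-Normal | 0.2 | 0.59 (0.30) | 0.37 (0.20) | 0.52 (0.40) | 0.19 (0.18) | 0.20 (0.23) | 1.2 |
| Skew-normal (high skewness) | Normal-Normal | 0.6 | 0.44 (0.31) | 0.24 (0.22) | 0.44 (0.43) | 0.15 (0.21) | 0.01 (0.18) | 1.8 |
| Skew-normal (high skewness) | Normal-Normal | 0.8 | 0.32 (0.32) | 0.15 (0.22) | 0.43 (0.44) | 0.13 (0.21) | 0.11 (0.15) | 1.2 |
| Skew-normal (high skewness) | Binomial-Normal | 0.2 | 0.49 (0.32) | 0.35 (0.21) | 0.30 (0.51) | 0.13 (0.21) | 0.30 (0.29) | 7.6 |
| Skew-normal (high skewness) | Binomial-Normal | 0.6 | 0.33 (0.35) | 0.22 (0.23) | 0.14 (0.62) | 0.06 (0.26) | 0.16 (0.20) | 15.1 |
| Skew-normal (high skewness) | Binomial-Normal | 0.8 | 0.19 (0.37) | 0.13 (0.24) | 0.08 (0.66) | 0.04 (0.27) | 0.05 (0.16) | 25.6 |
| Skew-normal (high skewness) | SIMEX | 0.2 | 0.43 (0.32) | 0.31 (0.21) | 0.13 (0.50) | 0.02 (0.22) | 0.22 (0.24) | 0.0 |
| Skew-normal (high skewness) | SIMEX | 0.6 | 0.26 (0.34) | 0.17 (0.23) | 0.00 (0.55) | 0.06 (0.26) | 0.03 (0.19) | 0.0 |
| Skew-normal (high skewness) | SIMEX | 0.8 | 0.14 (0.35) | 0.08 (0.24) | 0.02 (0.57) | 0.09 (0.27) | 0.10 (0.16) | 0.0 |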
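Table 2 Simulation results for the low accuracy scenario

| Random-effects distribution | Method | ρ | \(\overline {\eta }\) bias (s.e.) | \(\overline {\xi }\) bias (s.e.) | \(\sigma ^{2}_{\eta }\) bias (s.e.) | \(\sigma ^{2}_{\xi }\) bias (s.e.) | ρ bias (s.e.) | Failure rate % |
|---|---|---|---|---|---|---|---|---|
| Normal | Normal-Normal | 0.2 | 0.00 (0.32) | 0.01 (0.22) | 0.16 (0.51) | 0.07 (0.22) | 0.02 (0.27) | 0.5 |
| Normal | Normal-Normal | 0.6 | 0.00 (0.32) | 0.01 (0.21) | 0.19 (0.49) | 0.09 (0.21) | 0.07 (0.20) | 0.0 |
| Normal | Normal-Normal | 0.8 | 0.01 (0.33) | 0.01 (0.21) | 0.15 (0.51) | 0.08 (0.21) | 0.08 (0.14) | 0.3 |
| Normal | Binomial-Normal | 0.2 | 0.02 (0.33) | 0.00 (0.22) | 0.09 (0.54) | 0.04 (0.23) | 0.00 (0.30) | 1.0 |
| Normal | Binomial-Normal | 0.6 | 0.00 (0.33) | 0.00 (0.22) | 0.11 (0.53) | 0.05 (0.23) | 0.01 (0.22) | 1.4 |
| Normal | Binomial-Normal | 0.8 | 0.01 (0.34) | 0.00 (0.22) | 0.05 (0.56) | 0.04 (0.23) | 0.01 (0.14) | 4.0 |
| Normal | SIMEX | 0.2 | 0.03 (0.34) | 0.01 (0.22) | 0.00 (0.54) | 0.02 (0.24) | 0.02 (0.28) | 0.0 |
| Normal | SIMEX | 0.6 | 0.01 (0.33) | 0.01 (0.22) | 0.02 (0.53) | 0.01 (0.23) | 0.06 (0.21) | 0.0 |
| Normal | SIMEX | 0.8 | 0.02 (0.34) | 0.01 (0.22) | 0.03 (0.55) | 0.01 (0.23) | 0.07 (0.14) | 0.0 |
| t | Normal-Normal | 0.2 | 0.04 (0.40) | 0.01 (0.27) | 0.44 (0.83) | 0.21 (0.37) | 0.02 (0.25) | 0.5 |
| t | Normal-Normal | 0.6 | 0.01 (0.39) | 0.03 (0.27) | 0.35 (0.78) | 0.20 (0.36) | 0.09 (0.20) | 0.2 |
| t | Normal-Normal | 0.8 | 0.01 (0.39) | 0.02 (0.27) | 0.39 (0.80) | 0.21 (0.36) | 0.08 (0.14) | 0.2 |
| t | Binomial-Normal | 0.2 | 0.01 (0.44) | 0.01 (0.28) | 0.80 (1.01) | 0.32 (0.42) | 0.00 (0.28) | 1.3 |
| t | Binomial-Normal | 0.6 | 0.01 (0.42) | 0.01 (0.28) | 0.70 (0.95) | 0.32 (0.42) | 0.03 (0.21) | 1.2 |
| t | Binomial-Normal | 0.8 | 0.01 (0.43) | 0.00 (0.29) | 0.81 (1.01) | 0.36 (0.43) | 0.01 (0.13) | 2.7 |
| t | SIMEX | 0.2 | 0.01 (0.44) | 0.03 (0.29) | 0.92 (0.96) | 0.43 (0.42) | 0.02 (0.26) | 0.0 |
| t | SIMEX | 0.6 | 0.02 (0.43) | 0.00 (0.29) | 0.81 (0.91) | 0.42 (0.42) | 0.08 (0.20) | 0.0 |
| t | SIMEX | 0.8 | 0.00 (0.43) | 0.01 (0.29) | 0.87 (0.94) | 0.47 (0.44) | 0.07 (0.13) | 0.0 |
| Skew-normal (low skewness) | Normal-Normal | 0.2 | 0.55 (0.27) | 0.13 (0.21) | 0.45 (0.37) | 0.09 (0.21) | 0.09 (0.26) | 1.0 |
| Skew-normal (low skewness) | Normal-Normal | 0.6 | 0.44 (0.29) | 0.02 (0.21) | 0.38 (0.40) | 0.09 (0.21) | 0.05 (0.20) | 0.4 |
| Skew-normal (low skewness) | Normal-Normal | 0.8 | 0.41 (0.29) | 0.11 (0.21) | 0.34 (0.42) | 0.10 (0.21) | 0.09 (0.14) | 0.2 |
| Skew-normal (low skewness) | Binomial-Normal | 0.2 | 0.55 (0.28) | 0.12 (0.21) | 0.41 (0.38) | 0.07 (0.22) | 0.12 (0.28) | 1.8 |
| Skew-normal (low skewness) | Binomial-Normal | 0.6 | 0.45 (0.30) | 0.03 (0.22) | 0.32 (0.42) | 0.06 (0.23) | 0.02 (0.21) | 1.8 |
| Skew-normal (low skewness) | Binomial-Normal | 0.8 | 0.42 (0.31) | 0.12 (0.22) | 0.27 (0.45) | 0.06 (0.23) | 0.01 (0.15) | 5.9 |
| Skew-normal (low skewness) | SIMEX | 0.2 | 0.55 (0.28) | 0.11 (0.22) | 0.35 (0.38) | 0.01 (0.22) | 0.09 (0.26) | 0.0 |
| Skew-normal (low skewness) | SIMEX | 0.6 | 0.45 (0.30) | 0.04 (0.22) | 0.26 (0.42) | 0.01 (0.23) | 0.04 (0.20) | 0.0 |
| Skew-normal (low skewness) | SIMEX | 0.8 | 0.42 (0.31) | 0.14 (0.22) | 0.21 (0.45) | 0.01 (0.23) | 0.08 (0.14) | 0.0 |
| Skew-normal (high skewness) | Normal-Normal | 0.2 | 0.51 (0.28) | 0.34 (0.19) | 0.41 (0.39) | 0.16 (0.17) | 0.26 (0.22) | 1.0 |
| Skew-normal (high skewness) | Normal-Normal | 0.6 | 0.32 (0.31) | 0.23 (0.20) | 0.27 (0.45) | 0.11 (0.20) | 0.09 (0.15) | 0.1 |
| Skew-normal (high skewness) | Normal-Normal | 0.8 | 0.20 (0.31) | 0.15 (0.21) | 0.21 (0.48) | 0.09 (0.21) | 0.01 (0.11) | 0.2 |
| Skew-normal (high skewness) | Binomial-Normal | 0.2 | 0.51 (0.29) | 0.33 (0.19) | 0.36 (0.41) | 0.15 (0.18) | 0.31 (0.20) | 2.9 |
| Skew-normal (high skewness) | Binomial-Normal | 0.6 | 0.32 (0.31) | 0.23 (0.21) | 0.21 (0.48) | 0.08 (0.21) | 0.17 (0.16) | 4.1 |
| Skew-normal (high skewness) | Binomial-Normal | 0.8 | 0.20 (0.33) | 0.15 (0.22) | 0.11 (0.53) | 0.05 (0.23) | 0.07 (0.10) | 8.5 |
| Skew-normal (high skewness) | SIMEX | 0.2 | 0.51 (0.29) | 0.32 (0.20) | 0.30 (0.40) | 0.10 (0.18) | 0.27 (0.22) | 0.0 |
| Skew-normal (high skewness) | SIMEX | 0.6 | 0.32 (0.32) | 0.22 (0.21) | 0.13 (0.48) | 0.03 (0.21) | 0.11 (0.15) | 0.0 |
| Skew-normal (high skewness) | SIMEX | 0.8 | 0.20 (0.33) | 0.14 (0.22) | 0.05 (0.52) | 0.01 (0.22) | 0.00 (0.11) | 0.0 |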
Results for the high accuracy scenario (Table 1) show that, for meta-analysis with small sample size and under the normal random-effects specification, the Binomial-Normal approach appears to be preferable in terms of bias of the estimators with respect to alternative solutions, although at the price of a slightly larger standard error. Such a behaviour, however, deteriorates when moving to t and skewed distributions, with the bias tending to increase as the value of the correlation ρ becomes smaller. Under a t random-effects distribution, the Binomial-Normal approach and SIMEX show an increased bias of the estimators of the variance components \(\sigma ^{2}_{\eta }\) and \(\sigma ^{2}_{\xi }\), together with an increased standard error. The effects for the Normal-Normal approach are less marked. When considering a skew-normal distribution for (η_{i},ξ_{i})^{⊤} with a high value of skewness, SIMEX appears to be the preferable solution in terms of bias, in particular when referring to the estimates of the variance components.
Substantial differences between the competing methods occur in terms of failure rate of the estimation process, see the last column of Table 1. Convergence problems affect the likelihood approach, under the Binomial-Normal formulation in particular, in this way confirming previous findings in the literature [9, 16, 19]. The failure rate is notable when n=10, increases with ρ and deteriorates under a skew-normal random-effects specification with high values of skewness, thus making the use of the likelihood solution questionable. For the high skewness case and ρ=0.8, for example, the Binomial-Normal approach reaches 25.6% of failures. More extreme experiments with ρ=0.9, not reported here, substantiate the results, with the non-convergence rate growing further to more than 31%. Unlike the likelihood approach, SIMEX does not fail in any of the examined situations, irrespective of the sample size n, the correlation ρ and the random-effects distribution, with a non-convergence rate constantly equal to zero.
When moving to the low accuracy scenario, results are similar to those observed for the high accuracy case. Consider, for example, results reported in Table 2, where bias and estimated standard error of the estimators are only slightly reduced with respect to Table 1. The most interesting result is the reduction of the failure rate with respect to the high accuracy scenario. With regard to the likelihood analysis, the most substantial reduction of non-convergences is observed for the Binomial-Normal formulation, whose failure rate does not exceed the 5% level under the normal or t random-effects formulation and reaches 8.5% under the skew-normal distribution. Similarly to the high accuracy context, the failure rate tends to increase with ρ. SIMEX maintains a failure rate equal to zero.
Results for the medium accuracy scenario are reported in Additional file 1. Conclusions are coherent with those from the low and high accuracy scenarios. From a computational point of view, non-convergence problems mainly affect the Binomial-Normal approach, with failure rates reaching 22.3% under the skew-normal specification when the sample size is small.
Results for n=25 are reported in Additional file 1. Inferential conclusions about the bias of the estimators under all the accuracy scenarios remain globally similar to those for n=10, with the advantage of a reduced estimate of the standard error of the estimators. The most interesting result related to the increased sample size is the reduction of the failure rate, under all the examined situations. The Normal-Normal approach converges in almost all the simulation settings. The Binomial-Normal approach substantially reduces the number of failures, with just two cases exceeding the 5% threshold, corresponding to the skew-normal case with high skewness and ρ=0.8 in the high accuracy and medium accuracy scenarios. SIMEX maintains a failure rate equal to zero.
Data example
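Transesophageal echocardiography data [38]

| Study | TP | FP | TN | FN |
|---|---|---|---|---|
| 1 | 3 | 0 | 72 | 25 |
| 2 | 3 | 0 | 66 | 19 |
| 3 | 4 | 0 | 56 | 10 |
| 4 | 0 | 0 | 8 | 6 |
| 5 | 4 | 1 | 66 | 10 |
| 6 | 5 | 1 | 49 | 11 |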
Data analysis

| Method | Sensitivity (%) | Specificity (%) |
|---|---|---|
| Normal-Normal model | 21 (13, 32) | 99 (96, 99) |
| Binomial-Normal model | – | – |
| SIMEX approach | 17.9 (10.9, 27.8) | 98.6 (97.4, 99.3) |
| Non-parametric model (Zapf et al. [15]) | 19.0 (11.9, 28.9) | 99.4 (97.9, 99.8) |
Discussion
Results from the simulation studies indicate that SIMEX leads to satisfactory inferential results in a wide range of scenarios. When the normality assumption for the randomeffects distribution holds, the method is comparable to the likelihood solutions in terms of bias and estimated standard error of the estimators of the parameters of interest and slightly superior to the BinomialNormal formulation in terms of empirical coverages of confidence intervals for different diagnostic accuracy measures. When departures from the normality assumptions hold in terms of low or high skewness, then advantages of using SIMEX are much more evident. In particular, empirical coverages of confidence intervals for the diagnostic accuracy measures are closer to the 95% target level than alternatives. The likelihood approach under the BinomialNormal formulation shows a less satisfactory performance.
A substantial difference between SIMEX and the likelihood approach is in terms of failure rate of convergence. SIMEX has not convergence problems whichever the examined scenario. Conversely, likelihood solutions suffer for convergence difficulties, especially in case of skewed randomeffects distribution. The highest levels of failure rate are reached using the BinomialNormal formulation and they are much more frequent as the sample size is small and the value of the correlation ρ increases. Simulation results are in accordance with previous findings in the literature about convergence issues and numerical instabilities of the likelihood approach. Possible solutions evaluated in the simulation studies, including the change of the optimisation algorithm and the change of the starting values [16, 19], only slightly reduce the number of failures. When adopting the SIMEX strategy, several solutions are available in case of convergence failure, although we did not experience such a problem in our study. Possible solutions include the choice of a different estimation method within each bth replication of the simulation step or varying the number of simulated datasets B. An additional practical strategy is the visual inspection of the SIMEX components and the direct extrapolation of the points of interest. This strategy is suggested in Section B.4.1 of Carroll et al. [28] when the SIMEX estimated variance/covariance matrix is nonpositive definite. Although possible, this is an infrequent case and we did not encounter it in our simulations.
From a strictly practical point of view, the implementation of SIMEX, despite its simulationbased nature, is not involved neither timeconsuming and can proceed by taking advantage of simple estimation methods, such as the method of moments. The R [37] code for SIMEX implementation is made available in the Additional file 2 and illustrated in the Additional file 3.
Although the scenarios investigated in the paper do not consider the presence of additional level covariates in model (1), SIMEX can be extended to account for them. In this case, the number of remeasured datasets B is recommended to be increased in order to guarantee the results having an acceptable precision, see Section 5.3 in Carroll et al. [28].
In this paper, the model structure used for the simulation step in SIMEX is given by the approximate model (2) in place of the exact model (3). The choice implies that, when necessary, the correction that adds 0.5 to the cells of the two-by-two table equal to zero is applied. Additional empirical investigations with different correction values show that the 0.5 correction does not impact the results. Such a behaviour is related to the SIMEX procedure, as the correction can affect only the original data, while the remeasured data of the simulation step are not influenced. Simulating the discrete components of the two-by-two table in place of their continuous logit transformations \(\hat {\eta }_{i}\) and \(\hat {\xi }_{i}\) is theoretically possible. In this case, the measurement error problem affects the classification of the positive/negative results in the two-by-two table and is therefore called a misclassification problem; see Küchenhoff et al. [39]. However, moving from the newly generated discrete data to the logit transformations would still be an obligatory step, as the transformed data are necessary for inference in the main model (1). The 0.5 correction would then still apply, not only to the original data but also whenever the problem arises within the simulation step. How to circumvent these limitations when simulating from the exact model (3) is a topic for future research.
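As an illustration of the correction discussed above, the following Python sketch computes logit-transformed sensitivity and specificity from a single two-by-two table, adding the correction only to cells equal to zero. It is a minimal sketch rather than the code of Additional file 2. The function name, the mapping of the two outputs to \(\hat {\eta }_{i}\) and \(\hat {\xi }_{i}\), and the usual large-sample variance formulas for the logits are assumptions made for illustration.

```python
import math


def logit_accuracy(tp, fn, tn, fp, correction=0.5):
    """Logit sensitivity and specificity with approximate variances.

    The correction is added only to cells equal to zero, which is one
    possible reading of the rule described in the text.
    """
    tp, fn, tn, fp = [c + correction if c == 0 else float(c)
                      for c in (tp, fn, tn, fp)]
    logit_se = math.log(tp / fn)   # logit of TP / (TP + FN)
    logit_sp = math.log(tn / fp)   # logit of TN / (TN + FP)
    var_se = 1.0 / tp + 1.0 / fn   # large-sample variance of logit_se
    var_sp = 1.0 / tn + 1.0 / fp   # large-sample variance of logit_sp
    return logit_se, logit_sp, var_se, var_sp
```

Calling the function with a different value of `correction` on a table containing a zero cell gives a quick way to check how much the transformed data depend on the chosen correction value.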
Conclusions
This paper focused on bivariate random-effects models for meta-analysis of diagnostic test accuracy. Attention was paid to the presence of errors affecting the measures of diagnostic accuracy. Standard likelihood-based procedures are shown to be prone to several drawbacks, despite their widespread use. The inaccuracy of inferential conclusions for small sample sizes and under misspecification of the random-effects distribution is accompanied by computational issues which seriously affect the applicability of the approach. The SIMEX methodology represents an interesting and promising alternative. Reliable inference properly accounting for the presence of measurement errors is obtained with neither computational effort nor numerical instabilities. The satisfactory performance of SIMEX illustrated through extensive simulation experiments is not affected by study characteristics, such as sample size or measurement error correlation. Robustness to departures from normal random-effects distributions is a substantial improvement over standard likelihood solutions. The R code is made available for a user-friendly implementation of SIMEX, with the aim of encouraging its use.
Abbreviations
dOR: Diagnostic odds ratio
FN: False negatives
FP: False positives
LR−: Negative likelihood ratio
LR+: Positive likelihood ratio
s.e.: Standard error
SE: Sensitivity
SIMEX: Simulation extrapolation
SP: Specificity
SROC: Summary receiver operating characteristic
TN: True negatives
TP: True positives
Declarations
Acknowledgments
Not applicable.
Funding
This work was supported by a grant from the University of Padova (Progetti di Ricerca di Ateneo 2015, CPDA153257).
Availability of data and materials
The real data used to illustrate the SIMEX and likelihood procedures are reported in Table 3. The R code to implement SIMEX is available in Additional file 2, and the commands needed to apply the software for data analysis are illustrated in Additional file 3.
Competing interests
The author declares that she has no competing interests.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
1. Honest H, Khan K. Reporting of measures of accuracy in systematic reviews of diagnostic literature. BMC Health Serv Res. 2002; 2:4:266–7.
2. Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol. 2007; 7:3:266–7.
3. Riley RD, Thompson JR, Abrams KR. An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics. 2008; 9:172–86.
4. Hamza TH, van Houwelingen HC, Stijnen T. The binomial distribution of meta-analysis was preferred to model within-study variability. J Clin Epidemiol. 2008; 61:41–51.
5. Littenberg B, Moses LE. Estimating diagnostic accuracy from multiple conflicting reports: a new meta-analytic method. Med Dec Making. 1993; 13:313–21.
6. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med. 1993; 12:1293–316.
7. Reitsma JB, Glas AS, Rutjes AWS, Scholten RJ, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. 2005; 58:982–90.
8. Arends LR, Hamza TH, van Houwelingen HC, Heijenbrok-Kal MH, Hunink MGM, Stijnen T. Bivariate random effects meta-analysis of ROC curves. Med Dec Making. 2008; 28:621–38.
9. Chen Y, Liu Y, Ning J, Nie L, Zhu H, Chu H. A composite likelihood method for bivariate meta-analysis in diagnostic systematic reviews. Stat Methods Med Res. 2014. http://journals.sagepub.com/doi/full/10.1177/0962280214562146.
10. Chu H, Nie L, Chen Y, Huang Y, Sun W. Bivariate random effects models for meta-analysis of comparative studies with binary outcomes: methods for the absolute risk difference and relative risk. Stat Methods Med Res. 2012; 21:621–33.
11. Kuss O, Hoyer A, Solms A. Meta-analysis for diagnostic accuracy studies: a new statistical model using beta-binomial distributions and bivariate copulas. Stat Med. 2014; 33:17–30.
12. Nikoloulopoulos AK. A mixed effect model for bivariate meta-analysis of diagnostic test accuracy studies using a copula representation of the random effects distribution. Stat Med. 2015; 34:3842–65.
13. Eusebi P, Reitsma JB, Vermunt JK. Latent class bivariate model for the meta-analysis of diagnostic test accuracy studies. BMC Med Res Methodol. 2014; 14:88.
14. Schlattmann P, Verba M, Dewey M, Walther M. Mixture models in diagnostic meta-analyses – clustering summary receiver operating characteristic curves accounted for heterogeneity and correlation. J Clin Epidemiol. 2015; 68:61–72.
15. Zapf A, Hoyer A, Kramer K, Kuss O. Nonparametric meta-analysis for diagnostic accuracy studies. Stat Med. 2015; 34:3831–41.
16. Hamza TH, Reitsma JB, Stijnen T. Meta-analysis of diagnostic studies: A comparison of random intercept, normal-normal, and binomial-normal bivariate summary ROC approaches. Med Dec Making. 2008; 28:639–49.
17. Paul M, Riebler A, Bachmann LM, Rue H, Held L. Bayesian bivariate meta-analysis of diagnostic test studies using integrated nested Laplace approximations. Stat Med. 2010; 29:1325–9.
18. Diaz M. Performance measures of the bivariate random effects model for meta-analyses of diagnostic accuracy. Comput Stat Data Anal. 2015; 83:82–90.
19. Takwoingi Y, Guo B, Riley RD, Deeks JJ. Performance of methods for meta-analysis of diagnostic test accuracy with few studies or sparse data. Stat Methods Med Res. 2015. http://journals.sagepub.com/doi/abs/10.1177/0962280215592269?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub%3dpubmed.
20. Cook JR, Stefanski LA. Simulation extrapolation estimation in parametric measurement error models. J Am Stat Assoc. 1994; 89:1314–28.
21. Stefanski LA, Cook JR. Simulation-extrapolation: the measurement error jackknife. J Am Stat Assoc. 1995; 90:1247–56.
22. Rutter CM, Gatsonis CA. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med. 2001; 20:2865–84.
23. Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JA. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics. 2007; 8:239–51.
24. Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol. 2006; 59:1331–3.
25. Van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med. 2002; 21:589–624.
26. McIntosh MW. The population risk as an explanatory variable in research synthesis of clinical trials. Stat Med. 1996; 15:1713–28.
27. Schmid CH, Lau J, McIntosh MW, Cappelleri JC. An empirical study of the effect of the control rate as a predictor of treatment efficacy in meta-analysis of clinical trials. Stat Med. 1998; 17:1923–42.
28. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu C. Measurement Error in Nonlinear Models: A Modern Perspective. Boca Raton: Chapman & Hall, CRC Press; 2006.
29. Buonaccorsi JP. Measurement Error: Models, Methods and Applications. Boca Raton: Chapman & Hall, CRC Press; 2010.
30. Arends LR, Hoes AW, Lubsen J, Grobbee DE, Stijnen T. Baseline risk as predictor of treatment benefit: three clinical meta-re-analyses. Stat Med. 2000; 19:3497–518.
31. Ghidey W, Stijnen T, van Houwelingen HC. Modelling the effect of baseline risk in meta-analysis: A review from the perspective of errors-in-variables regression. Stat Methods Med Res. 2013; 22:307–23.
32. Guolo A. The SIMEX approach to measurement error correction in meta-analysis with baseline risk as covariate. Stat Med. 2014; 33:2062–76.
33. Carroll RJ, Küchenhoff H, Lombard F, Stefanski LA. Asymptotics for the SIMEX estimator in nonlinear measurement error models. J Am Stat Assoc. 1996; 91:242–50.
34. Holcomb J. Regression with covariates and outcome calculated from a common set of variables measured with error: Estimation using the SIMEX method. Stat Med. 1999; 18:2847–62.
35. Azzalini A. A class of distributions which includes the normal ones. Scand J Stat. 1985; 12:171–8.
36. Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965; 7:308–13.
37. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2015. http://www.R-project.org/.
38. Van Zaane B, Zuithoff NPA, Reitsma JB, Bax L, Nierich AP, Moons KG. Meta-analysis of the diagnostic accuracy of transesophageal echocardiography for assessment of atherosclerosis in the ascending aorta in patients undergoing cardiac surgery. Acta Anaesthesiol Scand. 2008; 52:1179–87.
39. Küchenhoff H, Mwalili SM, Lesaffre E. A general method for dealing with misclassification in regression: The misclassification SIMEX. Biometrics. 2006; 62:85–96.