# Optimizing the diagnostic power with gastric emptying scintigraphy at multiple time points

- Qingjiang Hou
^{1}, - Zhiyue Lin
^{2}, - Reginald Dusing
^{3}, - Byron J Gajewski
^{1, 4}, - Richard W McCallum
^{5}and - Matthew S Mayo
^{1}Email author

**11**:84

**DOI: **10.1186/1471-2288-11-84

© Hou et al; licensee BioMed Central Ltd. 2011

**Received: **30 November 2010

**Accepted: **31 May 2011

**Published: **31 May 2011

## Abstract

### Background

Gastric Emptying Scintigraphy (GES) at intervals over 4 hours after a standardized radio-labeled meal is commonly regarded as the gold standard for diagnosing gastroparesis. The objectives of this study were: 1) to investigate the best time point and the best combination of multiple time points for diagnosing gastroparesis with repeated GES measures, and 2) to contrast and cross-validate Fisher's Linear Discriminant Analysis (LDA), a rank based Distribution Free (DF) approach, and the Classification And Regression Tree (CART) model.

### Methods

A total of 320 patients with GES measures at 1, 2, 3, and 4 hour (h) after a standard meal using a standardized method were retrospectively collected. Area under the Receiver Operating Characteristic (ROC) curve and the rate of false classification through jackknife cross-validation were used for model comparison.

### Results

Due to strong correlation and an abnormality in data distribution, no substantial improvement in diagnostic power was found with the best linear combination by LDA approach even with data transformation. With DF method, the linear combination of 4-h and 3-h increased the Area Under the Curve (AUC) and decreased the number of false classifications (0.87; 15.0%) over individual time points (0.83, 0.82; 15.6%, 25.3%, for 4-h and 3-h, respectively) at a higher sensitivity level (sensitivity = 0.9). The CART model using 4 hourly GES measurements along with patient's age was the most accurate diagnostic tool (AUC = 0.88, false classification = 13.8%). Patients having a 4-h gastric retention value >10% were 5 times more likely to have gastroparesis (179/207 = 86.5%) than those with ≤10% (18/113 = 15.9%).

### Conclusions

With a mixed group of patients either referred with suspected gastroparesis or investigated for other reasons, the CART model is more robust than the LDA and DF approaches, capable of accommodating covariate effects and can be generalized for cross institutional applications, but could be unstable if sample size is limited.

## Background

Gastric emptying scintigraphy at intervals over 4 h after a standardized meal is commonly regarded as the gold standard for measuring gastric emptying times. In practice, a simplified hourly measure of residual gastric contents is used for diagnosing gastroparesis defined as delayed Gastric Emptying (GE) in the absence of mechanical obstruction [1, 2]. The technique measures radio-labeled food remaining in the stomach at hourly intervals after patient ingests a standardized meal [3] as an indicator of delayed GE. Because of differences in food used, techniques employed, and endpoints measured with GES, analysis and interpretation of the results vary among institutions [4]. Most previous studies have shown that delayed gastric emptying can be best indicated with gastric retention of >10% at 4- h after meal, which was established as the 95 percentile in gastric retention with normal volunteers through a multicenter study [5, 6]. As always, a distribution-based diagnostic decision is arbitrary and may inevitably induce error because of misclassification. It is reported that the simplified approach has a specificity of 62% and a sensitivity of 93% [7]. Others use the percent retention at 2-h as the routine clinical measurements of GES [8], suggesting GES at 2-h or 3-h might be the best individual time point with higher diagnostic power. However, percent gastric retentions at different time points may also be subject to patient age and gender [5]. The lack of standardization in conducting GES along with differences in quantitative analysis used at various institutions may limit the clinical utility of the test, and presents problems if cross institutional data need to be evaluated. In 2008, on behalf of the American Neurogastroenterology and Motility Society and the Society of Nuclear Medicine, researchers from 13 US medical institutions jointly issued a consensus statement for standardization of GES across institutions [4]. In addition, issues requiring further investigation were identified by the consensus which include: 1) use of 3-h compared to 2- and 4-h results for detection of delayed GE; and 2) use of multiple time points (2- and 4-h) versus single 2- or 4-h values for further understanding of the clinical meaning of discordant results between 2- and 4-h scans.

Methodology on using different diagnostic markers to detect diseases or assess health related risks has been an active field of research [9–11]. With the rapid advances in genomic and proteomic technologies, the focus on biomarker-based disease detection and risk assessment has now shifted from a single biomarker to a panel of biomarkers since different markers may be sensitive to different aspects of a disease [12]. It is argued that, compared with a single clinical or genetic marker, a panel of multiple markers may contain a higher level of discriminatory information, particularly across large heterogeneous patient populations and for complex multistage diseases. This is important because using multiple markers simultaneously might lead to a new diagnostic measure with higher sensitivity [11].

When multiple markers are continuous and normally distributed, Fisher's LDA provides the best linear combination that maximizes the sensitivity over the entire specificity range uniformly under the multivariate normal distribution model [11]. When marker values deviate from normal, a Box-Cox power transformation can be used to improve the normal fit [13]. This approach has been found effective in estimating the AUC and ROC curves if the underlying true distributions, either on the original or on a transformed scale, are close to multivariate normal [11]. Otherwise, a rank based distribution free approach can be applied. The theoretical aspects of the best linear combination for biomarkers are well reported [[9, 11, 14–19], and [20]]. Fisher's LDA is easier to compute, while the DF approach is more robust to deviation from normal distribution, but is computationally prohibitive if more than two markers are involved [19]. We used the parameter estimates from LDA as a starting point for grid search with DF if marker space goes beyond bilateral. The CART model, on the other hand, provides another approach to optimize the diagnostic power when markers are multi-dimensional [21–23]. Rather than creating a new diagnostic measure like LDA or DF, CART uses a sequential process to identify the set of predictor variables, in their original scale, that best differentiate groups among the outcome variable of interest. It is particularly useful when there are many predictor variables that are highly correlated. It is computationally less intensive and easy to interpret, but can be unstable if the model is inadequately trained with limited data.

Most previous studies focus on deriving the best combination of markers that maximizes AUC for subjects whose disease status was known [11, 14–20]. It is unclear if the diagnostic power for such a combination extends to the population with known marker values but unknown disease status or to the population whose marker values are not used for deriving the diagnostic parameters. With a standardized meal (see Materials and Methods) and the hourly GES technique (five 2-minutes imaging sessions) over 4 hour period, less camera time is required while results can be reproducible from hospital to hospital. This paper evaluates the statistical options that optimize the diagnostic power with GES measures at a few time points. Using the primary clinical diagnosis, defined by symptoms such as nausea, vomiting, early satiety, postprandial fullness, abdominal discomfort, and pain, in addition to evidence of delayed gastric emptying in the absence of mechanical gastric outlet obstruction, as the true status of gastroparesis, this study focuses on finding the parameters of the best linear combination of GES at different time points with training data, then, investigates and cross-validates its performance in test data that was not used for deriving the model.

## Methods

The Receiver Operating characteristic Curve (ROC) and the area under the curve in particular is a simple and meaningful measure to assess the usefulness of a diagnostic marker(s) [10]. Throughout this paper, ROC and AUC are used to compare across different methods and various models for the best diagnostic power of gastroparesis. The sum of false positive and false negative diagnoses over the entire sample was used to contrast the diagnostic power between LDA, DF and CART through jackknife (take one out) cross validation.

### Fisher's Linear Discriminant Approach (LDA)

Let *w*
_{
ij
}represents the *i*
^{
th
}marker value of the *j*
^{
th
}subject in the diseased group; and *v*
_{
ik
}be the *i*
^{
th
}marker value of the *k*
^{
th
}subject in the control group; where *i* = 1, ..., p; *j* = 1, ..., m; and *k* = 1, ..., n.

**X**and

**Y**are vectors of marker values with multivariate normal distribution for the diseased and control groups respectively, then

*α*

_{p × 1}, for the best linear combination of the combined marker vector from the diseased and control groups under the ROC criteria is:

Where, Φ denotes the standard cumulative normal distribution function.

### Rank based Distribution Free approach (DF)

Note, for continuous distribution, Pr (*I* = 1/2) = 0.

Where, *w*
_{
ij
}is the *i*
^{th} marker value for the *j*
^{th} subject in the diseased group; *v*
_{
ik
}is the *i*
^{th} marker value for the *k*
^{th} subject in the control group; *i, j, k, m*, and *n* as defined in the previous section; *AUC*(*α*)is area under the curve by distribution free approach with optimal coefficient vector (α). Since ROC is invariant to monotone increasing transformation, the coefficient vector *α* in both LDA and DF approaches can be rescaled as (1, β_{2/} β_{1}, β_{3/} β_{1}, ... β_{i/} β_{1}), where β_{i/} β_{1} represents the weight for the *i*
^{th} marker value relative to marker 1 [19].

### Classification and Regression Trees (CART) model

Unlike LDA or DF, CART methodology identifies the set of predictors from all variables that best differentiate classes in the outcome of interest in a sequential process. At each step (node), CART algorithm first identifies the best predictor from all candidate variables and then searches through all values for the best predictor but uses only one cutoff point to divide the sample within the node into two sub-branches. It then searches through all predictor variables and identifies the best cutoff point from the best predictor within each sub-branch and repeats the process until a certain criteria, such as a minimum variance or a minimum group size, in all terminal nodes is met. At each node, CART seeks to classify the sample into groups such that maximum homogeneity of the child nodes is reached. When a terminal node is reached, the model gives the probability of belonging to a particular category for all remaining subjects that fall into that node. In practice, the minimum node size is usually set at 10% of the learning sample to avoid potentially over fitting the model such that the final decision tree is more likely to be useful for classifying populations with similar characteristics as that of the learning sample but was not used for deriving the CART model. In contrast to LDA approach, CART can be especially useful when the correlation patterns among predictor variables are not consistent over the entire range, because it is not necessary for the same variable to be optimal for all branch nodes of the entire tree [22].

### Parameter estimation

For LDA approach, we wrote a SAS/IML program [25] for calculating the mean (m), the variance (S) for diseased and control groups, and all parameter estimates are described in the method section. First, the coefficient vector *α*, AUC for the linear combination as well as for individual markers, was obtained with equations (2) and (3) from the training data. Then, the coefficient *α* was applied to the corresponding test data to obtain the linear combination score. Three threshold values corresponding to sensitivity levels at 0.7, 0.8, and 0.9 for the linear combination score were obtained with the gastroparesis data using
in equation (5). Each threshold value was used on the left out data to classify the case into either diseased or non-diseased group. The predicted status was then cross tabulated with the known disease category.

The DF approach started with an arbitrary starting point and then grid searched for the coefficients that maximize the Mann-Whitney U statistic with the training data in the following steps.

1) A linear combination score for each observation was obtained by multiplying the marker vector **X** with a starting coefficient vector, *α*
_{(1,α).}

2) The AUC corresponding to the coefficient vector *α*is calculated with equation (6).

3) Repeat the first two steps for every possible coefficient vector and aggregate the corresponding AUC and *α*.

4) Select the coefficient that maximized the AUC and identify the critical linear combination score value at 0.7, 0.8, and 0.9 sensitivity levels.

5) Applying the coefficient from step 4 to the test data to obtain the best linear combination score.

6) Use each critical value from step 4 in the test data to predict each case into either diseased or non-diseased group, then cross tabulate with the known disease category.

For CART model, we used the TREE package within R [26] environment for each of four hourly measures, and their combination along with patient age and gender to identify the best model. Cross-validation with a minimum size of 10 subjects for each terminal node was used to optimize the decision tree model using all observations [27]. AUC for each optimized decision tree was calculated to compare across all models. Then, one observation was taken out from the entire sample, and the remaining subjects used to build the decision tree, which in turn was used to predict the disease status of the left out observation. Performance for each decision tree was summarized with the total number of wrong predictions across the entire sample.

### Gastroparesis data

A total of 320 charts from patients aged 16 ~ 89 (42.8 ± 14.3 (mean ± std)), 255 (79%) female, with GES measures at 1 h, 2 h, 3 h and 4 h after a standard meal using a standardized method (5) were retrospectively collected at The University of Kansas Medical Center (KUMC). The study protocol was approved by the Institutional Review Board (IRB) at KUMC. During GES measurement, the fraction of meal consumed and the time taken for the consumption was recorded. Subjects with unusual percent meal consumed (e.g. <20% of the meal)/consumption time (e.g. >30 minutes) were excluded. All patients were either referred with suspected gastroparesis or investigated for other reasons because of self reported symptoms such as nausea, fullness, early satiety, vomiting, and bloating. Based on overall evaluation, in addition to hourly GES measures, the study physicians diagnosed 197 (62%) of the 320 patients with gastroparesis as the primary reason for above mentioned clinical symptoms and their hospital visits. Despite similar medical experiences, diseases other than gastroparesis were considered as the primary diagnoses for the remaining 123 patients. No significant difference in mean age (p = 0.12, by t-test) and gender (p = 0.99, by χ^{2} test) were found between groups with and without gastroparesis. For each patient, gastric emptying scintigraphy was performed in the morning after an overnight fast with prokinetics stopped for at least 3 days. The standardized method for gastric emptying consists of the equivalent of two scrambled eggs (egg substitute ) labeled with ^{99m} Tc sulphur-colloid, 2 pieces of toast with jelly, and 4 oz of water with a total caloric value of 255 kcal. Anterior and posterior images of the stomach were taken immediately after eating, and then hourly for 4 hours [28].

## Results

### Diagnostic powers by LDA and DF approach

Parameter estimates (Area Under the Curve (AUC) and coefficient for best linear combination) with Fisher's Linear Discriminant Analysis (LDA) and Distribution Free (DF) approaches

4-h | 3-h | Linear combination DF | Linear combination LDA | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|

Raw | Tran | Raw | Tran | α | Raw | α | Tran | α | Raw | α | Tran | |

Mean | 0.830 | 0.879 | 0.824 | 0.827 | 0.02 | 0.869 | 0.02 | 0.869 | 0.681 | 0.834 | 0.055 | 0.881 |

STD | 0.002 | 0.001 | 0.002 | 0.002 | 0.00 | 0.001 | 0.00 | 0.001 | 0.032 | 0.002 | 0.002 | 0.001 |

Median | 0.830 | 0.879 | 0.824 | 0.827 | 0.02 | 0.869 | 0.02 | 0.869 | 0.679 | 0.833 | 0.055 | 0.880 |

Range | 0.013 | 0.009 | 0.011 | 0.009 | 0.00 | 0.008 | 0.00 | 0.008 | 0.284 | 0.012 | 0.017 | 0.009 |

False classifications (%) with jackknife cross-validation by Fisher's Linear Discriminant Analysis (LDA) and Distribution Free (DF) approaches

Criteria | 3-h DF | 4-h DF | Linear combination DF | Linear combination LDA | ||||
---|---|---|---|---|---|---|---|---|

0.7 | 24 (19.5%) | 59 (30.0%) | 16 (13.0%) | 59 (30.0%) | 15 (12.2%) | 58 (29.4%) | 15 (12.2%) | 64 (32.3%) |

0.8 | 37 (30.1%) | 39 (19.8%) | 21 (17.1%) | 40 (20.3%) | 21 (17.1%) | 40 (20.3%) | 20 (16.3%) | 46 (23.4%) |

0.9 | 60 (48.8%) | 21 (10.7%) | 27 (21.9%) | 23 (11.7%) | 28 (22.8%) | 20 (10.2%) | 27 (22.0%) | 22 (11.2%) |

With the raw data, the differences in sum of false positive and false negative rates for the best linear combinations were 0.2%, 1.6%, and -6.5% between the DF and LDA approaches at the 0.7, 0.8, and 0.9 sensitivity levels. While the false negative rates by the LDA approach showed large deviation from that corresponding to the preset threshold levels (≤3.0%), those by the DF approach are relatively close (≤± 0.2%).

Data transformation improved the diagnostic power of the best linear combinations for both the LDA and DF approaches by decreasing the sum of false negative and false positive rates. The reductions are 2.2%, 2.8%, 13.8%, and -0.9%, -1.1%, 20.1% for the DF and LDA approaches, respectively. Interestingly, the largest improvement in diagnostic power is at the higher level of sensitivity for both approaches.

### Diagnostic powers with CART approach

False classifications (%) with jackknife cross-validation by optimized Classification And Regression Tree (CART) models

Type of Misdiagnosis | 1-h | 2-h | 3-h | 4-h | 2-h + 4-h | 3-h + 4-h | 4-h + 3-h + 2-h + 1-h + Age |
---|---|---|---|---|---|---|---|

76 (61.8%) | 28 (22.8%) | 29 (23.6%) | 28 (22.8%) | 28 (22.8%) | 28 (22.8%) | 26 (21.1%) | |

24 (12.2%) | 69 (35.0%) | 44 (22.3%) | 18 (9.1%) | 18 (9.1%) | 18 (9.1%) | 18 (9.1%) | |

Total Misdiagnosis | 100 (31.3%) | 97 (30.3%) | 73 (22.8%) | 46 (14.4%) | 46 (14.4%) | 46 (14.4%) | 44 (13.8%) |

AUC For Optimized Model | 0.724 | 0.753 | 0.825 | 0.867 | 0.865 | 0.858 | 0.881 |

## Discussion

Linear combinations of diagnostic markers obtained by LDA or DF approach usually lead to higher discriminate powers (larger AUC) than with its individual components. A simulation study (results not show here) indicated that the stronger the correlation among individual markers, the smaller the increase in AUC by their linear combination. The potential gain in diagnostic power, however, diminishes when the correlation between individual markers increase up to 0.7 and above. The effectiveness of using a linear combination decreases with the increase in magnitude of disproportion in covariance matrices between the diseased and control groups.

Small sample sizes lead to large variation in optimal coefficients for best linear combination and the corresponding AUC, especially for the LDA approach. As sample size increases, optimal coefficients and AUC by LDA or DF methods may or may not approach each other depending on the distribution of individual markers.

### Marker values normal

When the marker values are multivariate normal, the estimates by the LDA approach are very close to that of the DF method in terms of AUC, optimal coefficients, and the diagnostic power as indicated by simulation. With normal distribution and adequate sample size, the first two moments capture marker's location and scale parameter with small variation. In such cases, the LDA approach has the advantage in saving computation time (more than 100 fold less) without suffering in predictive power than the DF approach. Nevertheless, LDA cannot outperform DF as long as the searching grid for optimal coefficient with DF contains the point estimate by LDA. In other words, the limitation with DF is in computation, rather than methodology.

### Marker values not normal

When marker values deviate from normal distribution, the DF approach always leads to higher AUC for the best linear combination if the searching grid for optimal coefficients is fine enough. The downgrading performance with LDA approach is a direct result of using the means that is biased due to abnormality. Exponential distribution, for example, tends to have a long tail with a high degree of skewness, leading to a mean with a positive bias. As a consequence, the variances for markers are inflated and the AUC tends to be smaller. More importantly, the best linear combination obtained with LDA approach tends to overestimate the false negative rate and underestimate the false positive rate at the lower sensitivity level (Table 2), and do exactly the opposite at the higher sensitivity level. On the contrary, the best linear combination by DF approach is less affected by extreme values and tends to have higher diagnostic power while maintaining the preset sensitivity levels. This is important because, in practice, a fixed false negative rate represents a critical limit of tolerance in diagnostic medicine. Beyond such limit, the stake of loss-benefit ratio would increase, or, at least, the diagnostic decision is less cost effective.

### The effect of data transformation

Effective data transformation improves the normal fit and thus parameter estimation by LDA approach, but whether this improvement will hold in new data that is not included for parameter estimation remains unclear. Cross-validation with the gastroparesis data indicated that power transformation increased AUC and stabilized parameter estimates in the training sets, and, that such gains would translate into higher diagnostic power in the test sets (Table 2). Data transformation closed the gap in diagnostic power between the best linear combinations by LDA and DF approaches with the clinical data. Interestingly, the DF approach showed a consistent improvement with transformation across all levels of sensitivity at 0.7, 0.8, and 0.9. The trend with the LDA approach is not as clear. One reason might be the percent gastric retentions measured at 3-h and 4-h are so skewed that power transformation is not enough to put the measurements on nearly normal distribution.

### Optimizing the diagnostic power by GES measures with CART model

In practice, the critical value of 0.5 in probability can be used. That is, a patient is diagnosed with gastroparesis if, based on the post meal percent gastric retention measures along with covariates, the probability is greater than or equal to 0.5.

## Conclusions

By contrasting CART, LDA, and DF approaches with the gastroparesis data, we hoped to be able to answer the two questions raised in the 2008 consensus statement [4]. In summary, we found that: 1) The diagnostic power of 4-h is higher compared to 2- and 3 -h results for detection of delayed GE, regardless of the method employed and whether the data is in original scale or in transformation; 2) Use of multiple time points (2-, 3-, and 4-h) will increase AUC and, as a result, the diagnostic power, versus single 2-, 3-, or 4-h values (Figure 2 & 3). The improvement, however, is limited, especially with the LDA approach, because hourly measures are skewed and highly correlated. Therefore, we conclude that among LDA, DF, and CART, the CART model is more appealing because it is robust to outliers and invariant to data distribution, easy to compute, capable of using any number of predictor variables, and it offers easy interpretation. A potential downside with the CART model, however, lies in its limitation when the number of subjects (n) is small.

## Declarations

### Acknowledgements

The authors would like to acknowledge the following individuals for their contributions: Irene Sarosiek, Katherine Roeser, Pernilla Foran, Teri Lavenbarg, and Irene King in the Center for Gastrointestinal Nerve and Muscle Function and the Division of GI Motility, as well as faculty, fellows and nursing staff in the Division of Gastroenterology, as well as Christine McMillin and other support personnel in the Department of Nuclear Medicine, all from Kansas University Medical Center. Dr. McCallum's research was partially supported by NIH Grant No. 1N01DK074035-01 entitled *"*New Diagnosis and Treatment Approaches for Patients with gastroparesis".

## Authors’ Affiliations

## References

- Maurer AH, Parkman HP: Update on gastrointestinal scintigraphy. Semin Nucl Med. 2006, 36: 110-118. 10.1053/j.semnuclmed.2005.12.003.View ArticlePubMedGoogle Scholar
- Camilleri M, Hasler WL, Parkman HP, Quigley EMM, Soffer E: Measurement of gastroduodenal motility in the GI laboratory. Gastroenterology. 1998, 115: 747-762. 10.1016/S0016-5085(98)70155-6.View ArticlePubMedGoogle Scholar
- Griffith GA, Owen GM, Kirkman S, Campbell H, Shields R: Measurement of rate of gastric emptying using chrominum-51. Lancet. 1968, 1: 1244-1245.Google Scholar
- Abell TL, Camilleri M, Donohoe K, Hasler WL, Lin HC, Maurer AH, McCallum RW, Nowak T, Nusynowitz ML, Parkman HP, Shreve P, Szarka L, Snape WJ, Ziessman HA: Consensus recommendations for gastric emptying scintigraphy: a joint report of the American neurogasreoenterology and motility society and the society of nuclear medicine. Am J Gastroenterology. 2008, 103: 753-10.1111/j.1572-0241.2007.01636.x. 763View ArticleGoogle Scholar
- Tougas G, Eaker EY, Abell TL, Abrahamsson H, Boivin M, Chen J, Hocking MP, Quigley EM, Koch KL, Tokayer AZ, Stanghellini V, Chen Y, Huizinga JD, Rydén J, Bourgeois I, McCallum RW: Assessment of gastric emptying using a low-fat meal: establishment of international control values. Am J Gastroenterology. 2000, 95: 1456-1462. 10.1111/j.1572-0241.2000.02076.x.View ArticleGoogle Scholar
- Lin Z, Sarosiak I, McCallum RW: Optimal detection of gastroparesis. Am J Gastroenterology. 2006, 101: S135-Google Scholar
- Camilleri M: Diabetic gastroparesis. New England Journal of Medicine. 2007, 356: 820-829. 10.1056/NEJMcp062614.View ArticlePubMedGoogle Scholar
- Guo JP, Maurer AH, Fisher RS, Parkman HP: Extending gastric emptying scintigraphy from two to four hours detects more patients with gastroparesis. Dig Dis Sci. 2001, 46: 24-29. 10.1023/A:1005697422454.View ArticlePubMedGoogle Scholar
- Anderson TW, Bahadur RR: Classification into two multivariate normal distributions with different covariance matrices. Ann Math Statist. 1962, 33: 420-431. 10.1214/aoms/1177704568.View ArticleGoogle Scholar
- Hanley JA: Receiver operating characteristic (ROC) methodology: The state of the art. Critical Review in Diagnostic Imaging. 1989, 29 (3): 307-335.Google Scholar
- Su JQ, Liu JS: Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association. 1993, 88: 1350-1355. 10.2307/2291276.View ArticleGoogle Scholar
- Srivastava S, Gopal-Srivastava R: Biomarkers in cancer screening: A public health perspective. Journal of Nutrition. 2002, 132: 2471-2475.Google Scholar
- Box GEP, Cox DR: An analysis of transformations. Journal of the Royal Statistical Society, Series B. 1964, 26: 211-243.Google Scholar
- Liu AY, Schisterman EF, Zhu Y: On linear combinations of markers to improve diagnostic accuracy. Statistics in Medicine. 2005, 24: 37-47. 10.1002/sim.1922.View ArticlePubMedGoogle Scholar
- Baker SG: Identifying combination of cancer markers for further study as triggers of early intervention. Biometrics. 2000, 56: 1082-1087. 10.1111/j.0006-341X.2000.01082.x.View ArticlePubMedGoogle Scholar
- Reiser B, Faraggi D: Confidence intervals for the generalized ROC criterion. Biometrics. 1997, 53: 644-652. 10.2307/2533964.View ArticlePubMedGoogle Scholar
- Pepe MS: A regression modeling framework for receiver operating characteristic curves in medical diagnostic testing. Biometrika. 1997, 84 (3): 595-608. 10.1093/biomet/84.3.595.View ArticleGoogle Scholar
- Pepe MS: Receiver operating characteristic methodology. Journal of the American Statistical Association. 2000, 95 (449): 308-311. 10.2307/2669554.View ArticleGoogle Scholar
- Pepe MS, Thompson ML: Combining diagnostic test results to increase accuracy. Biostatistics. 2000, 1 (2): 123-140. 10.1093/biostatistics/1.2.123.View ArticlePubMedGoogle Scholar
- Zheng Y, Cai TX, Feng ZD: Application of the time-dependent ROC curves for prognostic accuracy with multiple biomarkers. Biometrics. 2006, 62: 279-287. 10.1111/j.1541-0420.2005.00441.x.View ArticlePubMedGoogle Scholar
- Fernandez-Matrid F, VandeVord PJ, Yang X, Karvonen RL, Simpson PM, Kraut MJ, Granda JL, Tomkiel JE: Antinuclear antibodies as potential markers of lung cancer. Clinical Cancer Research. 1999, 5: 1393-1400.Google Scholar
- Rosenfeld B, Lewis C: Assessing violence risk in stalking cases: a regression tree approach. Law and Human Behavior. 2005, 29 (3):
- Pittman J, Huang E, Nevins J, Wang Q, West M: Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes. Biometrics. 2004, 5 (4): 587-601Google Scholar
- Hanley JA, Mcneil BJ: The meaning and use of the area under a receiver operating haracteristic (ROC) curve. Radiology. 1982, 143: 29-32.View ArticlePubMedGoogle Scholar
- SAS Institute, Inc: SAS/IML User's Guide, Version 8. 1999, Cary, NC: SAS Institute, IncGoogle Scholar
- R version 2.8.1 (2008-12-22). Copy right©: The R Foundation for Statistical Computing. 2008Google Scholar
- Andreas LP, Stolfo SJ: Cost complexity-based pruning of ensemble classifiers. Knowledge and Information Systems. 2001, 3: 449-469. 10.1007/PL00011678.View ArticleGoogle Scholar
- Lin Z, Hou Q, Sarosiek J, Forster J, Mccallum RW: Association between changes in symptoms and gastric emptying in gastroparetic patients treated with gastric electrical stimulation. Neurogastroenterol and Motil. 2008, 20: 464-470. 10.1111/j.1365-2982.2007.01054.x.View ArticleGoogle Scholar
- Hasler WL, Parkman HP, Wilson LA, Pasricha PJ, Koch KL, Abell TL, Snape WJ, Farrugia G, Lee L, Tonascia J, et al: Psychological Dysfunction Is Associated With Symptom Severity but Not Disease Etiology or Degree of Gastric Retention in Patients With Gastroparesis. The American Journal of Gastroenterology. 2010Google Scholar
- Garcia EV, Taylor A, Halkar R, Folks R, Krishnan M, Cooke CD, Dubovsky E: RENEX: An expert system for the interpretation of
^{99m}Tc-MAG3 scans to detect renal obstruction. J Nucl Med. 2006, 47: 320-329.PubMedGoogle Scholar - The pre-publication history for this paper can be accessed here:http://www.biomedcentral.com/1471-2288/11/84/prepub

### Pre-publication history

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.