  • Research article
  • Open access

A methodological framework to distinguish spectrum effects from spectrum biases and to assess diagnostic and screening test accuracy for patient populations: Application to the Papanicolaou cervical cancer smear test

Abstract

Background

A spectrum effect was defined as differences in the sensitivity or specificity of a diagnostic test according to the patient's characteristics or disease features. A spectrum effect can lead to a spectrum bias when subgroup variations in sensitivity or specificity also affect the likelihood ratios and thus post-test probabilities. We propose and illustrate a methodological framework to distinguish spectrum effects from spectrum biases.

Methods

Data were collected for 1781 women having had a cervical smear test and colposcopy followed by biopsy if abnormalities were detected (the reference standard). Logistic models were constructed to evaluate both the sensitivity and specificity, and the likelihood ratios, of the test and to identify factors independently affecting the test's characteristics.

Results

For both readings of the smear test (clinical and "optimized"), human papillomavirus test result, study setting and age affected sensitivity or specificity (spectrum effect). However, only human papillomavirus test result and study setting modified the likelihood ratios (spectrum bias) for the clinical reading, whereas only human papillomavirus test result and age modified the likelihood ratios (spectrum bias) for the "optimized" interpretation.

Conclusion

Fitting sensitivity, specificity and likelihood ratios simultaneously allows the identification of covariates that independently affect diagnostic or screening test results and distinguishes spectrum effect from spectrum bias. We recommend this approach for the development of new tests, and for reporting test accuracy for different patient populations.

Background

"Spectrum bias" in diagnostic test evaluation was first reported by Ransohoff and Feinstein in 1978 [1]. They observed that the sensitivity and specificity of diagnostic tests could differ between subgroups of patients with different characteristics, including severity and location of the disease or clinical features. Since this pioneering study, many authors have described such differences in performance for numerous tests in various contexts (e.g. [2–14]). It has been recommended that authors report estimates of the variability of diagnostic accuracy between subgroups of patients affected by these differences in performance, and this was recently endorsed by the STARD Initiative [15, 16]. However, other authors have expressed scepticism regarding the evaluation of the accuracy of diagnostic or screening tests, to the point of considering them "unpredictable" as their accuracy may depend on too many factors [17, 18], and the use of post-test probabilities (PTP) as indicators of test accuracy has been proposed [13].

As the literature became increasingly confused, the recent paper by Goehring et al. [19] represented an important breakthrough by drawing attention to the need for distinguishing between various "spectrum effects". Having defined "spectrum effect" as differences in the sensitivity or specificity of a diagnostic or screening test according to the patient's characteristics or to the features and severity of the disease, Goehring et al. showed that a "spectrum effect" can lead to a spectrum bias when subgroup variations in sensitivity or specificity also affect the likelihood ratios and thus post-test probabilities (see also [9, 11, 20]). Indeed, there are some situations for which subgroup analyses of sensitivity and specificity do not lead to the same conclusions as subgroup analyses for likelihood ratios. For example, conflicting results can be obtained when there is no variation in sensitivity and specificity between subgroups, but a higher prevalence of the disease in one subgroup than another. Conversely, variations in sensitivity and specificity do not mechanically imply biased results if one considers the "overall" test characteristics [19]. As sensitivity and specificity are inversely related, differences between subgroups do not necessarily affect likelihood ratios (and therefore post-test probabilities). Unfortunately, the term "bias" in "spectrum bias" may be misleading, as "bias" usually refers to the lack of validity of results due to inadequate study design (e.g. using a diagnostic case-control design to select patients rather than a diagnostic cohort design) and inadequate spectrum selection (e.g. by assessing an inappropriate group of patients given the study objective) [14, 21]. Nevertheless, we will conform to the work of Goehring et al. [19] and use this term herein because of its other classical meaning, which is statistical, regarding the use of an estimator giving wrong estimations: indeed, the post-test probabilities of diseases would be biased (and thus the clinical decision altered) if the appropriate group-specific performance values of the test are not used.
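To make the distinction concrete, the following minimal Python sketch applies Bayes' theorem on the odds scale. It is an illustration only and is not part of the original analysis: the function names and all numerical values are hypothetical. It shows that a test with fixed sensitivity and specificity yields different post-test probabilities when prevalence differs, and that two subgroups with different sensitivity and specificity can still share the same positive likelihood ratio.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical subgroups A and B: sensitivity and specificity both differ
# (a spectrum effect), yet the positive likelihood ratio is the same.
lr_pos_a, lr_neg_a = likelihood_ratios(0.60, 0.90)  # LR+ = 0.60 / 0.10
lr_pos_b, lr_neg_b = likelihood_ratios(0.90, 0.85)  # LR+ = 0.90 / 0.15

# Same test characteristics, different (hypothetical) prevalences.
ptp_low = post_test_probability(0.05, lr_pos_a)
ptp_high = post_test_probability(0.20, lr_pos_a)

print(f"LR+ A = {lr_pos_a:.2f}, LR+ B = {lr_pos_b:.2f}")
print(f"LR- A = {lr_neg_a:.2f}, LR- B = {lr_neg_b:.2f}")
print(f"Post-test probability: {ptp_low:.2f} (5% prevalence), "
      f"{ptp_high:.2f} (20% prevalence)")
```

In this hypothetical case the negative likelihood ratios of the two subgroups do differ, so post-test probabilities after a negative result would still depend on the subgroup.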

Goehring et al. [19] only proposed stratified analysis of spectrum effects and biases. The recent logistic regression approach by Janssens et al. [22] is complementary to that developed more than twenty years ago by Hlatky [2] and subsequently by Coughlin [23] and Moons [9] (among others) and extends this analysis to multivariable cases. Such multivariable analyses are necessary because factors responsible for differences in performance of tests are generally numerous and closely related.

Here, we propose a methodological framework, derived from the approaches described (both applied together for the first time), to distinguish spectrum effects from spectrum biases. Our purpose is to isolate factors independently affecting the diagnostic accuracy of a test. This approach is illustrated by an application to the Papanicolaou smear test for detection of cervical cancer.

Methods

Data sources

We undertook a secondary analysis of the study by the French Society of Clinical Cytology to compare the efficiency of the conventional Papanicolaou smear, ThinPrep liquid-based cytology and the Hybrid-Capture II human papillomavirus test (HPV test) [24–26]. The design of the study was described in detail elsewhere [24]. This analysis focuses on one of the three tests, the conventional Papanicolaou smear test, and the spectrum variations associated with it. All women included in this study (n = 1781) were evaluated by the reference standard (colposcopy followed by biopsy if abnormalities were detected), by the index test (conventional Papanicolaou smear test) and by the HPV test (which was considered in this analysis as a "spectrum" variable). These women were either referred for colposcopy because abnormalities had been detected on previous smears (referral clinic setting, n = 461) or were attending for routine smears (screening setting, n = 1320). Conventional Papanicolaou smear tests were read twice: in addition to routine reading in normal conditions ("clinical reading"), a reading blind to the context and clinical history was obtained for Papanicolaou test smears separately and independently by two different pathologists. In cases of disagreement, the slides were read again to reach a consensus conclusion, with a decision given, if necessary, by an independent expert ("optimized diagnosis"). Smear test results were classified as negative (normal smear or atypical squamous cells/glandular cells of undetermined significance (ASCUS/AGUS)) or positive (low grade or high grade squamous intraepithelial lesions or invasive cancer) according to the 1991 Bethesda system [27].
The reference standard results were classified as negative (normal colposcopy or negative biopsy result) or positive (cervical intraepithelial neoplasia of grade I, II or III or invasive carcinoma) according to the International Federation of Cervical Pathology and Colposcopy classification system [28]. The validity of these cutoff points may be open to discussion, but they were used in our previous papers and classify a sufficient number of patients with significant lesions. Other characteristics of the women were also recorded: age, current smoking, European or other origin, educational level, menopausal status and contraception status.

Statistical analysis

Sensitivity, specificity and likelihood ratios were used as indicators of test accuracy. Stratified analyses of these indicators were performed for the following variables: HPV test, study setting (referral clinic or screening), age (< or ≥ 35 years), current smoking, European origin, educational level (higher education or less), menopausal status and contraception status (none, combined oral pill or other). Confidence intervals for sensitivity and specificity were produced with the Wilson score method without continuity correction [29]. Confidence intervals for positive and negative likelihood ratios were calculated by the method described by Simel et al. [30]. Logistic regression models were also constructed for sensitivity and specificity and the likelihood ratios to evaluate spectrum effects and spectrum biases associated with these variables.
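The two interval methods named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the study (the analyses were run in SAS): the function names and the 2×2 counts are hypothetical, and the likelihood ratio interval uses the usual log-scale standard error attributed to Simel et al.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for a proportion, without continuity correction."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


def likelihood_ratio_interval(x1, n1, x0, n0, z=Z_95):
    """Point estimate and log-method interval for (x1/n1) / (x0/n0).

    x1 of n1 diseased and x0 of n0 non-diseased subjects have the test
    result of interest. All counts must be non-zero.
    """
    lr = (x1 / n1) / (x0 / n0)
    se_log = math.sqrt(1.0 / x1 - 1.0 / n1 + 1.0 / x0 - 1.0 / n0)
    return lr, lr * math.exp(-z * se_log), lr * math.exp(z * se_log)


# Hypothetical 2x2 counts (not taken from the study).
tp, fn, fp, tn = 80, 20, 30, 170
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, fp + tn)
lr_pos = likelihood_ratio_interval(tp, tp + fn, fp, fp + tn)  # positive results
lr_neg = likelihood_ratio_interval(fn, tp + fn, tn, fp + tn)  # negative results

print("Sensitivity 95% CI:", sens_ci)
print("Specificity 95% CI:", spec_ci)
print("LR+ (estimate, lower, upper):", lr_pos)
print("LR- (estimate, lower, upper):", lr_neg)
```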

The logistic model for sensitivity and specificity proposed by Coughlin et al. [23] was used to estimate sensitivity and specificity by defining the dependent variable as the dichotomous result of the diagnostic test. The presence of the disease defined by the reference standard is included as a binary explanatory variable, as are covariates potentially affecting sensitivity or specificity (Additional file 1). Interaction terms between the reference standard and covariates were also included to test whether the covariates affect sensitivity and specificity differentially.
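As a minimal sketch of how such a model yields sensitivity and specificity, assume a single covariate x and the parameterisation logit P(T = 1 | D = d, x) = b0 + b1·d + b2·x + b3·d·x, where T is the index test result and D the reference standard result. This parameterisation, the function names and the coefficient values below are assumptions made for illustration; the exact formulation used in the study is given in Additional file 1.

```python
import math


def expit(value):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-value))


def sensitivity_specificity(coefficients, x):
    """Sensitivity and specificity at covariate value x.

    Assumed model: logit P(T=1 | D=d, X=x) = b0 + b1*d + b2*x + b3*d*x.
    """
    b0, b1, b2, b3 = coefficients
    sensitivity = expit(b0 + b1 + (b2 + b3) * x)  # P(T=1 | D=1, x)
    specificity = 1.0 - expit(b0 + b2 * x)        # P(T=0 | D=0, x)
    return sensitivity, specificity


# Hypothetical coefficients (not estimates from the study).
coefficients = (-2.0, 3.0, 0.5, 0.4)
sens_x0, spec_x0 = sensitivity_specificity(coefficients, x=0)
sens_x1, spec_x1 = sensitivity_specificity(coefficients, x=1)

print(f"x = 0: sensitivity {sens_x0:.3f}, specificity {spec_x0:.3f}")
print(f"x = 1: sensitivity {sens_x1:.3f}, specificity {spec_x1:.3f}")
```

Under this assumed parameterisation, b2 shifts the logits of sensitivity and of the false-positive rate by the same amount, whereas a non-zero interaction coefficient b3 shifts the logit of sensitivity further, which is what the interaction test examines.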

The approach proposed by Janssens et al. [22] was used to estimate the likelihood ratios of a diagnostic test result conditional on covariates. It requires the construction of two logistic models: one for the "prior odds" of the disease and one for the "posterior odds" of the disease. The prior odds regression model includes only the covariate(s). The posterior odds regression model also includes the binary result of the diagnostic test and interaction terms between the diagnostic test and covariate(s), which indicate whether the covariates affect the positive and the negative likelihood ratios differentially (Additional file 1). The likelihood ratios for the result of the diagnostic test conditional on the values of the covariates were then obtained by subtracting the coefficients of the prior odds regression model from the coefficients of the posterior odds regression model [22]. Confidence intervals for the differences in logistic regression coefficients were approximated by a bootstrap technique with 2000 random bootstrap samples drawn with replacement [31].
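The subtraction of prior from posterior coefficients can be illustrated with a minimal Python sketch for one binary covariate x and a binary test result t. Everything in it is hypothetical (counts, function names, seed), and two simplifications are assumed: both models are saturated, so their maximum-likelihood coefficients can be read directly off the cell counts without a fitting routine, and the bootstrap interval is a plain percentile interval, a choice the text above does not specify.

```python
import math
import random

# Hypothetical counts keyed by (covariate x, test result t, disease d).
# They are illustrative only and are not taken from the study.
COUNTS = {
    (0, 1, 1): 30, (0, 0, 1): 20, (0, 1, 0): 40, (0, 0, 0): 410,
    (1, 1, 1): 90, (1, 0, 1): 10, (1, 1, 0): 60, (1, 0, 0): 140,
}


def prior_coefficients(counts):
    """Prior odds model: logit P(D=1 | x) = a0 + a1*x."""
    def logit(x):
        diseased = counts[(x, 0, 1)] + counts[(x, 1, 1)]
        healthy = counts[(x, 0, 0)] + counts[(x, 1, 0)]
        return math.log(diseased / healthy)
    return logit(0), logit(1) - logit(0)


def posterior_coefficients(counts):
    """Posterior odds model: logit P(D=1 | t, x) = b0 + b1*t + b2*x + b3*t*x."""
    def logit(t, x):
        return math.log(counts[(x, t, 1)] / counts[(x, t, 0)])
    b0 = logit(0, 0)
    b1 = logit(1, 0) - b0
    b2 = logit(0, 1) - b0
    b3 = logit(1, 1) - b0 - b1 - b2
    return b0, b1, b2, b3


def log_likelihood_ratio(counts, t, x):
    """ln LR of test result t at covariate value x (posterior minus prior)."""
    a0, a1 = prior_coefficients(counts)
    b0, b1, b2, b3 = posterior_coefficients(counts)
    return (b0 - a0) + b1 * t + (b2 - a1) * x + b3 * t * x


def bootstrap_interval(counts, t, x, replicates=2000, seed=0, alpha=0.05):
    """Percentile bootstrap interval for ln LR, resampling individuals."""
    rng = random.Random(seed)
    cells = list(counts)
    weights = [counts[cell] for cell in cells]
    total = sum(weights)
    estimates = []
    for _ in range(replicates):
        resample = dict.fromkeys(cells, 0)
        for cell in rng.choices(cells, weights=weights, k=total):
            resample[cell] += 1
        if min(resample.values()) == 0:
            continue  # simplification: skip resamples with an empty cell
        estimates.append(log_likelihood_ratio(resample, t, x))
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


lr_pos_x0 = math.exp(log_likelihood_ratio(COUNTS, t=1, x=0))
lr_pos_x1 = math.exp(log_likelihood_ratio(COUNTS, t=1, x=1))
ci_low, ci_high = bootstrap_interval(COUNTS, t=1, x=0)

print(f"LR+ for x = 0: {lr_pos_x0:.2f}; LR+ for x = 1: {lr_pos_x1:.2f}")
print(f"Bootstrap 95% interval for ln LR+ (x = 0): {ci_low:.2f} to {ci_high:.2f}")
```

With these hypothetical counts the sketch gives a positive likelihood ratio of 6.75 for x = 0 and 3 for x = 1, the same values as those computed directly from the two stratified 2×2 tables. With continuous covariates or non-saturated models the coefficients would instead come from a fitted logistic regression.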

All multivariable regression models initially included covariates modifying the test accuracy indicators with a p-value of 0.20 or less in logistic regression univariable analyses and first-order interactions between these covariates and the disease status (according to the diagnostic test or the reference standard). Final models were obtained by a grouped backward stepwise selection procedure. At each step, the variable with the least significant main effect was removed from the model if its interaction terms were associated with a p-value greater than 0.05. Even if not significant, all first-order interactions (for variables with a significant main effect) were conserved in the final model to obtain less biased estimations of group-specific likelihood ratios, as recommended by Janssens et al. [22].

All analyses were performed using SAS software version 8 [32].

Results

Among the 1781 women included, 355 scored positive with the conventional Papanicolaou smear test (20%). Table 1 presents the characteristics of the 1781 women included and the results of stratified analysis of sensitivity, specificity and likelihood ratios. The smear test's accuracy differed substantially between subgroups, in particular for HPV test and study setting, both for clinical and optimized readings.

Table 1 Subgroup analysis of the sensitivity, specificity and likelihood ratios of the Papanicolaou smear test

Table 2 provides a summary of univariable and multivariable results for sensitivity, specificity and likelihood ratios. For the sake of simplicity, this table reports only effects with p-values of less than 0.2. Several covariates modified sensitivity or specificity but few affected the likelihood ratio(s). The multivariable modelling allowed the number of covariates affecting diagnostic accuracy to be decreased by removing non-independent factors (current smoking, European origin or educational level) that were related to sensitivity and specificity or likelihood ratios through HPV test, age or study setting. For both clinical and optimized readings, HPV test, study setting and age affected specificity and sensitivity independently. For the clinical reading, HPV test and study setting were both responsible for a spectrum bias whereas age had no effect on likelihood ratios. For the optimized interpretation, the HPV test and age were the only two factors responsible for a spectrum bias.

Table 2 Summary of univariable and multivariable regression analyses for sensitivity/specificity and likelihood ratios of the Papanicolaou smear test (only p-values less than 0.20 are presented)

Additional files 2 and 3 contain details of the final models for clinical reading: sensitivity and specificity (Additional file 2) and likelihood ratios (Additional file 3).

Discussion

We propose a methodological framework for identifying factors independently responsible for spectrum effects (i.e. which affect the sensitivity and specificity only) and for spectrum biases (i.e. which affect the likelihood ratios and post-test probabilities). This framework consists of double modelling, of sensitivity/specificity and positive/negative likelihood ratios respectively, and therefore extends the stratified analysis of spectrum effects and biases proposed by Goehring et al. [19], taking into account the fact that these factors are generally numerous and closely related. We demonstrated the usefulness of this framework by application to Papanicolaou smear testing for the detection of cervical cancer. With this approach, we were able to differentiate the covariates linked to disease prevalence or severity and true "test modifiers" (modifying the test results due to their own effect, as HPV and age should do) from other factors affecting test accuracy only through "test modifiers" (for example current smoking, European origin and educational level). The massive and consistent effect of the HPV test result on Papanicolaou smear test results can be explained by the influence of the virus on cellular features. Disease prevalence and/or severity have a well-known effect on test accuracy indices [33]. Indeed, high risk (or oncogenic) HPV is the cause of cervical cancer development and is currently considered a marker of severity of intraepithelial lesions [34, 35]. The study setting was found to be responsible for spectrum bias only for clinical reading, confirming the information bias (or clinical review bias) observed for reading not blind to the context and clinical history. The strong effect of study setting probably masked the effect of age on the clinical reading, as age appeared to be responsible for spectrum bias only in optimized reading (where information bias was neutralized).

Many authors report differences in diagnostic or screening test accuracy between subgroups, but few have used a multivariable modelling approach to identify factors responsible for differences in the performance of tests and confounding factors. A review of current practice, including investigations of so-called spectrum bias (Table 3), shows that a large number of factors have been investigated, often without discernment but frequently with confusion regarding their significance to test accuracy. Moreover, most of these studies analyzed test accuracy only in terms of sensitivity and specificity [6–8, 36–40], making it impossible to distinguish between spectrum effect and spectrum bias.

Table 3 Studies that investigated subgroup variations*

Our framework nevertheless presents some difficulties, mainly due to having to use non-trivial regression modelling. In particular, the simultaneous fitting of prior and posterior odds of the disease could be considered complex, as could the use of bootstrapping methods to construct confidence intervals for coefficients. Another difficulty is the management of interaction terms and the risk of colinearity between covariates included in the models. We chose to include only first order interactions between covariates and the disease status (according to the diagnostic test or the reference standard) because these interactions were the only ones relevant in the context of the differences in performance of a diagnostic or screening test. Usual recommendations concerning the practical implementation of regression analysis methods remain helpful in this context [41, 42]. In particular, attention must be paid to the lack of power of the interaction test and its interpretation: the logistic model for sensitivity and specificity includes diseased and non-diseased patients and gives results closer to the sensitivity when the proportion of non-diseased patients is high, as is the case here. For example, we observe "paradoxical" results for current smoking, which is a significant predictor of sensitivity and specificity in the univariable analysis for clinical reading (Table 2, the interaction term is not significant), but with confidence intervals inconsistent with this conclusion. However, the use of a multivariable approach does not negate recommendations about patient selection or eliminate the necessity for carefully defined and relevant inclusion criteria – a spectrum of patients needs to be included that is similar to the population in which the test will be used in practice [15, 16, 43].

Conclusion

In conclusion, we have shown the value of complementary and simultaneous modelling of sensitivity, specificity and likelihood ratios in logistic regression models: this approach can identify covariates that independently affect the accuracy of a diagnostic or screening test and can distinguish spectrum bias from spectrum effects. This approach appears preferable to subgroup analyses, which are classically recommended [15, 16] but for which the problems are well known [44]: the number of patients per group is often small, especially if the number of covariates is high, leading to analyses that are not very powerful or accurate and problems of interpretation. As in therapeutic research [45–47], approaches based on regression modelling (and interaction testing) should replace subgroup analysis for the development of diagnostic and screening tests and for reporting their accuracy.

References

  1. Ransohoff DF, Feinstein AR: Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med. 1978, 299: 926-930.

  2. Hlatky MA, Pryor DB, Harrell FE, Califf RM, Mark DB, Rosati RA: Factors affecting sensitivity and specificity of exercise electrocardiography. Multivariable analysis. Am J Med. 1984, 77: 64-71. 10.1016/0002-9343(84)90437-6.

  3. Lachs MS, Nachamkin I, Edelstein PH, Goldman J, Feinstein AR, Schwartz JS: Spectrum bias in the evaluation of diagnostic tests: lessons from the rapid dipstick test for urinary tract infection. Ann Intern Med. 1992, 117: 135-140.

  4. Miettinen OS, Caro JJ: Foundations of medical diagnosis: what actually are the parameters involved in Bayes' theorem?. Stat Med. 1994, 13: 201-9; discussion 211-5. 10.1002/sim.4780130302.

  5. van der Schouw YT, Van Dijk R, Verbeek AL: Problems in selecting the adequate patient population from existing data files for assessment studies of new diagnostic tests. J Clin Epidemiol. 1995, 48: 417-422. 10.1016/0895-4356(94)00144-F.

  6. O'Connor PW, Tansay CM, Detsky AS, Mushlin AI, Kucharczyk W: The effect of spectrum bias on the utility of magnetic resonance imaging and evoked potentials in the diagnosis of suspected multiple sclerosis. Neurology. 1996, 47: 140-144.

  7. Curtin F, Morabia A, Pichard C, Slosman DO: Body mass index compared to dual-energy x-ray absorptiometry: evidence for a spectrum bias. J Clin Epidemiol. 1997, 50: 837-843. 10.1016/S0895-4356(97)00063-2.

  8. Roger VL, Pellikka PA, Bell MR, Chow CW, Bailey KR, Seward JB: Sex and test verification bias. Impact on the diagnostic value of exercise echocardiography. Circulation. 1997, 95: 405-410.

  9. Moons KG, van Es GA, Deckers JW, Habbema JD, Grobbee DE: Limitations of sensitivity, specificity, likelihood ratio, and bayes' theorem in assessing diagnostic probabilities: a clinical example. Epidemiology. 1997, 8: 12-17. 10.1097/00001648-199701000-00002.

  10. Steinbauer JR, Cantor SB, Holzer CE, Volk RJ: Ethnic and sex bias in primary care screening tests for alcohol use disorders. Ann Intern Med. 1998, 129: 353-362.

  11. Moons KG, van Es GA, Michel BC, Buller HR, Habbema JD, Grobbee DE: Redundancy of single diagnostic test evaluation. Epidemiology. 1999, 10: 276-281. 10.1097/00001648-199905000-00015.

  12. Moons KG, Grobbee DE: Diagnostic studies as multivariable, prediction research. J Epidemiol Community Health. 2002, 56: 337-338. 10.1136/jech.56.5.337.

  13. Moons KG, Harrell FE: Sensitivity and specificity should be de-emphasized in diagnostic accuracy studies. Acad Radiol. 2003, 10: 670-672. 10.1016/S1076-6332(03)80087-9.

  14. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J: Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004, 140: 189-202.

  15. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HC, Lijmer JG: The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med. 2003, 138: W1-12.

  16. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC: Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD Initiative. Ann Intern Med. 2003, 138: 40-44.

  17. Guggenmoos-Holzmann I, van Houwelingen HC: The (in)validity of sensitivity and specificity. Stat Med. 2000, 19: 1783-1792. 10.1002/1097-0258(20000715)19:13<1783::AID-SIM497>3.0.CO;2-B.

  18. Cipriani D, Fox C, Khuder S, Boudreau N: Comparing Rasch analyses probability estimates to sensitivity, specificity and likelihood ratios when examining the utility of medical diagnostic tests. J Appl Meas. 2005, 6: 180-201.

  19. Goehring C, Perrier A, Morabia A: Spectrum bias: a quantitative and graphical analysis of the variability of medical diagnostic test performance. Stat Med. 2004, 23: 125-135. 10.1002/sim.1591.

  20. Diamond GA, Rozanski A, Forrester JS, Morris D, Pollock BH, Staniloff HM, Berman DS, Swan HJ: A model for assessing the sensitivity and specificity of tests subject to selection bias. Application to exercise radionuclide ventriculography for diagnosis of coronary artery disease. J Chronic Dis. 1986, 39: 343-355. 10.1016/0021-9681(86)90119-0.

  21. Mulherin SA, Miller WC: Spectrum bias or spectrum effect? Subgroup variation in diagnostic test evaluation. Ann Intern Med. 2002, 137: 598-602.

  22. Janssens AC, Deng Y, Borsboom GJ, Eijkemans MJ, Habbema JD, Steyerberg EW: A new logistic regression approach for the evaluation of diagnostic test results. Med Decis Making. 2005, 25: 168-177. 10.1177/0272989X05275154.

  23. Coughlin SS, Trock B, Criqui MH, Pickle LW, Browner D, Tefft MC: The logistic modeling of sensitivity, specificity, and predictive value of a diagnostic test. J Clin Epidemiol. 1992, 45: 1-7. 10.1016/0895-4356(92)90180-U.

  24. Cochand-Priollet B, Le Gales C, de Cremoux P, Molinie V, Sastre-Garau X, Vacher-Lavenu MC, Vielh P, Coste J: Cost-effectiveness of monolayers and human papillomavirus testing compared to that of conventional Papanicolaou smears for cervical cancer screening: protocol of the study of the French Society of Clinical Cytology. Diagn Cytopathol. 2001, 24: 412-420. 10.1002/dc.1091.

  25. Coste J, Cochand-Priollet B, de Cremoux P, Le Gales C, Cartier I, Molinie V, Labbe S, Vacher-Lavenu MC, Vielh P: Cross sectional study of conventional cervical smear, monolayer cytology, and human papillomavirus DNA testing for cervical cancer screening. BMJ. 2003, 326: 733. 10.1136/bmj.326.7392.733.

  26. de Cremoux P, Coste J, Sastre-Garau X, Thioux M, Bouillac C, Labbe S, Cartier I, Ziol M, Dosda A, Le Gales C, Molinie V, Vacher-Lavenu MC, Cochand-Priollet B, Vielh P, Magdelenat H: Efficiency of the hybrid capture 2 HPV DNA test in cervical cancer screening. A study by the French Society of Clinical Cytology. Am J Clin Pathol. 2003, 120: 492-499. 10.1309/XFUC-PP6M-5XUA-94B8.

  27. The Bethesda System for reporting cervical/vaginal cytologic diagnoses: revised after the second National Cancer Institute Workshop, April 29-30, 1991. Acta Cytol. 1993, 37: 115-124.

  28. Stafl A, Wilbanks GD: An international terminology of colposcopy: report of the Nomenclature Committee of the International Federation of Cervical Pathology and Colposcopy. Obstet Gynecol. 1991, 77: 313-314.

  29. Newcombe RG: Two-sided confidence intervals for the single proportion: comparison of seven methods. Stat Med. 1998, 17: 857-872. 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E.

  30. Simel DL, Samsa GP, Matchar DB: Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991, 44: 763-770. 10.1016/0895-4356(91)90128-V.

  31. Efron B, Tibshirani R: An introduction to the Bootstrap. Monographs on Statistics and Applied Probability. 1993, New York, Chapman & Hall

  32. SAS [computer program]. Version 8. Cary, NC: SAS Institute, Inc; 1999.

  33. Brenner H, Gefeller O: Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Stat Med. 1997, 16: 981-991. 10.1002/(SICI)1097-0258(19970515)16:9<981::AID-SIM510>3.0.CO;2-N.

  34. Bosch FX, de Sanjose S: Chapter 1: Human papillomavirus and cervical cancer--burden and assessment of causality. J Natl Cancer Inst Monogr. 2003, 3-13.

  35. Schiffman M, Kjaer SK: Chapter 2: Natural history of anogenital human papillomavirus infection and neoplasia. J Natl Cancer Inst Monogr. 2003, 14-19.

  36. Morise AP, Diamond GA: Comparison of the sensitivity and specificity of exercise electrocardiography in biased and unbiased populations of men and women. Am Heart J. 1995, 130: 741-747. 10.1016/0002-8703(95)90072-1.

  37. Egglin TK, Feinstein AR: Context bias. A problem in diagnostic radiology. JAMA. 1996, 276: 1752-1755. 10.1001/jama.276.21.1752.

  38. Santana-Boado C, Candell-Riera J, Castell-Conesa J, Aguade-Bruix S, Garcia-Burillo A, Canela T, Gonzalez JM, Cortadellas J, Ortega D, Soler-Soler J: Diagnostic accuracy of technetium-99m-MIBI myocardial SPECT in women and men. J Nucl Med. 1998, 39: 751-755.

  39. Dimatteo LA, Lowenstein SR, Brimhall B, Reiquam W, Gonzales R: The relationship between the clinical features of pharyngitis and the sensitivity of a rapid antigen test: evidence of spectrum bias. Ann Emerg Med. 2001, 38: 648-652. 10.1067/mem.2001.119850.

  40. Hall MC, Kieke B, Gonzales R, Belongia EA: Spectrum bias of a rapid antigen detection test for group A beta-hemolytic streptococcal pharyngitis in a pediatric population. Pediatrics. 2004, 114: 182-186. 10.1542/peds.114.1.182.

  41. Harrell FE, Lee KL, Califf RM, Pryor DB, Rosati RA: Regression modelling strategies for improved prognostic prediction. Stat Med. 1984, 3: 143-152. 10.1002/sim.4780030207.

  42. Harrell FE, Lee KL, Mark DB: Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996, 15: 361-387. 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4.

  43. Reid MC, Lachs MS, Feinstein AR: Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA. 1995, 274: 645-651. 10.1001/jama.274.8.645.

  44. Altman DG: Practical statistics for medical research. 1991, London, Chapman & Hall

  45. Assmann SF, Pocock SJ, Enos LE, Kasten LE: Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet. 2000, 355: 1064-1069. 10.1016/S0140-6736(00)02039-0.

  46. Pocock SJ, Assmann SE, Enos LE, Kasten LE: Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Stat Med. 2002, 21: 2917-2930. 10.1002/sim.1296.

  47. Brookes ST, Whitely E, Egger M, Smith GD, Mulheran PA, Peters TJ: Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. J Clin Epidemiol. 2004, 57: 229-236. 10.1016/j.jclinepi.2003.08.009.

  48. Filly RA, Reddy SG, Nalbandian AB, Lu Y, Callen PW: Sonographic evaluation of liver nodularity: Inspection of deep versus superficial surfaces of the liver. J Clin Ultrasound. 2002, 30: 399-407. 10.1002/jcu.10095.

  49. Medeiros FA, Zangwill LM, Bowd C, Sample PA, Weinreb RN: Use of progressive glaucomatous optic disk change as the reference standard for evaluation of diagnostic tests in glaucoma. Am J Ophthalmol. 2005, 139: 1010-1018. 10.1016/j.ajo.2004.08.069.

Acknowledgements

The authors thank Beatrix Cochand-Priollet and Patricia de Cremoux for their helpful comments on a previous draft of this manuscript.

Author information

Corresponding author

Correspondence to Joël Coste.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

JC conceived the study and its design, obtained funding source, provided study material, and administrative, technical, or logistic support. CE performed the statistical analysis and drafted the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material

12874_2007_243_MOESM1_ESM.doc

Additional File 1: Logistic regression models for sensitivity, specificity and likelihood ratios of tests. The table presents Coughlin et al.'s model for sensitivity and specificity and Janssens et al.'s model for likelihood ratios (computation of indices, meaning of main effects and meaning of interactions). (DOC 91 KB)

12874_2007_243_MOESM2_ESM.doc

Additional File 2: Clinical reading: final multivariable regression model for sensitivity and specificity. The table presents the coefficients of the final regression model (Coughlin et al.'s model) and the method to calculate sensitivity and specificity from these coefficients. (DOC 31 KB)

12874_2007_243_MOESM3_ESM.doc

Additional File 3: Clinical reading: final multivariable regression model for the likelihood ratios. The table presents the coefficients of the final regression models (Janssens et al.'s models) and the method to calculate likelihood ratios from these coefficients. (DOC 37 KB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Elie, C., Coste, J. & the French Society of Clinical Cytology Study Group. A methodological framework to distinguish spectrum effects from spectrum biases and to assess diagnostic and screening test accuracy for patient populations: Application to the Papanicolaou cervical cancer smear test. BMC Med Res Methodol 8, 7 (2008). https://doi.org/10.1186/1471-2288-8-7

  • DOI: https://doi.org/10.1186/1471-2288-8-7
