This article has Open Peer Review reports available.
Estimating the cumulative risk of false positive cancer screenings
© Baker et al; licensee BioMed Central Ltd. 2003
Received: 11 February 2003
Accepted: 03 July 2003
Published: 03 July 2003
When evaluating cancer screening it is important to estimate the cumulative risk of false positives from periodic screening. Because the data typically come from studies in which the number of screenings varies by subject, estimation must take into account dropouts. A previous approach to estimate the probability of at least one false positive in n screenings unrealistically assumed that the probability of dropout does not depend on prior false positives.
By redefining the random variables, we obviate the unrealistic dropout assumption. We also propose a relatively simple logistic regression and extend estimation to the expected number of false positives in n screenings.
We illustrate our methodology using data from women ages 40 to 64 who received up to four annual breast cancer screenings in the Health Insurance Plan of Greater New York study, which began in 1963. Covariates were age, time since previous screening, screening number, and whether or not a previous false positive occurred. Defining a false positive as an unnecessary biopsy, the only statistically significant covariate was whether or not a previous false positive occurred. Because the effect of screening number was not statistically significant, extrapolation beyond 4 screenings was reasonable. The estimated mean number of unnecessary biopsies in 10 years per woman screened is .11 with 95% confidence interval of (.10, .12). Defining a false positive as an unnecessary work-up, all the covariates were statistically significant and the estimated mean number of unnecessary work-ups in 4 years per woman screened is .34 with 95% confidence interval (.32, .36).
Using data from multiple cancer screenings with dropouts, and allowing dropout to depend on previous history of false positives, we propose a logistic regression model to estimate both the probability of at least one false positive and the expected number of false positives associated with n cancer screenings. The methodology can be used both for informed decision making at the individual level and for planning health services.
When evaluating cancer screening, it is important to estimate both the benefits and the harms. The major benefit is a reduction in mortality from the cancer targeted by the screening. A frequent harm is a false positive (FP) screening outcome. Other harms include risks from the screening procedure itself (such as perforation of the colon in colorectal cancer screening using endoscopy), diagnosis of medically unimportant "cancers" (overdiagnosis), and burdens due to time lost and travel to the screening clinic; these will not be discussed here. An FP can be defined either narrowly as an unnecessary biopsy (i.e. a biopsy that does not detect cancer) or broadly as an unnecessary additional work-up (i.e. an additional work-up that does not detect cancer).
Gelfand and Wang (GW) [2–4] proposed methodology for estimating the probability of at least one FP in n screenings. We bolster and extend the GW methodology in three ways. First, we show that an unrealistic assumption of GW is unnecessary. GW thought they needed to assume that dropping out (either by loss to follow-up or by refusing additional screenings) is independent of the prior history of false positives. This assumption is unrealistic (a previous false positive may affect whether a woman returns for screening) and makes their approach untenable. We show that by reformulating the problem with different random variables, one can obtain essentially the same result without the unrealistic assumption. Second, we estimate an additional quantity. GW estimated only the probability of at least one false positive in n screenings; to better quantify the cumulative burden of false positives, we also estimate the expected number of false positives in n screenings. Third, to simplify computations for some data sets, we introduce a logistic regression model.
Obviating the Unrealistic Dropout Assumption
The proof obviating the assumption that dropout does not depend on previous FP's is technical and is deferred to the Appendix (see Additional file: 1). However, the main idea can be readily summarized. Whereas GW estimate the probability of at least one false positive among the n screenings that actually occurred, we estimate the probability of at least one false positive if there were n screenings, regardless of whether or not they occurred. This seemingly slight change in definition makes a large difference in the mathematical derivation, which in turn obviates the unrealistic assumption. Our much less restrictive assumption is that dropout does not depend on future false positives. This revised formulation parallels Kaplan-Meier estimation, which requires only that censoring not depend on future outcomes, and discrete-time censoring models formulated as missing-data selection models.
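To make the distinction concrete, the following small simulation is a sketch under hypothetical parameter values (not the HIP data). Dropout before each screening depends on whether a previous FP occurred but never on future FPs. An estimate built from per-screening FP rates that condition on prior history recovers the true probability of at least one FP in n screenings, whereas a naive estimate computed only from women who completed all n screenings (a hypothetical contrast, not GW's estimator) is biased downward because women with an FP drop out more often. The function names and parameter values are illustrative assumptions, and pooling the screenings after the first assumes that FP probabilities do not vary with screening number.

```python
import random


def simulate(n_subjects=20000, a1=0.05, a=0.04, b=0.15,
             drop_no_fp=0.10, drop_fp=0.40, n_screens=4, seed=1):
    """Simulate 0/1 FP histories in which dropout depends only on PREVIOUS FPs.

    All parameter values are hypothetical. Each history lists only the
    screenings that actually occurred.
    """
    rng = random.Random(seed)
    histories = []
    for _ in range(n_subjects):
        h = [int(rng.random() < a1)]
        for _t in range(2, n_screens + 1):
            prev_fp = any(h)
            # Dropout before this screening depends on prior FPs, not future ones.
            if rng.random() < (drop_fp if prev_fp else drop_no_fp):
                break
            h.append(int(rng.random() < (b if prev_fp else a)))
        histories.append(h)
    return histories


def estimate_p(histories, n):
    """pr(at least one FP in n screenings) from per-screening FP rates.

    a1_hat: FP rate at screening 1; a_hat: pooled FP rate at later screenings
    among women screened with no previous FP.
    """
    a1_hat = sum(h[0] for h in histories) / len(histories)
    at_risk = events = 0
    for h in histories:
        for t in range(1, len(h)):
            if not any(h[:t]):
                at_risk += 1
                events += h[t]
    a_hat = events / at_risk
    return 1 - (1 - a1_hat) * (1 - a_hat) ** (n - 1)


def naive_completers(histories, n):
    """Share with at least one FP among women who completed all n screenings."""
    done = [h for h in histories if len(h) == n]
    return sum(any(h) for h in done) / len(done)


hist = simulate()
truth = 1 - (1 - 0.05) * (1 - 0.04) ** 3
print(truth, estimate_p(hist, 4), naive_completers(hist, 4))
```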
To estimate parameters, GW used a Bayesian approach with a proportional hazards model, whose computations can be difficult for the clinically oriented reader. As an alternative, we propose relatively simple logistic regression models that are appropriate for some data sets (with further elaboration in the Discussion). Our approach requires fitting two logistic regressions. The first models the probability of an FP on the first screening as a function of age. We let i index the age interval at screening, where i = 1, 2, 3, 4, 5 corresponds to ages 40–44, 45–49, 50–54, 55–59, and 60–64, respectively. The logistic regression can be written as
$$\operatorname{logit}\{\Pr(\mathrm{FP} \mid i; \alpha)\} = \alpha_0 + \alpha_{\mathrm{age}(i)}, \qquad (1)$$
where $\alpha_{\mathrm{age}(1)} = 0$ because the model includes the constant term $\alpha_0$. The data are a table of counts for age categories cross-classified by FP outcome (yes or no); see the supplemental file.
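Because model (1) has one parameter per age interval, it is saturated, and its maximum likelihood fit simply reproduces the observed FP proportion in each age interval. The sketch below, using hypothetical counts, shows that closed form under the reference coding $\alpha_{\mathrm{age}(1)} = 0$, together with the deviance of a constant-only model relative to (1), which is the kind of comparison used later to simplify the model. The function names are illustrative, and the sketch assumes every age interval has at least one FP and one non-FP.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_age_model(fp, n):
    """MLE for logit(pr(FP | i)) = alpha0 + alpha_age(i), alpha_age(1) = 0.

    fp[i], n[i]: FP count and number screened in age interval i + 1.
    The model is saturated, so fitted probabilities equal observed proportions.
    """
    props = [y / m for y, m in zip(fp, n)]
    alpha0 = logit(props[0])
    alpha_age = [0.0] + [logit(p) - alpha0 for p in props[1:]]
    return alpha0, alpha_age


def deviance_constant_only(fp, n):
    """Deviance of the constant-only model relative to the age model (d.f. = len(fp) - 1)."""
    pooled = sum(fp) / sum(n)
    dev = 0.0
    for y, m in zip(fp, n):
        if y > 0:
            dev += y * math.log(y / (m * pooled))
        if y < m:
            dev += (m - y) * math.log((m - y) / (m * (1.0 - pooled)))
    return 2.0 * dev


# Hypothetical counts for the five age intervals (not the HIP data).
fp_counts = [12, 15, 11, 9, 6]
screened = [300, 350, 280, 240, 160]
a0, a_age = fit_age_model(fp_counts, screened)
dev = deviance_constant_only(fp_counts, screened)
```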
The second logistic regression models the probability of an FP on a screening after the first as a function of age at screening, time since the last screening, screening number, and whether or not there was a previous FP. To obtain a parsimonious model we made two simplifications. First, we use screening number rather than chronological time. For example, one subject might be screened at times 0, 1, and 3 and another at times 0, 1, and 2; in the model, both subjects have three screenings indexed by t = 1, 2, 3. This reduces the number of missing-data patterns because, by definition, no screenings are missing "between" screenings that occurred. Modeling the various patterns of missingness would require a much more complicated formulation, which is not warranted given the sparse data. To account for different intervals between screenings, we included time since the previous screening as a covariate (which parallels the GW formulation). Second, we use an indicator of a previous FP rather than a more detailed FP history, because there are too few data to adequately fit a model that conditions on the various prior patterns of FP.
As before, we let i denote the age interval at screening. We also let j denote the time since the last screening, where j = 1, 2, 3, 4 corresponds to 9–12 months, 13–15 months, 16–18 months, and more than 18 months, respectively. We let k indicate whether or not there was a previous false positive, where k = 0, 1 corresponds to no and yes, respectively. The logistic regression is
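$$\operatorname{logit}\{\Pr(\mathrm{FP} \mid i, j, t, k; \beta)\} = \beta_0 + \beta_{\mathrm{age}(i)} + \beta_{\mathrm{time}(j)} + \beta_{\mathrm{screen}(t)} + \beta_{\mathrm{FP}(k)}, \qquad (2)$$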
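where $\beta_{\mathrm{age}(1)} = \beta_{\mathrm{time}(1)} = \beta_{\mathrm{screen}(2)} = \beta_{\mathrm{FP}(0)} = 0$, so the reference levels are the youngest age interval, the shortest time interval, the second screening (the first screening to which (2) applies), and no previous FP. The data are counts for a cross-classification of age interval, time interval, screening number, an indicator of a previous FP, and the FP outcome; see Additional file: 2. The model in (2) is a standard application of logistic regression to survival analysis [8, 9].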
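The cross-classified counts for (2) can be built by expanding each woman's history into one record per screening after the first, carrying the previous-FP indicator k forward. A woman who drops out simply contributes no records for the screenings she missed, and because t counts only screenings that occurred, there are no gaps. The sketch below assumes a hypothetical input format (a list of (i, j, y) tuples per woman, in screening order, with j unused for the first screening); it is an illustration, not the authors' code.

```python
from collections import Counter


def tabulate_subsequent(subjects):
    """Count screenings after the first by (i, j, t, k, y).

    subjects: one list per woman of (i, j, y) tuples for screenings that
    occurred, where i = age interval, j = time-interval category (ignored for
    the first screening), and y = 1 if the screening was an FP.
    """
    counts = Counter()
    for screens in subjects:
        prev_fp = 0  # k: any FP on an earlier screening
        for t, (i, j, y) in enumerate(screens, start=1):
            if t >= 2:
                counts[(i, j, t, prev_fp, y)] += 1
            prev_fp = max(prev_fp, y)
    return counts


# Two hypothetical women: the first had 3 screenings, the second dropped out after 2.
example = [
    [(1, None, 0), (1, 1, 1), (2, 2, 0)],
    [(3, None, 1), (3, 1, 0)],
]
table = tabulate_subsequent(example)
```

Each key of the resulting table is one cell of the cross-classification, so the counts can be passed to any grouped-binomial logistic regression routine.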
Estimating Cumulative Risk
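We use the parameter estimates from the two logistic regressions to estimate the cumulative risk of an FP. Let $a_{1i}$ denote the estimated probability of an FP on the first screening for age interval $i$, obtained from (1). For screening $t \ge 2$, let $a_{tij}$ and $b_{tij}$ denote the estimated probabilities of an FP obtained from (2) with $k = 0$ (no previous FP) and $k = 1$ (a previous FP), respectively, for age interval $i$ and time interval $j$. The estimated probability of at least one FP in $n$ screenings (for $n > 1$) is

$$p_{ij}(n) = 1 - (1 - a_{1i}) \prod_{t=2}^{n} (1 - a_{tij}), \qquad (6)$$

and $1 - p_{ij}(n)$ is the estimated probability of completing $n$ screenings without an FP.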
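The cumulative-risk calculation is a product over screenings of the probability of no FP, using the FP probability for a woman with no previous FP at each screening. The sketch below assumes that $a_1$ is the fitted first-screening FP probability and that $a_t$ ($t \ge 2$) comes from a model of the form (2) with $k = 0$; the coefficient values and the dict-based coefficient layout are hypothetical, chosen only for illustration.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def fp_prob(beta0, beta_age, beta_time, beta_screen, beta_fp, i, j, t, k):
    """Fitted pr(FP | i, j, t, k) from a model of the form (2).

    Each beta_* is a dict from covariate level to coefficient; reference
    levels are left out of the dict and contribute 0.
    """
    eta = (beta0 + beta_age.get(i, 0.0) + beta_time.get(j, 0.0)
           + beta_screen.get(t, 0.0) + beta_fp.get(k, 0.0))
    return expit(eta)


def prob_at_least_one_fp(a1, a_later):
    """1 - (1 - a_1) * prod over t = 2..n of (1 - a_t).

    a1: FP probability at the first screening; a_later: FP probabilities at
    screenings 2..n for a woman with no previous FP.
    """
    no_fp = 1.0 - a1
    for a_t in a_later:
        no_fp *= 1.0 - a_t
    return 1.0 - no_fp


# Hypothetical coefficients for illustration only (not the HIP estimates).
a1 = expit(-3.0)
beta0, b_age, b_time = -3.2, {2: 0.1}, {2: 0.2}
b_screen, b_fp = {3: -0.1, 4: -0.2}, {1: 0.9}
a_later = [fp_prob(beta0, b_age, b_time, b_screen, b_fp, 1, 1, t, 0) for t in (2, 3, 4)]
p4 = prob_at_least_one_fp(a1, a_later)
```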
To better quantify the cumulative burden of FP's, we also estimate the expected number of FP's. The formula for the estimated expected number of FP's in n screenings varies with n. For example, for n = 4 the estimated expected number of FP's is
$$e_{ij}(4) = 4\Pr(4\text{ FP's}) + 3\Pr(3\text{ FP's}) + 2\Pr(2\text{ FP's}) + 1\Pr(1\text{ FP}), \qquad (7)$$

where

$$\begin{aligned}
\Pr(4\text{ FP's}) &= a_{1i}\, q_{3|3ij},\\
\Pr(3\text{ FP's}) &= a_{1i}\, q_{2|3ij} + (1 - a_{1i})\, a_{2ij}\, q_{2|2ij},\\
\Pr(2\text{ FP's}) &= a_{1i}\, q_{1|3ij} + (1 - a_{1i})\, a_{2ij}\, q_{1|2ij} + (1 - a_{1i})(1 - a_{2ij})\, a_{3ij}\, q_{1|1ij},\\
\Pr(1\text{ FP}) &= a_{1i}\, q_{0|3ij} + (1 - a_{1i})\, a_{2ij}\, q_{0|2ij} + (1 - a_{1i})(1 - a_{2ij})\, a_{3ij}\, q_{0|1ij} + (1 - a_{1i})(1 - a_{2ij})(1 - a_{3ij})\, a_{4ij},
\end{aligned}$$

and $q_{h|fij}$ is the probability of $h$ FP's over the last $f$ screenings, given an FP on an earlier screening, for a subject in age interval $i$ with time interval $j$ since the last screening. In this example

$$\begin{aligned}
q_{3|3ij} &= b_{2ij}\, b_{3ij}\, b_{4ij},\\
q_{2|3ij} &= b_{2ij}\, b_{3ij} (1 - b_{4ij}) + b_{2ij} (1 - b_{3ij})\, b_{4ij} + (1 - b_{2ij})\, b_{3ij}\, b_{4ij},\\
q_{1|3ij} &= b_{2ij} (1 - b_{3ij})(1 - b_{4ij}) + (1 - b_{2ij})\, b_{3ij} (1 - b_{4ij}) + (1 - b_{2ij})(1 - b_{3ij})\, b_{4ij},\\
q_{0|3ij} &= (1 - b_{2ij})(1 - b_{3ij})(1 - b_{4ij}),\\
q_{2|2ij} &= b_{3ij}\, b_{4ij},\\
q_{1|2ij} &= b_{3ij} (1 - b_{4ij}) + (1 - b_{3ij})\, b_{4ij},\\
q_{0|2ij} &= (1 - b_{3ij})(1 - b_{4ij}),\\
q_{1|1ij} &= b_{4ij},\\
q_{0|1ij} &= 1 - b_{4ij}.
\end{aligned}$$
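The enumeration in (7) can be cross-checked with a short dynamic program that tracks the number of FP's so far, applying $a_t$ while no FP has occurred and $b_t$ once one has. This generalizes to any n without writing out the $q$ terms. The sketch below transcribes (7) directly for n = 4 and compares it with the recursion; the probabilities in the test are hypothetical, and the list layout of a and b is an illustrative assumption.

```python
def fp_count_distribution(a, b):
    """Distribution of the number of FPs in n = len(a) screenings.

    a[t - 1]: FP probability at screening t given no previous FP (a[0] = a_1).
    b[t - 1]: FP probability at screening t given a previous FP (b[0] unused).
    """
    dist = {0: 1.0}  # number of FPs so far -> probability
    for t in range(len(a)):
        new = {}
        for h, pr in dist.items():
            p = a[t] if h == 0 else b[t]
            new[h + 1] = new.get(h + 1, 0.0) + pr * p
            new[h] = new.get(h, 0.0) + pr * (1.0 - p)
        dist = new
    return dist


def expected_fps(a, b):
    return sum(h * pr for h, pr in fp_count_distribution(a, b).items())


def expected_fps_eq7(a1, a2, a3, a4, b2, b3, b4):
    """Direct transcription of (7) and its q terms for n = 4."""
    q33 = b2 * b3 * b4
    q23 = b2 * b3 * (1 - b4) + b2 * (1 - b3) * b4 + (1 - b2) * b3 * b4
    q13 = (b2 * (1 - b3) * (1 - b4) + (1 - b2) * b3 * (1 - b4)
           + (1 - b2) * (1 - b3) * b4)
    q03 = (1 - b2) * (1 - b3) * (1 - b4)
    q22, q12, q02 = b3 * b4, b3 * (1 - b4) + (1 - b3) * b4, (1 - b3) * (1 - b4)
    q11, q01 = b4, 1 - b4
    pr4 = a1 * q33
    pr3 = a1 * q23 + (1 - a1) * a2 * q22
    pr2 = a1 * q13 + (1 - a1) * a2 * q12 + (1 - a1) * (1 - a2) * a3 * q11
    pr1 = (a1 * q03 + (1 - a1) * a2 * q02 + (1 - a1) * (1 - a2) * a3 * q01
           + (1 - a1) * (1 - a2) * (1 - a3) * a4)
    return 4 * pr4 + 3 * pr3 + 2 * pr2 + pr1
```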
An important special case occurs when the probabilities of FP do not vary with screening number. This case is important because it allows extrapolation beyond the observed number of screenings. With $a_{tij} = a_{2ij}$ and $b_{tij} = b_{ij}$ for $t = 2, 3, \ldots, n$, the estimated probability of at least one FP in $n$ screenings is
$$p_{ij}(n) = 1 - (1 - a_{1i})(1 - a_{2ij})^{n-1}, \qquad (8)$$
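and the estimated expected number of FP's in $n$ screenings is

$$e_{ij}(n) = a_{1i} + (n - 1)\, b_{ij} + (a_{2ij} - b_{ij})(1 - a_{1i})\, \frac{1 - (1 - a_{2ij})^{n-1}}{a_{2ij}}. \qquad (9)$$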
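The most difficult part of implementation is computing the variances. The asymptotic variances follow from the delta method; for example,

$$\operatorname{var}\{p_{ij}(n)\} \approx \left(\frac{\partial p_{ij}(n)}{\partial \theta}\right)^{\top} \operatorname{cov}(\hat{\theta}) \left(\frac{\partial p_{ij}(n)}{\partial \theta}\right),$$

and similarly for $e_{ij}(n)$, where $\theta = (\alpha_0, \alpha_{\mathrm{age}(i)}, \beta_0, \beta_{\mathrm{age}(i)}, \beta_{\mathrm{time}(j)}, \beta_{\mathrm{screen}(t)}, \beta_{\mathrm{FP}(k)})$ and $\operatorname{cov}(\hat{\theta})$ is the estimated covariance matrix of the parameter estimates from the two logistic regressions. Using computer software for symbolic derivatives, it is not hard to compute these variances. Alternatively, one could compute confidence intervals using a bootstrap approach.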
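For the special case in which FP probabilities do not vary with screening number, the estimates and their delta-method standard errors can be sketched as follows. Two substitutions are assumptions: the gradient is taken by central finite differences rather than symbolic differentiation, and the parameterization $\theta = (\alpha_0, \beta_0, \beta_{\mathrm{FP}(1)})$ (constant-only first-screening model, constant plus previous-FP subsequent model) is illustrative. The estimates and covariance matrix are hypothetical, not the HIP results. The closed form for the expected number is checked against a direct sum over screenings of the probability of an FP at screening $t$.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_special(a1, a2, n):
    """pr(at least one FP in n screenings) when FP rates do not vary with t."""
    return 1.0 - (1.0 - a1) * (1.0 - a2) ** (n - 1)


def e_special(a1, a2, b, n):
    """Closed-form expected number of FPs in n screenings (requires a2 > 0)."""
    return a1 + (n - 1) * b + (a2 - b) * (1.0 - a1) * (1.0 - (1.0 - a2) ** (n - 1)) / a2


def e_by_recursion(a1, a2, b, n):
    """Same quantity as a sum over screenings of pr(FP at screening t)."""
    no_fp = 1.0 - a1  # pr(no FP so far)
    total = a1
    for _ in range(2, n + 1):
        total += no_fp * a2 + (1.0 - no_fp) * b
        no_fp *= 1.0 - a2
    return total


def probs(theta):
    """Hypothetical parameterization theta = (alpha0, beta0, beta_fp)."""
    alpha0, beta0, beta_fp = theta
    return expit(alpha0), expit(beta0), expit(beta0 + beta_fp)


def delta_method_se(f, theta, cov, h=1e-6):
    """SE of f(theta_hat) from a central-difference gradient and cov(theta_hat)."""
    grad = []
    for m in range(len(theta)):
        up, down = list(theta), list(theta)
        up[m] += h
        down[m] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    var = sum(grad[r] * cov[r][c] * grad[c]
              for r in range(len(theta)) for c in range(len(theta)))
    return math.sqrt(var), grad


# Hypothetical estimates and covariance (block diagonal because the two
# logistic regressions are fit separately); NOT the HIP results.
theta_hat = (-4.8, -4.8, 2.3)
cov_hat = [[0.010, 0.0, 0.0],
           [0.0, 0.004, -0.004],
           [0.0, -0.004, 0.020]]
n = 10


def p_of_theta(theta):
    a1, a2, _ = probs(theta)
    return p_special(a1, a2, n)


def e_of_theta(theta):
    return e_special(*probs(theta), n)


p_hat = p_of_theta(theta_hat)
se_p, grad_p = delta_method_se(p_of_theta, theta_hat, cov_hat)
ci_p = (p_hat - 1.96 * se_p, p_hat + 1.96 * se_p)
e_hat = e_of_theta(theta_hat)
se_e, _ = delta_method_se(e_of_theta, theta_hat, cov_hat)
```

With symbolic software the gradient would be exact; the finite-difference version trades that for not needing any extra library.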
We applied the methodology to data on 4 annual screenings in the Health Insurance Plan of Greater New York (HIP) breast cancer screening study. Starting in 1963, approximately 60,000 women were randomly assigned either to a study group invited for four annual mammograms and physical examinations or to a control group that received no screening within the study. Approximately 1/3 of the women in the study group refused the first screening and received no screenings. Our analysis focused on screened women between ages 40 and 64, who made up the vast majority of screened women. In HIP, our broad definition of a false positive was either (i) a breast biopsy in which no cancer was detected or (ii) an early re-examination, following either a clinical or a radiological recommendation, in which no cancer was detected on diagnostic work-up. Our narrow definition was (i) alone.
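Parameter estimates (standard errors) from logistic regression models for FP's
Model for initial FP: $\alpha_{\mathrm{age}(2)}, \alpha_{\mathrm{age}(3)}, \alpha_{\mathrm{age}(4)}, \alpha_{\mathrm{age}(5)}$
Model for subsequent FP's: $\beta_{\mathrm{age}(2)}, \beta_{\mathrm{age}(3)}, \beta_{\mathrm{age}(4)}, \beta_{\mathrm{age}(5)}$, $\beta_{\mathrm{time}(2)}, \beta_{\mathrm{time}(3)}, \beta_{\mathrm{time}(4)}$, $\beta_{\mathrm{screen}(3)}, \beta_{\mathrm{screen}(4)}$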
Question 1. "If I were to have n screening tests and stick to the schedule, what are the chances that at least one will be a false positive?"
Question 2. "How many unnecessary biopsies or work-ups am I likely to need if I start on the screening program and adhere to the schedule?"
Question 1 is answered by the estimates in (6) and (8), and Question 2 by the estimates in (7) and (9). For an economic analysis, Question 2 is particularly useful because it helps an analyst assign monetary costs to the cumulative burden of FP's. Both questions are clearly important to the patient.
Defining an FP as an unnecessary biopsy, we could not reject, relative to the model with all covariates, a model for (1) with only a constant (deviance = .27 on 4 d.f., p = .99) or a model for (2) with only a constant and a parameter for a previous FP (deviance = 3.66 on 9 d.f., p = .93). Because screening number could therefore be dropped from the model, we think it is reasonable to extrapolate beyond 4 screenings using (8) and (9). In answer to Question 1, the estimated probability of at least one FP in 10 screenings is .08 with 95% confidence interval of (.07, .09). In answer to Question 2, the estimated expected number of FP's in 10 screenings is .11 with 95% confidence interval of (.10, .12).
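The p-values quoted above can be checked by referring each deviance to a chi-square distribution with the stated degrees of freedom, the usual likelihood-ratio reference distribution. The sketch below computes the chi-square upper-tail probability for integer degrees of freedom using the standard closed forms (a Poisson-type sum for even d.f.; a normal tail plus a finite series for odd d.f.), so it needs only the Python standard library. The function name is illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df % 2 == 0:
        half = x / 2.0
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k  # (x/2)^k / k!
            total += term
        return math.exp(-half) * total
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2.0))                        # 2 * Q(chi)
    dens2 = 2.0 * math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)   # 2 * phi(chi)
    term, series = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # chi^(2r-1) / (1 * 3 * ... * (2r-1))
        series += term
    return tail + dens2 * series


p_model1 = chi2_sf(0.27, 4)   # constant-only model for (1)
p_model2 = chi2_sf(3.66, 9)   # constant + previous-FP model for (2)
```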
Defining an FP as an unnecessary work-up, most of the parameter estimates were statistically significant, so we kept all the covariates in the model. Because the model includes screening number, we could not extrapolate beyond 4 screenings, the maximum number of screenings per subject in our data. For purposes of illustration, we selected i = 1 (ages 40–44) and j = 1 (9–12 months since the last screening) for computing (6) and (7). In answer to Question 1, the estimated probability of at least one FP in 4 screenings is .21 with 95% confidence interval of (.20, .22). In answer to Question 2, the estimated expected number of FP's in 4 screenings is .34 with 95% confidence interval of (.32, .36).
As an ancillary investigation, we also fit a logistic regression for the probability of dropout as a function of age category, time interval since the last screening, screening number, FP on the last screening, FP on an earlier screening, and the interaction between the two FP variables. For the HIP data, when FP was defined broadly, there was no statistically significant association between FP history and dropout. When FP was defined narrowly, there was a strong association between FP history and dropout.
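The data for such a dropout regression can be laid out with one record per screening that was followed by an opportunity to continue, noting whether the woman skipped all later screenings. The sketch below is a simplified, hypothetical layout: it keeps only screening number, FP on the last screening, and FP on an earlier screening (age and time interval are omitted), and it assumes dropout is monotone, so a woman who misses a screening has no later ones.

```python
from collections import Counter


def dropout_records(histories, max_screens=4):
    """Count (t, fp_last, fp_earlier, dropped) over screenings t < max_screens.

    histories: per-woman lists of 0/1 FP outcomes for screenings that occurred.
    dropped = 1 if screening t was the woman's last screening.
    """
    counts = Counter()
    for h in histories:
        for t in range(1, min(len(h), max_screens - 1) + 1):
            fp_last = h[t - 1]
            fp_earlier = int(any(h[:t - 1]))
            dropped = int(len(h) == t)
            counts[(t, fp_last, fp_earlier, dropped)] += 1
    return counts


def dropout_rate(counts, fp_last):
    """Crude dropout proportion among records with the given fp_last value."""
    drop = sum(c for (_t, fl, _fe, d), c in counts.items() if fl == fp_last and d == 1)
    stay = sum(c for (_t, fl, _fe, d), c in counts.items() if fl == fp_last and d == 0)
    return drop / (drop + stay)


# Two hypothetical women: one stopped after an FP at screening 2, one completed all 4.
recs = dropout_records([[0, 1], [1, 0, 0, 0]])
```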
Our methodology is applicable to any screening test recommended on a periodic basis for which data come from subjects with possibly different numbers of screenings. Ideally, the data would come from a study in which subjects are representative of the general eligible population and clinicians are representative of those who would perform the screening in practice. Particularly when an FP is an unnecessary work-up, clinicians may vary in the threshold used to declare a positive, because interpretation of the test is subjective. When an FP is an unnecessary biopsy, variation among clinicians will likely be small because a high rate of FP's is unacceptable. Our data set contained no information on clinicians. If data on clinicians are available, they should be incorporated into the analysis. If the number of clinicians is small, we suggest including a variable for clinician in the logistic regression. If the number of clinicians is large, it is best to include a random effect for clinician; in that case, unfortunately, the simple logistic regression is not applicable, and a more complicated model such as that in GW would be needed.
The assumption that dropout does not depend on future false positives could be violated if a woman drops out because of self-examination results (so that she goes to her regular physician instead) that would have led to future false positives. To avoid this violation, one could ask both screened women and women who dropped out whether they found any lump on self-examination. With a covariate for a lump found on self-examination included in the model, dropout would depend only on previous history (including this covariate), so the dropout process factors out of the likelihood.
We made three contributions. First, we showed that the previous methodology of GW does not require an unrealistic assumption about the dropout process, which makes the approach much more appealing. Second, we showed how to estimate the expected number of false positives, which we think is informative in addition to the probability of at least one false positive. Third, we presented a logistic regression formulation that is applicable to some data sets and is relatively simple to implement. Our approach can be applied to many types of cancer screening tests that are recommended on a periodic basis. It is useful both for advising individuals in a clinical setting and for health resources planning.
We thank Jian-Lun Xu, Victor Kipnis, and Philip Prorok for helpful comments.
- Baker SG, Kramer BS, Prorok PC: Statistical issues in randomized trials of cancer screening. BMC Medical Research Methodology. 2002, 2: 11. 10.1186/1471-2288-2-11. [http://www.biomedcentral.com/1471-2288/2/11]
- Gelfand AE, Wang F: Modeling the cumulative risk for a false positive under repeated screening events. Statistics in Medicine. 2000, 19: 1865-79. 10.1002/1097-0258(20000730)19:14<1865::AID-SIM512>3.0.CO;2-M.
- Elmore JG, Barton MB, Moceri VM, Polk S, Arena PJ, Fletcher SW: Ten-year risk of false positive screening mammograms and clinical breast examinations. New England Journal of Medicine. 1998, 338: 1089-1096. 10.1056/NEJM199804163381601.
- Christiansen CL, Wang F, Barton MB, Kreuter W, Elmore JG, Gelfand AE, Fletcher SW: Predicting the cumulative risk of false positive mammograms. Journal of the National Cancer Institute. 2000, 92: 1657-1666. 10.1093/jnci/92.20.1657.
- McCann J, Stockton D, Godward S: Impact of false-positive mammography on subsequent screening attendance and risk of cancer. Breast Cancer Research. 2002, 4: R11. 10.1186/bcr455.
- Kalbfleisch JD, Prentice RL: The Statistical Analysis of Failure Time Data. Wiley: New York. 1980.
- Baker SG: Regression analysis of grouped survival data with incomplete covariates: nonignorable missing-data and censoring mechanisms. Biometrics. 1994, 50: 821-826.
- Brown CC: On the use of indicator variables for studying the time-dependence of parameters in a response time model. Biometrics. 1975, 31: 863-872.
- Efron B: Logistic regression, survival analysis, and the Kaplan-Meier curve. Journal of the American Statistical Association. 1988, 83: 414-425.
- Wolfram S: The Mathematica Book. 4th edition. Champaign: Wolfram Media and Cambridge: Cambridge University Press. 1999.
- Efron B, Gong G: A leisurely look at the bootstrap, the jackknife, and cross-validation. American Statistician. 1983, 37: 36-48.
- Shapiro S, Venet W, Strax P, Venet L: Periodic Screening for Breast Cancer: The Health Insurance Plan Project and Its Sequelae, 1963–1986. Baltimore: Johns Hopkins University Press. 1988.
- Baker SG: The Central Role of Receiver Operating Characteristic (ROC) Curves in Evaluating Tests for the Early Detection of Cancer. Journal of the National Cancer Institute. 2003, 95: 511-5. 10.1093/jnci/95.7.511.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/3/11/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.