In this paper, we discuss the bias that can arise in cancer screening trials due to incomplete disease status ascertainment in a test if negative series trial design. The design we considered was modeled closely on a recently completed and published trial by Lehman et al. The goal of that trial was to assess the diagnostic yield of MRI over mammography, not to assess the diagnostic accuracy of MRI for comparison with other screening modalities. However, it is easy to take the results of the trial out of context. Other researchers may be tempted to cite the trial's results as historic estimates of the diagnostic accuracy of MRI, or to emulate the test if negative trial design to estimate the diagnostic accuracy of Test 2. It is, therefore, important to explore the effects of the test if negative trial design on the estimates of the diagnostic accuracy of Test 2.
Although we modeled our design on real trials, we made simplifying assumptions. We assumed that biopsy was essentially infallible. In real cancer studies, even biopsy makes diagnostic errors. In addition, we assumed that no study participant would show signs and symptoms of disease, because they were receiving systemic therapy. In fact, recurrences of cancer and new primary cancers can occur even during chemotherapy and radiation.
We have been unable to find other research that simultaneously considers how conditioning and incomplete disease status ascertainment affect estimates of sensitivity and specificity. The majority of the literature focuses on estimating the accuracy of a diagnostic program comprising several tests [7, 14–16]. In contrast, we are interested in estimating the diagnostic accuracy of the second test in a series of two tests. Most authors also assume that the true disease status of each participant is known [7, 14–16]. We do not make this assumption, as it is unlikely to be true in cancer screening trials.
Rutjes et al. provide a thorough discussion of the pitfalls faced by clinicians when evaluating medical tests in the absence of a true gold standard. Whiting et al. also catalogue biases that can occur in screening trials. Neither Rutjes et al. nor Whiting et al. discuss the additional effect of using a series screening trial design to estimate diagnostic accuracy.
Lehman et al. point out that the estimated diagnostic accuracy of MRI is higher in their study than in other published studies. They posit that this could be due to advances in breast cancer screening technology and increased skill at analyzing imaging results. As noted in this paper and in the papers by Whiting et al. and Rutjes et al., biases resulting from trial design may also inflate the observed estimates of diagnostic accuracy. While the results of the trial conducted by Lehman et al. may have been affected by differential verification bias, we suspect that they were not affected by bias due to the conditionality of Test 2 (MRI) on the results of Test 1 (mammography). We give our rationale below.
The figures presented in the results section use parameters that are consistent with what we would expect for the trial conducted by Lehman et al. Using the parameter values estimated from this trial and the formulae presented in this paper, we calculated the percent bias in the observed sensitivity and specificity for each trial design. The percent bias in the observed specificity of Test 2 relative to the true specificity is near zero. However, the percent bias in the observed sensitivity of Test 2 relative to the true sensitivity is 14% for the single test design and 12% for the series design. Because there is little difference between the single test and series designs, the detected upward bias is mainly due to differential verification of disease status, rather than the conditionality of MRI on the results of mammography.
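To make the quantity being compared concrete, the following is a minimal sketch of a percent bias calculation. It assumes that percent bias is defined as the difference between the observed and true values, expressed as a percentage of the true value. The function name and the numeric inputs are hypothetical and are not taken from the trial or from the formulae in this paper.

```python
def percent_bias(observed: float, true: float) -> float:
    """Percent bias of an observed estimate relative to the true value.

    Assumed definition: 100 * (observed - true) / true.
    """
    if true <= 0:
        raise ValueError("true value must be positive")
    return 100.0 * (observed - true) / true


# Hypothetical values for illustration only; not estimates from any trial.
true_sensitivity = 0.80
observed_sensitivity = 0.90
print(f"{percent_bias(observed_sensitivity, true_sensitivity):.1f}% bias")
```

A positive value indicates that the observed estimate overstates the true value, and a value near zero indicates little bias.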
In some circumstances, external constraints may make the test if negative trial design the best choice available. An investigator can then use the formulae presented in this paper to conduct a sensitivity analysis of the estimates of the diagnostic accuracy of Test 2. An example of this sort of sensitivity analysis, for the trial conducted by Lehman et al., is given in the immediately preceding paragraph. The investigator can choose a range of reasonable values for the disease prevalence, the proportion of participants who undergo an elective procedure, and the agreement between Test 1 and Test 2 results for cases, in order to place bounds on the amount of bias that may arise from the choice of study design. An investigator may also be able to estimate directly the portion of bias due to differential verification by estimating the number of missing cases. This number can be estimated from the number of participants who are determined to be cases among those who tested negative on both tests and chose to undergo an elective procedure. In practice, because the percentage of participants who choose an elective procedure is usually low, the stability of this estimate may be questionable.
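The estimate of the number of missing cases described above can be sketched as follows. This is an illustration under a strong assumption that is ours and not part of the trial design: participants who test negative on both tests and choose an elective procedure are taken to be representative of all participants who test negative on both tests. The function name, argument names, and counts are hypothetical. A Wilson score interval is used here only to show how wide the uncertainty becomes when few participants choose an elective procedure.

```python
import math


def estimate_missing_cases(n_double_negative, n_elective, n_elective_cases, z=1.96):
    """Extrapolate the case rate seen among double-negative participants who had
    an elective procedure to the double-negative participants who did not.

    Returns (point_estimate, lower_bound, upper_bound) for the number of
    unverified (missing) cases. Assumes elective-procedure participants are
    representative of all double-negative participants.
    """
    if not 0 < n_elective <= n_double_negative:
        raise ValueError("need 0 < n_elective <= n_double_negative")
    if not 0 <= n_elective_cases <= n_elective:
        raise ValueError("need 0 <= n_elective_cases <= n_elective")

    p = n_elective_cases / n_elective          # observed case rate
    unverified = n_double_negative - n_elective

    # Wilson score interval for the case rate
    denom = 1.0 + z**2 / n_elective
    centre = (p + z**2 / (2 * n_elective)) / denom
    half = z * math.sqrt(p * (1 - p) / n_elective
                         + z**2 / (4 * n_elective**2)) / denom
    low = max(0.0, centre - half)
    high = min(1.0, centre + half)

    return unverified * p, unverified * low, unverified * high


# Hypothetical counts for illustration only; not taken from any trial.
point, low, high = estimate_missing_cases(
    n_double_negative=900, n_elective=50, n_elective_cases=1
)
print(f"estimated missing cases: {point:.1f} (approx. 95% range {low:.1f} to {high:.1f})")
```

With only a small number of elective procedures, the range around the point estimate is wide, which illustrates why the stability of this estimate may be questionable.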
Aside from the series trial design, there are two further characteristics of the trial conducted by Lehman et al. that should be noted. First, the results of the trial are presented per breast, rather than per lesion, which is more common [8, 12, 17]. Second, all of the participants in the trial had already developed cancer in one breast before being screened for cancer in the second breast. The development and treatment of cancer in that first breast will affect screening practices and treatment of the second breast. For example, as we noted, participants being screened for cancer in the contralateral breast are less likely to show signs and symptoms during follow-up because they are undergoing systemic therapy for cancer in the first breast.
In this paper, we have shown that estimates of diagnostic accuracy for the second test in test if negative series screening trials with incomplete disease status ascertainment can be subject to bias. Glueck et al. showed a similar bias in screening studies conducted in parallel. If both designs are flawed, what design should be adopted by researchers seeking to characterize screening modalities? The answer is unclear. Because screening trials affect the health of millions of people, methods for bias correction for both parallel and series screening trial designs are needed.