Comparing breast cancer mortality rates before-and-after a change in availability of screening in different regions: Extension of the paired availability design

Background In recent years there has been increased interest in evaluating breast cancer screening using data from before-and-after studies in multiple geographic regions. One approach, not previously mentioned, is the paired availability design. The paired availability design was developed to evaluate the effect of medical interventions by comparing changes in outcomes before and after a change in the availability of an intervention in various locations. A simple potential outcomes model yields estimates of efficacy, the effect of receiving the intervention, as opposed to effectiveness, the effect of changing the availability of the intervention. By combining estimates of efficacy rather than effectiveness, the paired availability design avoids confounding due to different fractions of subjects receiving the interventions at different locations. The original formulation involved short-term outcomes; the challenge here is accommodating long-term outcomes.
Methods The outcome is incident breast cancer deaths in a time period, which are breast cancer deaths among cancers that were diagnosed in the same time period. We considered the plausibility of the five basic assumptions of the paired availability design and proposed a novel analysis to accommodate likely violations of the assumption of stable screening effects.
Results We applied the paired availability design to data on breast cancer screening from six counties in Sweden. The estimated yearly change in incident breast cancer deaths per 100,000 persons ages 40–69 (in most counties) due to receipt of screening (among the relevant type of subject in the potential outcomes model) was -9 with 95% confidence interval (-14, -4) or (-14, -5), depending on the sensitivity analysis.
Conclusion In a realistic application, the extended paired availability design yielded reasonably precise confidence intervals for the effect of receiving screening on the rate of incident breast cancer death. Although the assumption of stable preferences may be questionable, its impact will be small if there is little screening in the first time period. However, estimates may be substantially confounded by improvements in systemic therapy over time. Therefore the results should be interpreted with care.


Background
The paired availability design is a study design and method of analysis that reduces selection bias when using data from historical controls [1][2][3][4]. With standard historical controls, one compares (i) outcomes in subjects in the current time period who received treatment with (ii) outcomes in subjects in an earlier time period who did not receive treatment. Because of self selection (e.g., less healthy subjects might be more likely to receive treatment than more healthy subjects), results from standard historical controls may be substantially biased. In contrast, in the paired availability design there is no self-selection bias because a comparison is made between (i) outcomes in all subjects in the current time period when the intervention is more widely available and (ii) outcomes in all subjects in the previous time period when intervention was less widely available. To account for the change in availability of intervention between the current and previous time periods, Baker and Lindeman [1] proposed a potential outcomes model based on the intervention subjects would have received had they entered the study in a different time period. The model makes it possible to estimate efficacy, the effect of intervention among the type of subjects who would have only received the intervention during a period of increased availability, as opposed to effectiveness, the effect of a change in availability. The model requires various assumptions, best described in [4], that are plausible in many situations.
Estimating efficacy, as opposed to estimating effectiveness, is important when combining estimates from different locations (hospitals or regions). If the fraction of subjects who receive intervention differs among locations, it is difficult to interpret the overall estimate of effectiveness. In contrast, the overall estimate of efficacy is not confounded by varying the fraction of subjects who receive intervention in different locations.
Heretofore the paired availability design has only been formulated for evaluating the effect of an intervention on a short-term endpoint, namely the effect of epidural analgesia on the probability of Caesarian section [1][2][3][4]. To extend the paired availability design to breast cancer screening, we need to consider the implications of long-term endpoints.

Basic requirements
The first step in extending the paired availability design to the evaluation of breast cancer screening is to identify various geographic regions with a change in the availability of breast cancer screening from time period 0 to time period 1. To simplify this discussion, we presume that screening is more available in time period 1 than in time period 0. The methodology is also applicable in the unlikely situation in which the reverse is true in some or all regions. The change in availability is a change in the fraction of the eligible population who are invited for screening. Following Duffy et al. [5], there are three basic design requirements, to which we have added a fourth.

Requirement 1
The time periods should be sufficiently long to give screening sufficient time to maximize (or almost maximize) its impact on breast cancer mortality rates.

Requirement 2
For each geographic region, time periods 0 and 1 should be the same length.

Requirement 3
The outcome in each time period is incident breast cancer deaths, namely deaths from breast cancer in the specified time period arising from diagnosis of breast cancer during the same time period.

Requirement 4
We consider only situations in which most screening occurs at regular intervals of the same length during each time period.
Requirement 1 can be relaxed in the special case when the two time periods are separated by a time interval. In that case one need only require that non-overlapping observation times from the start of each time period past the end of each time period be sufficiently long to maximize the impact of breast cancer screening. However, as with randomized trials, if follow-up after the last screening is too long, there could be considerable dilution from breast cancers that could not have benefited from screening, and that would reduce the efficiency of the estimates [6].
The rationale for Requirement 2 is that one wants the breast cancer mortality rates to be the same in the two time periods if screening has no effect and if there are no time-varying changes that could confound the results.
The rationale for Requirement 3 is that by using incident breast cancer deaths (instead of all breast cancer deaths) as an outcome measure, one can avoid dilution from the breast cancer deaths that could not have benefited from screening [5][7][8][9][10]. However, when using incident breast cancer deaths instead of all breast cancer deaths, there will be a preferential selection for screening evaluation of those breast cancers that cause death soon after diagnosis (as these deaths are more likely to occur in the same time period as diagnosis). With time periods substantially longer than the mean time between breast cancer diagnosis and death, this preferential effect is mitigated, as breast cancers occurring in a larger fraction of the time period (as compared to the situation with short time periods) have a greater potential to cause death a long time after diagnosis in the absence of screening and still be included in the evaluation. Nevertheless, it is worth bearing this preferential selection in mind.
The rationale for Requirement 4 is that the screening intervention must be comparable in the two time periods.

Potential outcomes model
For each before-and-after geographic region, our goal is to estimate the efficacy of breast cancer screening, which we define as the change in average yearly probability of incident breast cancer deaths due to the receipt of screening. (We later discuss combining estimates over all regions.) As proposed in the paired availability design [1][2][3][4], we use the following thought experiment to set the groundwork for estimating efficacy. Under this thought experiment, there are four types of subjects: A, always-receivers, who would receive screening in either time period; C, consistent-receivers, who would not receive screening in the time period with less availability and would receive it in the time period with greater availability; I, inconsistent-receivers, who would receive screening in the time period with less availability and would not receive it in the time period with greater availability; and N, never-receivers, who would not receive screening in either time period.
For the sake of simplicity, we assume two conditions: (1) all-or-none behavior (i.e. an individual either receives all screens at the recommended interval or none, but does not switch back and forth), and (2) there is a single dominant screening test rather than a choice among screening tests of varying efficacy. In our application, there was only one screening modality.
Let $\pi_{iAz}$, $\pi_{iCz}$, $\pi_{iIz}$, and $\pi_{iNz}$ denote the probabilities of subject types A, C, I, and N, respectively, in region $i$ and time period $z$. Let $\beta_{iAz}$, $\beta_{iCz}$, $\beta_{iIz}$, and $\beta_{iNz}$ denote the probability of incident breast cancer death in time period $z$ and region $i$ for subject types A, C, I, and N, respectively. The probability of incident breast cancer death in each time period is a mixture of the probabilities over all subject types in that time period,
$$\theta_{i0} = \pi_{iN0}\beta_{iN0} + \pi_{iC0}\beta_{iC0} + \pi_{iI0}\beta_{iI0} + \pi_{iA0}\beta_{iA0} \quad \text{for time period 0},$$
$$\theta_{i1} = \pi_{iN1}\beta_{iN1} + \pi_{iC1}\beta_{iC1} + \pi_{iI1}\beta_{iI1} + \pi_{iA1}\beta_{iA1} \quad \text{for time period 1}. \tag{1}$$
As with the standard paired availability design, to ensure identifiability we restrict the estimation of efficacy to type C subjects. Let $T_i$ denote the length of follow-up for time periods 0 and 1 for region $i$. We define the efficacy (for type C subjects) in region $i$ as
$$\Delta_i = (\beta_{iC1} - \beta_{iC0})/T_i. \tag{2}$$
The quantity in (2) differs from a naive comparison of the effect of screening between subjects who receive screening in time period 1 and subjects who do not receive screening in time period 1. Instead $\Delta_i$ is the effect of receiving screening among type C subjects. Related potential outcome models were independently formulated for randomized trials with all-or-none compliance [11,12].

Assumptions
In order to estimate (2) we require the following assumptions adapted from the standard paired availability design [4].

Assumption 1. (Stable population)
The characteristics of the population that affect the probability of incident breast cancer death are constant over time.

Assumption 2. (Stable treatment)
The screening modality and therapy following diagnosis do not change over time.

Assumption 3. (Stable evaluation)
The outcome measure, which is incident breast cancer deaths, does not change in definition over time.

Assumption 4. (Stable preferences)
Factors affecting the decision to receive screening do not change over time.

Assumption 5. (Stable screening effects)
The effect of screening on the probability of incident breast cancer death does not change over time.
Assumption 1 is plausible if there is little immigration or out-migration related to screening. However, substantial immigration or out-migration of subpopulations with different underlying health or cancer risk can affect results over long time periods.
Assumption 2 is problematic in evaluating some screening modalities because the reduction in incident breast cancer deaths could result from better systemic therapies, such as chemotherapy and hormonal therapy [13]. It is possible that these changes in therapy could explain the decrease in cancer mortality rates over time, even if screening has no benefit. Therefore it is particularly important to consider the plausibility of Assumption 2.
Assumption 3 is plausible, absent major changes in death code systems.
The basic idea of Assumption 4 is that the screening intervention (including any campaigns to increase public awareness) should be the same in both time periods. If Assumptions 1-4 hold, the probability of each subject type does not change between time periods. In other words, $\pi_{iAz} = \pi_{iA}$, $\pi_{iCz} = \pi_{iC}$, $\pi_{iIz} = \pi_{iI}$, and $\pi_{iNz} = \pi_{iN}$. In addition, by virtue of Assumption 4, there are basically no inconsistent receivers, i.e. $\pi_{iI} = 0$. If there is no screening in time period 0, then $\pi_{iAz} = \pi_{iIz} = 0$, and Assumption 4 only requires $\pi_{iCz} = \pi_{iC}$ and $\pi_{iNz} = \pi_{iN}$, which is very plausible, especially if one views public awareness as part of the screening intervention.
Assumption 5 likely holds for type N subjects because the same prior history of no screening applies to both time periods 0 and 1. Thus we can reasonably assume that the probability of incident breast cancer death among type N subjects does not depend on time period, i.e., $\beta_{iN0} = \beta_{iN1} \equiv \beta_{iN}$. However, unless there is no screening in time period 0, Assumption 5 will not hold for type A subjects. The reason is that (i) screening is generally more available before time period 1 than before time period 0, and (ii) prior screening may affect the probability of incident cancer death if screening confers benefit.
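As a consequence of the above assumptions (and not applying Assumption 5 to type A subjects), we can write (1) as
$$\theta_{i0} = \pi_{iN}\beta_{iN} + \pi_{iC}\beta_{iC0} + \pi_{iA}\beta_{iA0}, \qquad \theta_{i1} = \pi_{iN}\beta_{iN} + \pi_{iC}\beta_{iC1} + \pi_{iA}\beta_{iA1}. \tag{3}$$
Subtracting the two equations in (3) and applying the definition of efficacy in (2) gives
$$\Delta_i = \frac{\theta_{i1} - \theta_{i0} - \pi_{iA}(\beta_{iA1} - \beta_{iA0})}{\pi_{iC} T_i}. \tag{4}$$
If Assumption 5 held for type A subjects, as in the usual paired availability design, $\beta_{iA1} = \beta_{iA0}$, and we would obtain the standard formula, averaged over the duration of the time period, for efficacy in the paired availability design, $\Delta_i = (\theta_{i1} - \theta_{i0})/(\pi_{iC} T_i)$. We would also obtain the standard formula if there were no screening in time period 0 (and thus no type A subjects).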
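The relationship between the mixture model and efficacy can be checked numerically. The following is a minimal sketch, assuming efficacy for type C subjects is the change in their probability of incident breast cancer death divided by the length of the time period, and that type N probabilities are stable while type A probabilities may differ between periods. The function name and all numeric values are hypothetical and chosen only for illustration.

```python
def efficacy_with_type_a_shift(theta0, theta1, pi_c, pi_a, beta_a0, beta_a1, t_years):
    """Yearly efficacy among type C subjects, allowing a shift in type A risk.

    theta0, theta1 : overall probabilities of incident breast cancer death
    pi_c, pi_a     : fractions of type C and type A subjects
    beta_a0/1      : type A probabilities of incident death in periods 0 and 1
    t_years        : length of each time period in years
    """
    if pi_c <= 0 or t_years <= 0:
        raise ValueError("pi_c and t_years must be positive")
    # Part of the before-and-after difference explained by type A subjects.
    type_a_shift = pi_a * (beta_a1 - beta_a0)
    return (theta1 - theta0 - type_a_shift) / (pi_c * t_years)


# Hypothetical type fractions and probabilities (not from the Swedish data).
pi_n, pi_c, pi_a = 0.3, 0.6, 0.1
beta_n, beta_c0, beta_c1 = 0.004, 0.004, 0.003
beta_a0, beta_a1 = 0.003, 0.0028
t_years = 10

# Overall probabilities as mixtures over subject types (no type I subjects).
theta0 = pi_n * beta_n + pi_c * beta_c0 + pi_a * beta_a0
theta1 = pi_n * beta_n + pi_c * beta_c1 + pi_a * beta_a1

delta = efficacy_with_type_a_shift(theta0, theta1, pi_c, pi_a, beta_a0, beta_a1, t_years)
print(delta, (beta_c1 - beta_c0) / t_years)
```

The two printed values should agree up to floating-point error, since the type A correction removes exactly the part of the difference that is not due to type C subjects.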

Estimates
In order to estimate (4) we need to estimate $\theta_{iz}$, $\pi_{iC}$, $\pi_{iA}$, and $\beta_{iAz}$. Following the standard paired availability design we can estimate the first three parameters as follows. Let $s = 1$ if screening was received during the time period and 0 otherwise. Following Requirement 4, we assume most screening occurs at regular intervals during the time period. Let $y = 1$ if there was an incident breast cancer death, and 0 otherwise.
In the ideal scenario (Scenario I) the investigators would report data $n_{izsy}$, which is the number of subjects in region $i$ and time period $z$ with indicator of receipt of screening $s$ and outcome $y$. In the typical scenario (Scenario II), the only data in published reports are the numbers who received or did not receive screening, $n_{izs+}$, and the numbers with a given outcome $y$ but unknown screening status, $n_{iz+y}$, where "+" denotes summation over the indicated subscript. For both scenarios, we obtain the following estimates:
$$\hat\theta_{iz} = n_{iz+1}/n_{iz++} = \text{fraction of subjects in time period } z \text{ with incident cancer death}, \tag{6}$$
$$\hat\pi_{iA} = n_{i01+}/n_{i0++} = \text{fraction who received screening in time period 0}, \tag{7}$$
$$\hat\pi_{iC} = n_{i11+}/n_{i1++} - n_{i01+}/n_{i0++}, \tag{8}$$
where (8) is the fraction who received screening in time period 1 (a combination of types A and C) minus the fraction who received screening in time period 0 (type A). If we had the full data $n_{izsy}$, we could estimate $\beta_{iA0}$. However, because subjects in time period 1 who receive screening are a combination of types C and A, we cannot uniquely estimate $\beta_{iA1}$. We discuss how to circumvent this difficulty in the two scenarios.
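The estimates that are common to both scenarios use only marginal counts. As a minimal sketch, the function below computes them from the number eligible, the number screened, and the number of incident breast cancer deaths in each period. The function name, argument names, and counts are hypothetical and are not taken from the reported data.

```python
def marginal_estimates(deaths0, n0, screened0, deaths1, n1, screened1):
    """Estimate theta_i0, theta_i1, pi_iA and pi_iC for one region.

    deathsZ   : incident breast cancer deaths in time period Z
    nZ        : number of eligible subjects in time period Z
    screenedZ : number who received screening in time period Z
    """
    if n0 <= 0 or n1 <= 0:
        raise ValueError("numbers eligible must be positive")
    theta0 = deaths0 / n0            # fraction with incident death, period 0
    theta1 = deaths1 / n1            # fraction with incident death, period 1
    pi_a = screened0 / n0            # screened in period 0 (type A)
    pi_c = screened1 / n1 - pi_a     # screened in period 1 (A and C) minus type A
    return {"theta0": theta0, "theta1": theta1, "pi_a": pi_a, "pi_c": pi_c}


# Hypothetical counts for a single region.
est = marginal_estimates(deaths0=400, n0=100_000, screened0=10_000,
                         deaths1=330, n1=100_000, screened1=80_000)
print(est)
```

A non-positive value of `pi_c` would indicate that availability did not increase in that region, in which case the efficacy estimate is not defined in this form.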

Scenario I: Full reporting of data
When there is full reporting of data, we can estimate $\theta_{izs} = \mathrm{pr}(Y = 1 \mid i, z, s)$ by $\hat\theta_{izs} = n_{izs1}/n_{izs+}$. Under the potential outcomes model, subjects who receive screening in time period 0 are type A, so we write
$$\theta_{i01} = \beta_{iA0}. \tag{9}$$
We introduce an exogenous parameter $h$, which is the relative risk for incident breast cancer death among type A subjects in time period 1 versus time period 0, namely,
$$h = \beta_{iA1}/\beta_{iA0}. \tag{10}$$
We discuss specification of $h$ in the section below on lead time adjustment. Combining (9), (10) and (4) gives
$$\hat\Delta_i = \frac{\hat\theta_{i1} - \hat\theta_{i0} - \hat\pi_{iA}(h - 1)\hat\theta_{i01}}{\hat\pi_{iC} T_i}. \tag{11}$$
The asymptotic variance is approximately given by (12).

Scenario II: Limited reporting of data
With limited reporting of data we introduce a second exogenous parameter $k$ to essentially create the same estimates as with the full reporting of data. In particular we write
$$\hat\theta_{i01} = k\,\hat\theta_{i0}/\hat\pi_{iA}, \tag{13}$$
where $k = \mathrm{pr}(S = 1 \mid Y = 1, Z = 0)$ is the fraction of incident cancer deaths in time period 0 that occurred among subjects who received screening. If the full data were available, we would have an estimate of $k$ and the methodology would be equivalent to that for Scenario I. In the absence of reported data, we propose a sensitivity analysis for $k$. A lower bound on $k$ is 0 and an upper bound, assuming screening does not cause cancer deaths, is an estimate of the fraction screened in time period 0, namely $\mathrm{pr}(S = 1 \mid Z = 0)$. Substituting (13) into (11) gives
$$\hat\Delta_i = \frac{\hat\theta_{i1} - \hat\theta_{i0}\{1 + (h - 1)k\}}{\hat\pi_{iC} T_i}, \tag{14}$$
where $h$ is the same as in (10). We approximate the asymptotic variance by (15). Using actual reported data from the limited data scenario, we checked the approximate variances in (12) and (15) by making reasonable assumptions to impute $n_{izsy}$ and then also computed the asymptotic variance using the delta method. The agreement was excellent: using the data in the example and assuming a relative risk of incident cancer death of 0.7 for screened versus not screened, the approximate and exact asymptotic variances agreed to three significant digits.
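The sensitivity analysis for k can be sketched in a few lines. This is one reading of the limited-data estimator, not necessarily the exact published formula: it assumes that the type A probability in time period 0 equals k times the overall period 0 probability divided by the fraction screened in period 0 (Bayes' rule), and that the type A probability in period 1 is h times that value. The function name and the input values are hypothetical.

```python
def efficacy_limited_data(theta0, theta1, pi_c, h, k, t_years):
    """Efficacy among type C subjects when only marginal counts are reported.

    h : assumed relative risk of incident death for type A, period 1 vs period 0
    k : assumed fraction of period 0 incident deaths among screened subjects
    """
    if pi_c <= 0 or t_years <= 0:
        raise ValueError("pi_c and t_years must be positive")
    # pi_a * (beta_a1 - beta_a0) reduces to (h - 1) * k * theta0 under the
    # stated assumptions, so pi_a itself drops out of the expression.
    return (theta1 - theta0 * (1.0 + (h - 1.0) * k)) / (pi_c * t_years)


# Hypothetical inputs: 14% screened in period 0, 10-year periods, 2-year lead time.
theta0, theta1, pi_c, t_years = 0.0040, 0.0033, 0.70, 10
h = t_years / (t_years + 2.0)
fraction_screened_period0 = 0.14

# Sensitivity analysis: evaluate at the lower and upper bounds for k.
for k in (0.0, fraction_screened_period0):
    delta = efficacy_limited_data(theta0, theta1, pi_c, h, k, t_years)
    print(f"k = {k:.2f}: {delta * 100_000:.2f} per 100,000 per year")
```

With k = 0 the function reduces to the standard paired availability formula, which is why the two bounds give similar answers when little screening occurred in time period 0.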

Lead time adjustment related to prior screening
We specify a value for $h$ in (10), which is the ratio of type A incident cancer deaths in time period 1 to type A incident cancer deaths in time period 0. We approximate $h$ by the number of subjects in subgroup (ii) divided by the number of subjects in the combination of subgroups (i) and (ii). Let $L$ denote the mean lead time, which is approximately 2 years for breast cancer screening [14]. Subjects in subgroup (i) are, on average, detected on screening in the last $L$ years of the time period. Assuming uniform detection rates, we further approximate $h$ by $T_i$, the length of time of screen detection in subgroup (ii), divided by $T_i + L$, the average length of time of screen detection in the combination of subgroups (i) and (ii), giving $h \approx T_i/(T_i + L)$. Importantly, if $L$ is short relative to $T_i$ there will be little bias, as $h$ would approximately equal one.
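The approximation for h is simple enough to tabulate. The sketch below evaluates T/(T + L) for a few period lengths, using the mean lead time of 2 years quoted above as the default. The function name and the period lengths are illustrative assumptions.

```python
def lead_time_factor(period_years, mean_lead_time=2.0):
    """Approximate h as T / (T + L) under uniform detection rates."""
    if period_years <= 0 or mean_lead_time < 0:
        raise ValueError("period_years must be positive and mean_lead_time non-negative")
    return period_years / (period_years + mean_lead_time)


# Hypothetical period lengths in years; h moves toward 1 as T grows relative to L.
for t in (5, 10, 20, 50):
    print(t, round(lead_time_factor(t), 3))
```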

Lead time adjustment related to age-range at diagnosis
Because incident cancer cases are defined based on age at diagnosis, there is also a subtle bias [9] [...] in time period 1. Also a fraction $(10 + L)/10$ of type C subjects in the 60-69 age group in time period 0 would not be counted toward incident cancer deaths in time period 0, but would be counted toward incident cancer deaths in time period 1. Consequently, for this adjustment, we multiply [...] and $D_a$ is the five-year cumulative mortality following breast cancer diagnosis at age $a$. With $L = 2$ [14] and approximating $D_{40} = 0.167$, $D_{50} = 0.131$, and $D_{60} = 0.124$, based on US population data [15], we obtain $b = 0.98$. In our example, the effect of lead-time bias due to a specified age-range is negligible and is therefore ignored.

Combining estimates over regions
To obtain a combined estimate of efficacy over all regions, we use a simple random effects meta-analysis [4,16,17].
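Let $\hat\Delta_i$ denote the estimate of efficacy in region $i$, let $w_i = 1/\widehat{\mathrm{var}}(\hat\Delta_i)$, and let $\sigma^2$ = the larger of $(Q - (r - 1)) / (\sum_i w_i - \sum_i w_i^2 / \sum_i w_i)$ and 0, where $Q = \sum_i w_i (\hat\Delta_i - \bar\Delta)^2$ and $\bar\Delta = \sum_i w_i \hat\Delta_i / \sum_i w_i$. The random-effects weights are $w_i^* = 1/(\widehat{\mathrm{var}}(\hat\Delta_i) + \sigma^2)$, and the summary statistic is $\hat\Delta = \sum_i w_i^* \hat\Delta_i / \sum_i w_i^*$, with standard error $se(\hat\Delta) = (\sum_i w_i^*)^{-1/2}$. The 95% confidence interval is $(\hat\Delta - t_{r-1}\, se(\hat\Delta),\ \hat\Delta + t_{r-1}\, se(\hat\Delta))$, where $t_{r-1}$ is the value of the 97.5th percentile of a t-distribution with $r - 1$ degrees of freedom, and $r$ is the number of regions. For an example of these calculations, see Table 1.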
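The combination step can be sketched as a standard moment-based random-effects meta-analysis. This is an illustration under the assumption that the between-region variance is estimated by the usual method-of-moments formula truncated at zero. The function name and inputs are hypothetical, and the t critical value is passed in by the caller (for example, roughly 2.571 for 5 degrees of freedom) so that no statistics library is needed.

```python
import math


def random_effects_summary(estimates, variances, t_crit):
    """Combine region-level efficacy estimates with a random-effects model."""
    r = len(estimates)
    if r != len(variances) or r == 0:
        raise ValueError("estimates and variances must be non-empty and equal length")
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
    denom = sw - sum(wi ** 2 for wi in w) / sw
    # Between-region variance, truncated at zero.
    sigma2 = max(0.0, (q - (r - 1)) / denom) if denom > 0 else 0.0
    w_star = [1.0 / (v + sigma2) for v in variances]      # random-effects weights
    summary = sum(wi * e for wi, e in zip(w_star, estimates)) / sum(w_star)
    se = 1.0 / math.sqrt(sum(w_star))
    ci = (summary - t_crit * se, summary + t_crit * se)
    return {"summary": summary, "se": se, "ci": ci, "q": q, "sigma2": sigma2}


# Hypothetical estimates (per 100,000 per year) and variances for two regions.
result = random_effects_summary([-10.0, -8.0], [4.0, 4.0], t_crit=12.706)
print(result)
```

When Q is smaller than r - 1, as in this toy input, the between-region variance is set to zero and the summary coincides with the fixed-effect weighted average.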

Results
We applied the methodology to before-and-after data on breast cancer screening in various Swedish counties [5]. The original data involved 7 counties. However, because one-third of the population of Dalarna county enrolled in a randomized trial, some of the assumptions might be violated for women screened in Dalarna county. The main difficulty with using data from Dalarna county is that subjects who refused screening in the randomized trial may not be comparable to subjects outside of the trial who did not obtain screening due to lack of availability, and the methodology does not allow for this difference in comparability. Therefore, in our analysis, we dropped data from Dalarna county. The age ranges were 40-74 for three counties, 40-69 for two counties, and 50-59 for one county.
The data in [5] were reported in terms of person-years of receiving screening. Dividing person-years by the length of the time period, we obtained the approximate number of persons eligible for screening in each region and group, $n_{iz++}$. Using these data, we estimated the change in the average yearly death rate of incident breast cancer among type C subjects ages 40-69 due to receipt of breast cancer screening as -9 per 100,000 with 95% confidence interval of (-14, -4) per 100,000 for $k = 0$ and similarly -9 per 100,000 with 95% confidence interval of (-14, -5) per 100,000 when $k$ equaled the fraction screened in time period 0. See Table 1 and Figure 1. The estimates were similar for the two values of $k$ because only in Vastmanland County was there substantial screening in time period 0, and that was only 14%. We caution that Assumption 2 may not hold due to improvements in available systemic therapy over the periods of interest [14]. Therefore the results must be interpreted with caution, as they may overestimate the benefit of screening.
Table 1 notes: Fraction screened is from Table 1 of [5]. Incident breast cancer deaths are from Table 3 of [5]. Number eligible is person-years from Table 3 of [5] divided by years in time period from Table 2 of [5]. Calculations assume $k = 0$ for sensitivity analysis. Above yields $Q = 3.72$, $\sigma^2 = 0$, and overall estimate of -9 with 95% confidence interval (-14, -4) per 100,000.
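The conversion from person-years to an approximate number eligible, and from a yearly probability to a rate per 100,000, is plain arithmetic. A minimal sketch follows, with hypothetical function names and made-up inputs rather than the values from [5].

```python
def eligible_persons(person_years, period_years):
    """Approximate number eligible as person-years divided by period length."""
    if period_years <= 0:
        raise ValueError("period_years must be positive")
    return person_years / period_years


def per_100k(yearly_probability_change):
    """Express a change in yearly probability as a rate per 100,000 persons."""
    return yearly_probability_change * 100_000


# Hypothetical inputs: 1,000,000 person-years observed over a 10-year period.
n_eligible = eligible_persons(1_000_000, 10)
print(n_eligible, per_100k(-0.00009))
```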

Discussion
Our methodology complements that of Duffy et al. [5], who obtained qualitatively similar results (i.e. a statistically significant reduction in breast cancer mortality rates in time period 1 versus time period 0) based on data from seven Swedish counties. Unlike our paper, Duffy et al. estimated relative risks instead of risk differences. Duffy et al. [5] fit a Poisson regression model to data from all subjects in both time periods as well as data on those screened and not screened in the current time period. To adjust for self-selection bias, Duffy et al. [5] fit a separate model to data from refusers and participants in a randomized trial in Dalarna County. An implicit assumption is that the self-selection adjustment based on refusers in the randomized trial would apply to women who did not receive screening outside the trial. This assumption is not used in the paired availability design, which does not use data from a randomized trial. However, the paired availability design is subject to bias from changes in therapy over time.
Figure 1: Estimated change (and 95% confidence intervals) in average yearly probability of incident cancer death due to receipt of screening per 100,000 type C subjects ages 40-69.
Another approach to the analysis of before-and-after cancer screening data is to regress the change in cancer mortality rate over time for each region on the change in screening rates over time in each region. Sometimes the change in cancer incidence rates is used as a proxy for the change in screening rates [18]. This approach is an attempt to extract more information from the data, because larger changes in cancer screening rates should ideally correspond to larger changes in cancer mortality rates. However, this type of regression based on population-level data can give different results than a regression based on individual-level data, a phenomenon known as the ecologic fallacy [19].
With additional data, it may be possible to adjust for the effect of changes in therapy over time, if an additional assumption is reasonable. Suppose we had additional data on incident cancer deaths in time periods 0 and 1 in regions in which there was no screening in either time period. If (i) the therapies in these regions are representative of the therapies in the regions with screening and (ii) the population characteristics are similar to the population characteristics in regions with screening, one could reasonably estimate the effect of changes in therapy on the probability of incident cancer death.
Although data were reported on person-years of eligibility for screening, we did not use a survival analysis. A survival analysis can be incorporated into the potential outcomes model for all-or-none compliance [20]. However such an analysis requires more data than were reported. Also, in this framework, even a constant hazard model would involve a complicated likelihood calculation.
Besides this methodology based on the paired availability design, one could also analyze observational screening data using the method of periodic screening evaluation [21]. However this method requires regular screenings, data on the number of cancers detected on screening and in the intervals between screenings, and follow-up of subjects detected with cancer. The definitive method to evaluate cancer screening is a randomized trial [6]. Observational approaches have a role because such trials are very expensive and difficult to implement. Thus this extension of the paired availability design to the evaluation of cancer screening could play an important role in cancer screening evaluation, but only if there were no change in therapy over time or if one could adequately adjust for any effect of a change in therapy over time.

Conclusion
The paired availability design can be extended to the evaluation of breast cancer screening by using incident breast cancer deaths as the outcome and requiring sufficiently long equal-length time periods before and after a change in availability of periodic screening. However, the assumptions should be examined carefully. The assumption of stable preferences may be violated by a campaign to encourage screening, although the impact would be greatly mitigated if there were little screening in time period 0. The assumptions regarding changes in therapy over time may also be violated.