We sought to identify flags for invasive breast cancer with PPV ≥85% and, among those, the greatest sensitivity. Of the ascertainment flags examined from individual datasets, the flag meeting these criteria was the hospital-derived ‘diagnosis of invasive breast cancer’. When compared with the gold-standard Cancer Registry, this flag had a PPV and sensitivity both of 86%. In other words, 86% of the suspected cases identified by this flag were true positives, and 86% of the cases listed on the Cancer Registry during the study period were identified by this flag. The addition of flags from other Australian datasets (i.e. medical service, prescription claims and survey data) to these hospital-derived flags did not result in combinations with both PPV and sensitivity over 85%.
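As an illustration of how the two measures above are defined, the following minimal sketch computes PPV, sensitivity and specificity from the four cells of a two-by-two table. The function name `validity_metrics` and the counts are hypothetical; the counts were chosen only so that PPV and sensitivity both equal 86%, and they are not the study's data.

```python
def validity_metrics(tp, fp, fn, tn):
    """Return PPV, sensitivity and specificity from two-by-two table counts.

    tp: flagged and on the registry      fp: flagged but not on the registry
    fn: on the registry but not flagged  tn: neither flagged nor on the registry
    """
    return {
        "ppv": tp / (tp + fp),          # share of flagged women who are true cases
        "sensitivity": tp / (tp + fn),  # share of registry cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases that were not flagged
    }


# Hypothetical counts (assumption, not study data): 100 flagged women,
# 100 registry cases, 86 of them in common.
metrics = validity_metrics(tp=86, fp=14, fn=14, tn=9886)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these illustrative counts, 86 of the 100 flagged women are registry cases (PPV) and 86 of the 100 registry cases are flagged (sensitivity), which mirrors the interpretation given in the text.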
Researchers working with the combination of Australian medical service claims, pharmaceutical claims and self-reported data could most accurately identify cases of invasive breast cancer using the flag combination ‘breast radiotherapy and a dispensed medicine’. Around 80% of cases identified by this flag were true cases when compared with the gold standard, and this flag identified 40% of the invasive breast cancers recorded on the Cancer Registry during the study period. Much higher sensitivity was achieved with the flag ‘breast radiotherapy or a dispensed medicine or self-reported diagnoses’; however, the corresponding PPV was poor (27%).
To our knowledge, this is the first study to examine the validity of multiple breast cancer flags from multiple datasets against an Australian State Cancer Registry. Such investigation is important given the increasing use of administrative and self-reported data in epidemiological studies and the unavailability of Cancer Registry data in some jurisdictions. We used health and medical records for a large, heterogeneous sample of women for whom all public and private inpatient diagnoses and surgeries, subsidised outpatient procedures and medicines have been captured.
This study has some limitations. It was conducted as part of a larger program of research examining use of endocrine therapies for invasive breast cancer in Australian clinical practice. The data we requested from the Cancer Registry were therefore restricted to invasive breast cancer and did not include records for ductal carcinoma in situ (DCIS). As a result, we were unable to determine how often false positive flags were picking up genuine cases of DCIS and how many were unrelated to breast cancer of any kind. We examined the validity of various breast cancer flags for women in the 45 and Up Study who, by definition, are aged 45 years and over and have consented to their health records being used for research purposes. The health service use of these women may differ from that of younger women with breast cancer, or of women who do not agree to participate in cohort studies. Therefore, the PPV, sensitivity and specificity calculated here for various flags may differ from those that would be found in whole-of-population studies. The validity of the flags examined here is affected by the proportion of women who move out of NSW between diagnosis and treatment, as well as those who die prior to treatment or decline treatment. The validity of these flags may also change over time in response to changes in health service use and medical advancement.
Each of the flags we examined had very high specificity, which is to be expected given the low prevalence of breast cancer within the cohort (1.4%). In such a scenario, even a model which predicted no breast cancer at all would retain high specificity. Therefore, it is important to examine the PPV and sensitivity of all predictors. The optimum method for identifying cases of breast cancer without access to a Cancer Registry will depend on the type and number of datasets available and the reason cases need to be identified. Researchers seeking to exclude possible cases of breast cancer from their datasets will be most concerned with the specificity of breast cancer flags. All of the breast cancer flags we examined in this study, whether derived from individual or multiple datasets, had high specificity (>97.5%). Each of these would be suitable for identifying non-cases with high accuracy. Researchers wishing to identify any suspected cases of breast cancer for situations where some false positives are acceptable, such as risk adjustment, would likely prioritise flags with high sensitivity. In contrast, PPV would likely be most important for researchers seeking to identify breast cancer cases with the fewest possible false positives (e.g. to select an affected cohort).
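The point about specificity at low prevalence can be checked with a small worked example. The sketch below assumes a hypothetical cohort of 100,000 women (the cohort size and the helper `safe_ratio` are illustrative assumptions; only the 1.4% prevalence comes from the text) and evaluates a flag that never identifies anyone as a case.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


cohort_size = 100_000   # hypothetical cohort size (assumption)
prevalence = 0.014      # prevalence of breast cancer reported in the text

cases = round(cohort_size * prevalence)
non_cases = cohort_size - cases

# A flag that never fires: no woman is flagged, so there are no true or
# false positives; every case is missed and every non-case is "correct".
tp, fp = 0, 0
fn, tn = cases, non_cases

specificity = safe_ratio(tn, tn + fp)   # all non-cases correctly unflagged
sensitivity = safe_ratio(tp, tp + fn)   # no registry cases detected
ppv = safe_ratio(tp, tp + fp)           # undefined: nobody was flagged

print(f"specificity={specificity}, sensitivity={sensitivity}, ppv={ppv}")
```

Under these assumptions the uninformative flag has perfect specificity while detecting no cases, which is why specificity alone cannot distinguish useful flags from useless ones in a low-prevalence cohort.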
The sensitivity and specificity of the hospital-derived flags we calculated are similar to those reported in a NSW study, which demonstrated the hospital procedures ‘lumpectomy or mastectomy’ identified invasive breast cancers in the Cancer Registry with high sensitivity (83%) and specificity (95%). International studies have also reported high accuracy for hospital records in identifying breast cancer [26, 28, 29]. In an Italian study of hospital records, the combination of hospital diagnosis together with ‘lumpectomy or mastectomy’ accurately identified the majority of cases on the Cancer Registry (PPV 91%, sensitivity 85%, specificity 99%).
We found that self-reported diagnosis of breast cancer correctly identified 50% of invasive breast cancer diagnoses to within 12 months of the birth year reported. While one would expect individuals to self-report diagnoses such as cancer reliably [30, 31], the baseline survey did not ask women to differentiate between invasive breast cancer and DCIS. Women may have accurately reported a DCIS as a diagnosis of breast cancer; however, our data extract from the Cancer Registry was limited to invasive tumours, so this could not be confirmed. In addition, women may not accurately recall the age at which they were diagnosed [30–32]. In this study, women reporting a ‘diagnosis year’ overlapping the period July 2004 to December 2005 but without a Cancer Registry diagnosis during this period were considered false positives. A sensitivity analysis indicated that 399 of 581 (69%) of these ‘false positives’ (according to our definition) did have a Cancer Registry diagnosis for invasive breast cancer, but had incorrectly reported their age at diagnosis.