Using data linkage to enhance the reporting of cancer outcomes of Aboriginal and Torres Strait Islander people in NSW, Australia

Background: Aboriginal people are known to be under-recorded in routinely collected datasets in Australia. This study examined methods for enhancing the reporting of cancer incidence among Aboriginal people using linked data methodologies.

Methods: Invasive cancers diagnosed in New South Wales (NSW), Australia, in 2010–2014 were identified from the NSW Cancer Registry (NSWCR). The NSWCR data were linked to the NSW Admitted Patient Data Collection, the NSW Emergency Department Data Collection and the Australian Coordinating Registry Cause of Death Unit Record File. The following methods for enhancing the identification of Aboriginal people were used: 'ever reported', 'reported on most recent record', 'weight of evidence' and 'multi-stage median'. The impact of these methods on the number of cancer cases and on age-standardised cancer incidence rates (ASRs) among Aboriginal people was explored.

Results: Of the 204,948 cases of invasive cancer, 2703 (1.3%) were recorded as Aboriginal on the NSWCR. With enhancement this increased to 4184 (2.0%, 'ever'), 3257 (1.6%, 'most recent'), 3580 (1.7%, 'weight of evidence') and 3583 (1.7%, 'multi-stage median'). Enhancement was generally greater in relative terms for males, people aged 25–34 years, people with cancers of localised or unknown degree of spread, people living in urban areas and people living in areas with less socio-economic disadvantage. All enhancement methods increased ASRs for Aboriginal people. The weight of evidence method increased the overall ASR by 42% for males (894.1 per 100,000, 95% CI 844.5–945.4) and by 27% for females (642.7 per 100,000, 95% CI 607.9–678.7). The greatest relative increases were observed for melanoma and prostate cancer incidence (126% and 63%, respectively). With enhancement of Aboriginal status, ASRs for prostate and breast cancer among Aboriginal people increased from below to above those of non-Aboriginal people.

Conclusions: All data linkage methods increased the number of cancer cases and ASRs for Aboriginal people. Enhancement varied by demographic and cancer characteristics. We considered the weight of evidence method to be the most suitable for population-level reporting of cancer incidence among Aboriginal people. The impact of enhancement on disparities in cancer outcomes between Aboriginal and non-Aboriginal people should be examined further.


Background
Aboriginal people are known to be under-recorded in routinely collected datasets [1][2][3]. The reasons for under-recording are complex. Among health staff, they include a lack of awareness of, and training in, asking about Aboriginal status; among Aboriginal people, they include concerns about how the question is asked, racism and discrimination, privacy, a lack of cultural safety and difficulties in tracing identity [4]. Under-recording of Aboriginal status generally results in under-estimation of absolute measures of health indicators [5,6].
It is possible to enhance the reporting of health outcomes of Aboriginal people by linking data from several sources [7]. For example, Randall and colleagues showed that different enhancement methods using linked data increased the number of hospital admissions identified for Aboriginal people, with varying impacts on admission and mortality ratios [6]. Several methods for enhancing the identification of Aboriginal people have been used, with no consensus on the optimal method. Australian guidelines on data linkage related to Aboriginal people recommend comparing the impact of several methods and choosing the optimal method based on the purpose of the analysis and the characteristics of the datasets [7].
Aboriginal people are under-recorded in the New South Wales Cancer Registry (NSWCR), despite improvements in the recording of Aboriginal status over time [3]. In the early 1980s, more than 80% of people on the NSWCR had unknown Aboriginal status; by 1999 this proportion had dropped to approximately 13%. A previous study examining the feasibility of enhancing the reporting of Aboriginal people using linked data from several sources, including the NSWCR, found that the number of cancer cases, and hence cancer incidence, among Aboriginal people increased following enhancement [2].
Estimates of health outcomes among Aboriginal people, and the size of disparities compared with non-Aboriginal people, can change depending on how Aboriginal status is reported and which enhancement method is used [5,6]. Accurate and complete recording of Indigenous status is needed to reliably measure cancer outcomes, identify disparities and produce information about cancer among Indigenous people globally. Cancer registries are a key source of information for reporting cancer outcomes, yet very few studies have examined the impact of under-recording of Indigenous status on cancer incidence [8]. This study examined the impact of linked data enhancement methods on the number of cancer cases and on cancer incidence rates among Aboriginal people in NSW, Australia, by applying commonly used enhancement algorithms to population-based datasets.

Study design and data sources
This was a retrospective cohort study using linked, routinely collected health data. All cases of invasive cancer diagnosed between 2010 and 2014 and recorded in the NSWCR were included in the analyses. The NSWCR is a statutory population-based cancer registry that collects information about all invasive cancers diagnosed in NSW, Australia. Information about Aboriginal and Torres Strait Islander status in the NSWCR comes from multiple sources, such as hospital treatment episodes and death registrations [3]. Pathology reports do not include information about Aboriginal and Torres Strait Islander status; therefore, this information is missing if the NSWCR receives only a pathology notification. The NSWCR uses a progressive positive identification algorithm, under which a single notification from any source identifying a person as Aboriginal or Torres Strait Islander takes precedence over any other information. Aboriginal and Torres Strait Islander status is assigned at the person level rather than at the individual cancer case level. Torres Strait Islander people are included with Aboriginal people throughout this study because of the small number of people from the Torres Strait Islands residing in NSW, and in recognition that Aboriginal people are the original inhabitants of NSW [4].
The NSWCR data were linked to the NSW Admitted Patient Data Collection (APDC), the NSW Emergency Department Data Collection (EDDC) and the Australian Coordinating Registry Cause of Death Unit Record File (COD URF). The APDC includes records of all hospital admissions in NSW public and private hospitals and day procedure centres; the EDDC includes information on presentations to the emergency departments of NSW public hospitals; and the COD URF includes information about deaths occurring in NSW. Data linkage was performed by the Centre for Health Record Linkage (CHeReL), which uses ChoiceMaker software to perform probabilistic linkage of personal identifiers under a privacy-preserving protocol (http://www.cherel.org.au). The datasets used in this study are included in the CHeReL's Master Linkage Key. The CHeReL implements quality assurance procedures and performs clerical review of a sample of records to keep the estimated false-positive and false-negative linkage rates below 5 per 1000. The CHeReL provided a unique and arbitrary "Project Person Number", which enabled each individual's records across the study datasets to be joined without the researchers accessing personal identifiers.
The APDC data covered the period from July 2001 to December 2017, the EDDC from January 2005 to December 2017, and the COD URF from January 1985 to December 2015. Aboriginal status is self-reported in the APDC and EDDC and is provided by the next-of-kin in the COD URF. Population data were based on data from the Australian Bureau of Statistics and obtained through the Secure Analytics for Population Health Research and Intelligence (SAPHaRI) data warehouse (Centre for Epidemiology and Evidence, NSW Ministry of Health).

Enhancement methods
The following methods for enhancing the reporting of cancer among Aboriginal people were used: 'ever reported as Aboriginal' [7], 'Aboriginal on most recent record' [7], 'weight of evidence' [2] and 'multi-stage median' [9] (Table 1). These methods were selected because they are among the most commonly used, represent a combination of simple and complex approaches and are likely to provide a range of estimates. Because our aim was to correct for under-recording of Aboriginal people in the NSWCR, we only considered changing the status of people recorded as non-Aboriginal or with unknown status on the NSWCR. If a person was recorded as Aboriginal on the NSWCR or on the COD URF, that person was considered Aboriginal in the analyses. We considered the risk of a person being wrongly identified as Aboriginal on the COD URF to be low, since the information is provided by the next-of-kin. Otherwise, the four enhancement methods were applied to the data as described in Table 1.
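The decision rules in Table 1, together with the rule that Aboriginal status on the NSWCR or COD URF is always accepted, can be sketched in Python as follows. This is a minimal illustration, not the study's SAS implementation. The `Record` structure, the source labels and status codes, the reading of a "unit of information" as any record with a known (non-missing) status, the use of only known-status records for 'most recent', the arbitrary handling of same-date ties and the interpretation of 'multi-stage median' as a two-stage application of the weight of evidence rule are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical status codes; the real datasets use their own coding schemes.
ABORIGINAL, NON_ABORIGINAL, UNKNOWN = "aboriginal", "non_aboriginal", "unknown"


@dataclass(frozen=True)
class Record:
    source: str   # hypothetical labels: "NSWCR", "APDC", "EDDC", "COD"
    when: date    # date of the episode, notification or death
    status: str   # ABORIGINAL, NON_ABORIGINAL or UNKNOWN


def units(records):
    """Assumption: a 'unit of information' is any record with a known status."""
    return [r for r in records if r.status != UNKNOWN]


def woe_rule(votes):
    """Table 1 rule: with 3+ units, need 2+ Aboriginal; with 1-2 units, need 1+."""
    yes = sum(votes)
    return yes >= 2 if len(votes) >= 3 else yes >= 1


def ever_reported(records):
    return any(r.status == ABORIGINAL for r in records)


def most_recent(records):
    known = units(records)
    if not known:
        return False
    # Ties on the same date are resolved arbitrarily by max().
    return max(known, key=lambda r: r.when).status == ABORIGINAL


def weight_of_evidence(records):
    return woe_rule([r.status == ABORIGINAL for r in units(records)])


def multi_stage_median(records):
    # Stage 1: apply the weight of evidence rule within each dataset.
    by_source = defaultdict(list)
    for r in units(records):
        by_source[r.source].append(r)
    # Stage 2: each dataset-level result becomes one unit of information.
    return woe_rule([weight_of_evidence(rs) for rs in by_source.values()])


METHODS = {
    "ever": ever_reported,
    "most_recent": most_recent,
    "weight_of_evidence": weight_of_evidence,
    "multi_stage_median": multi_stage_median,
}


def is_aboriginal(records, method):
    # Aboriginal on the NSWCR or COD URF is always accepted; the chosen
    # method only re-classifies people not already identified that way.
    if any(r.status == ABORIGINAL and r.source in ("NSWCR", "COD") for r in records):
        return True
    return METHODS[method](records)


# Hypothetical person recorded as non-Aboriginal on the NSWCR.
person = [
    Record("NSWCR", date(2012, 3, 1), NON_ABORIGINAL),
    Record("APDC", date(2011, 6, 5), ABORIGINAL),
    Record("APDC", date(2015, 2, 9), ABORIGINAL),
    Record("EDDC", date(2016, 8, 20), NON_ABORIGINAL),
]
results = {m: is_aboriginal(person, m) for m in METHODS}
print(results)
```

With these hypothetical records the rules disagree: 'ever reported' and 'weight of evidence' classify the person as Aboriginal (two of four units), whereas 'most recent' (the latest record is non-Aboriginal) and this reading of 'multi-stage median' (only one of three datasets indicates Aboriginal) do not. This illustrates how the methods can produce different case counts from the same linked data.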

Statistical analysis
The number, proportion and characteristics of cases reported as Aboriginal using the NSWCR information and the four enhancement methods were compared. Characteristics considered in this study were: sex, age at diagnosis, year of diagnosis, cancer site, degree of spread (localised, regional, distant, unknown) [10], residential remoteness (major cities, inner regional, outer regional, remote/very remote) [11] and area-based socio-economic disadvantage (Index of Relative Socio-economic Disadvantage quintiles) [12]. For descriptive analyses, cancer sites were classified using a clinical cancer grouping [13].
Age-standardised cancer incidence rates (ASRs) were calculated for non-Aboriginal and Aboriginal people using the NSWCR Aboriginal status variable before enhancement, with cases of unknown Aboriginal status counted as non-Aboriginal. For Aboriginal people, cancer incidence was also calculated using the variables created by the four enhancement methods. Rates were directly age-standardised to the 2001 Australian standard population, using NSW population data from the Australian Bureau of Statistics [14]. Results were reported as rates per 100,000 with 95% confidence intervals (CIs) for all cancers and for the following sites, coded according to the International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Australian Modification: female breast (C50), colorectal (C18-C20), prostate (C61), lung (C34), melanoma (C43) and cervical cancer (C53).
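Direct age-standardisation weights each age-specific rate by the share of a standard population in that age group. The sketch below illustrates the calculation with hypothetical counts and weights; it does not reproduce the 2001 Australian standard population or any study data. The confidence interval uses a simple normal approximation based on Poisson variance, which is an assumption: the paper does not state which interval method was used, and gamma-based intervals (for example, Fay and Feuer) are often preferred when counts are small, as they can be for Aboriginal people.

```python
import math


def direct_asr(cases, population, std_pop, per=100_000, z=1.96):
    """Directly age-standardised rate with a normal-approximation CI.

    cases[i]: observed cases in age group i.
    population[i]: person-years at risk in age group i (for a 5-year
        period such as 2010-2014, roughly 5 x the mid-period population).
    std_pop[i]: standard population count for age group i.
    """
    total = sum(std_pop)
    weights = [s / total for s in std_pop]
    rate = sum(w * d / n for w, d, n in zip(weights, cases, population))
    # Assumed Poisson variance of each age-specific rate: d / n**2.
    var = sum(w ** 2 * d / n ** 2 for w, d, n in zip(weights, cases, population))
    half_width = z * math.sqrt(var)
    return rate * per, (rate - half_width) * per, (rate + half_width) * per


# Hypothetical three-group example (not study data).
asr, lower, upper = direct_asr(
    cases=[10, 20, 30],
    population=[10_000, 10_000, 10_000],
    std_pop=[1, 1, 1],
)
print(f"ASR {asr:.1f} per 100,000 (95% CI {lower:.1f}-{upper:.1f})")
```

Because the rate is a weighted sum of age-specific rates, an enhancement method that adds cases mainly in age groups with large standard-population weights will raise the ASR more than one that adds the same number of cases in sparsely weighted groups.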
The impact of different enhancement methods on the number of cases and on ASRs was examined in relative terms (% increase compared with the NSWCR variable). Analyses were performed using SAS Version 9.4 (SAS Institute, Cary, NC).
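The relative measure used throughout can be written as a single formula, where X is either a case count or an ASR for Aboriginal people. As a worked check, the case counts reported in the Results give the following relative increases:

```latex
\text{Relative increase (\%)} = 100 \times \frac{X_{\text{enhanced}} - X_{\text{NSWCR}}}{X_{\text{NSWCR}}}

\text{Weight of evidence, cases: } 100 \times \frac{3580 - 2703}{2703} = 100 \times \frac{877}{2703} \approx 32.4\%

\text{Ever reported, cases: } 100 \times \frac{4184 - 2703}{2703} = 100 \times \frac{1481}{2703} \approx 54.8\%
```

The numerator of 877 in the weight of evidence example equals the number of cases whose status was enhanced to Aboriginal by that method.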

Results
Overall, 204,948 cases of invasive cancer were diagnosed in NSW in 2010-2014. Of these, 2703 (1.3%) were diagnosed among Aboriginal people based on the NSWCR Aboriginal status variable, and 28,572 (13.9%) had unknown Aboriginal status. After enhancement, the number of cases among Aboriginal people increased to 4184 (2.0%, 'ever'), 3257 (1.6%, 'most recent'), 3580 (1.7%, 'weight of evidence') and 3583 (1.7%, 'multi-stage median'). The majority of cases whose status changed after enhancement had originally been recorded as non-Aboriginal, rather than as having unknown Aboriginal status. For example, of the 877 cases enhanced to Aboriginal status by the weight of evidence method, 74% (n = 651) had been recorded as non-Aboriginal and 26% (n = 226) as having unknown Aboriginal status on the NSWCR.
Relative enhancement (per cent increase) was generally greater for males, people aged 25-34 years, people with cancers of unknown or localised degree of spread, people living in urban areas and people living in areas with less socio-economic disadvantage (Table 2).
Overall, the ASR among Aboriginal people was 559.9 per 100,000 (95% CI 535.3-585.3) before enhancement. All enhancement methods increased ASRs overall and for both males and females (Table 3, Fig. 1). The greatest increases were detected when using the 'ever reported' method and the smallest when using the 'most recent' method. Enhancement increased incidence rates more for males than for females. For example, the 'weight of evidence' method increased the ASR by 42% for males (894.1 per 100,000, 95% CI 844.5-945.4) and by 27% for females (642.7 per 100,000, 95% CI 607.9-678.7).

Table 1 The enhancement methods used in the analyses

| Method | Description |
|---|---|
| Ever reported [7] | Recorded as Aboriginal at least once in any of the data sources. |
| Most recent record [7] | Recorded as Aboriginal on the most recent record in any of the data sources. |
| Weight of evidence [2] | Recorded as Aboriginal if (1) there are three or more units of information and at least two indicate that the person is Aboriginal; or (2) there are one or two units of information and at least one identifies the person as Aboriginal. |
| Multi-stage median [9] | The weight of evidence method is applied in a two-step process: first to each dataset individually, and then to the dataset-level results, each treated as one unit of information. |
In site-specific analyses, all enhancement methods increased ASRs for all sites compared with rates estimated using the NSWCR Aboriginal status variable (Table 3, Fig. 2). Again, the 'ever reported' method produced the greatest increases and the 'most recent' method the smallest. The greatest relative increases were observed for melanoma and prostate cancer incidence, with increases of 126% and 63%, respectively, using the 'weight of evidence' method.

Discussion
All enhancement methods increased both the number of cancer cases and age-standardised cancer incidence rates among Aboriginal people. The 'ever reported' method produced the greatest increases and the 'most recent' method the smallest, while the other two methods gave very similar results that fell between these two extremes. When using the 'weight of evidence' method, the majority (74%) of cases with enhanced Aboriginal status had previously been recorded as non-Aboriginal on the NSWCR. This indicates misclassification in the NSWCR Aboriginal status variable and highlights the need to correct this misclassification, rather than focusing solely on reducing the number of people with unknown Aboriginal status in the NSWCR and in the information the NSWCR receives from notifiers. Aboriginal and Torres Strait Islander status is self-reported at NSW health facilities, and people may choose not to identify [4]. During the study period, procedures to improve the collection of Aboriginal and Torres Strait Islander status in NSW health facilities were strengthened at state level [15], and local initiatives to provide culturally safe health care were in place. These factors are likely to have increased people's willingness to self-identify as Aboriginal or Torres Strait Islander and to have improved identification at the point of care in more recent years. Linked data enhance the reporting of Aboriginal status because they bring together information that is not otherwise available to the NSWCR, for example when people choose to identify as Aboriginal after diagnosis, or at facilities that did not provide their cancer care.
Enhancement was generally greater in relative terms for males, people aged 25-34 years at diagnosis, people living in urban and less disadvantaged areas, and people with a cancer of localised or unknown degree of spread. Several factors are likely to explain these patterns, such as the sources of cancer notifications and treatment patterns (e.g. the likelihood of admission for surgery). People diagnosed with cancers with a good prognosis are less likely to be hospitalised or to die, which reduces the likelihood that their Aboriginal status is recorded on the NSWCR. If the NSWCR receives only a pathology notification, Aboriginal status information will be missing. This is more likely to apply to cancers such as melanoma and prostate cancer, both of which showed greater levels of enhancement.
A previous NSW study reported that enhancing Aboriginal status for reporting deaths resulted in greater enhancement for older people, females, people living in urban areas and people with chronic health conditions [16]. Another NSW study, examining the impact of enhancement on hospital admissions, reported greater enhancement for earlier years of admission, for major cities and for private hospitals, with a varying impact by age depending on the enhancement method used [6]. The factors that affect enhancement therefore differ depending on the health outcome of interest and the datasets used in the analyses.
Lung and cervical cancers showed the smallest increases in incidence rates. Both cancers have a greater burden among Aboriginal than among non-Aboriginal people [17]. Because of the poor prognosis of lung cancer, death certificate information is available for most people diagnosed with it, increasing the likelihood that Aboriginal status is recorded. It is likely that enhancement had a smaller impact on lung cancer incidence rates because the existing NSWCR Aboriginal status variable already had relatively good capture. The relatively small increase in cervical cancer incidence may also be due to relatively good capture on the NSWCR, but may reflect other factors, such as patterns of hospitalisation and the capture of Aboriginal status at the point of care, in what is generally a younger cohort of women.
Enhancing the reporting of cancer outcomes of Aboriginal people may have a major impact on observed disparities between Aboriginal and non-Aboriginal people. For example, according to national statistics [17] and our analyses using the NSWCR Aboriginal status variable, Aboriginal people have lower breast and prostate cancer incidence rates than non-Aboriginal people. This pattern has also been reported among Indigenous peoples in many international jurisdictions and has been proposed to be related to the prevalence of risk factors for these cancers and to competing causes of death [18]. After enhancement, however, our results indicated higher breast and prostate cancer incidence among Aboriginal people than among non-Aboriginal people in NSW. This finding has implications for widely held views on the risk of these cancers among Indigenous peoples. Higher breast cancer rates have been reported among Māori, the Indigenous people of New Zealand, using the national population-based cancer registry, which includes links to a national health database to improve identification [18]. Increased breast cancer incidence among Indigenous people has also been reported in two United States (US) states using data linkage between cancer registries and health service data [19,20]. The effect of under-recording of Indigenous status should be investigated in more jurisdictions.

Our results also highlight the burden of melanoma among Aboriginal people, which warrants further discussion of prevention strategies and actions. After enhancement, melanoma incidence was substantially higher than when using the NSWCR Aboriginal status variable, but still lower than among non-Aboriginal people (except when using the 'ever reported' method).

Cancer is the second leading cause of death and among the leading causes of burden of disease among Aboriginal people in Australia [21]. The findings of our study highlight the impact of cancer on Aboriginal people and the need for cancer control to improve their health outcomes. Cancer control programs should have a special focus on Aboriginal people, considering that their cancer burden may be higher than expected. Australian cancer screening programs already target Aboriginal people because of their lower participation rates [17].
Future research should also examine the impact of enhancement on other cancer outcomes, such as mortality, survival and the likelihood of being diagnosed with advanced-stage disease. Studies have shown that Aboriginal people are more likely than non-Aboriginal people to be diagnosed with advanced-stage cancer [22,23]. We found the greatest enhancement for people diagnosed with cancers of localised or unknown degree of spread. This may change the estimated likelihood of advanced-stage diagnosis among Aboriginal people relative to non-Aboriginal people, and may affect estimates of disparities in survival, since localised cancers have a much better prognosis.
Based on these results and consultation with the Cancer Institute NSW Aboriginal Advisory Group, the 'weight of evidence' method was considered the most suitable for further reporting of cancer outcomes for Aboriginal people. The 'weight of evidence' method uses information from several sources but is still relatively straightforward to use and report. It balances enhanced identification of Aboriginal people against misclassification of non-Aboriginal people as Aboriginal. The method was developed by, and is also used by, the NSW Ministry of Health [6]. Studies have pointed out that the 'ever reported' method may result in misclassification and over-reporting [1,6]. It should be noted that an enhanced Aboriginal identifier is a statistical construct that enables improved reporting of cancer outcomes using historical data, but it may include some inaccuracies due to errors in the source datasets and incorrect linkages [2]. Collection of accurate information at the point of care remains vital.
A limitation of this study is that, if a person was recorded as Aboriginal on the NSWCR or the death certificate, this information was accepted. Although positive misclassification is possible, it is likely to be low for death records, since that information is provided by the next-of-kin. Numerator-denominator bias is a known issue affecting the observed cancer burden in Indigenous populations internationally, because incidence and population data are derived using different data collection methodologies [8]. Population denominators can be unreliable because of under-participation of Aboriginal people in censuses and a varying propensity to identify as Aboriginal in them. The Australian Bureau of Statistics (ABS) estimates Aboriginal and Torres Strait Islander populations using self-reported information from the Australian Census, adjusted for undercount using a household survey conducted after the census [14]. An increase in the number of people self-identifying as Aboriginal or Torres Strait Islander has been observed, with some people who did not self-identify in the 2011 Australian Census choosing to identify in the subsequent 2016 Census [24]. In our study, enhancement of the numerator is likely to reduce the under-estimation that is common in cancer incidence estimates for Indigenous people [8]. However, without enhancement of the denominator using the same methodology, numerator enhancement alone may lead to over-estimation of incidence rates. Linkage of cancer registry, census, hospital and mortality data would enable cancer outcomes for Aboriginal people to be estimated with reduced numerator-denominator bias.

Conclusions
All data linkage enhancement methods increased the number of cancer cases and cancer incidence rates for Aboriginal people. Enhancement varied by demographic and cancer characteristics. We considered the 'weight of evidence' method to be the most suitable for future analyses of cancer outcomes of Aboriginal people. Enhancing the reporting of cancer outcomes of Aboriginal people can have a major impact on estimated cancer disparities between Aboriginal and non-Aboriginal people, and this should be examined further.