Accuracy of reporting of Aboriginality on administrative health data collections using linked data in NSW, Australia
BMC Medical Research Methodology volume 20, Article number: 267 (2020)
Aboriginal people are under-reported on administrative health data in Australia. Various approaches have been used or proposed to improve reporting of Aboriginal people using linked records. This cross-sectional study used self-reported Aboriginality from the NSW Patient Survey Program (PSP) as a reference standard to assess the accuracy of reporting of Aboriginal people on NSW Admitted Patient (APDC) and Emergency Department Data Collections (EDDC), and to compare the accuracy of selected approaches to enhance reporting of Aboriginality using linked data.
Ten PSP surveys were linked to five administrative health data collections, including APDC, EDDC, perinatal, and birth and death registration records. Accuracy of reporting of Aboriginality was assessed using sensitivity, specificity, and positive and negative predictive values (PPVs and NPVs) and F score for the EDDC and APDC as baseline and four enhancement approaches using linked records: “Most recent linked record”, “Ever reported as Aboriginal”, and two approaches using a weight of evidence, “Enhanced Reporting of Aboriginality (ERA) algorithm” and “Multi-stage median (MSM)”.
There was substantial under-reporting of Aboriginality on APDC and EDDC records (sensitivities 84 and 77% respectively) with PPVs of 95% on both data collections. Overall, specificities and NPVs were above 98%. Of people who were reported as Aboriginal on the PSP, 16% were not reported as Aboriginal on any of their linked records. Record linkage approaches generally increased sensitivity, accompanied by decrease in PPV with little change in overall F score for the APDC and an increase in F score for the EDDC. The “ERA algorithm” and “MSM” approaches provided the best overall accuracy.
Weight of evidence approaches are preferred when record linkage is used to improve reporting of Aboriginality on administrative health data collections. However, as a substantial number of Aboriginal people are not reported as Aboriginal on any of their linked records, improvements in reporting are incomplete and should be taken into account when interpreting results of any analyses. Enhancement of reporting of Aboriginality using record linkage should not replace efforts to improve recording of Aboriginal people at the point of data collection and addressing barriers to self-identification for Aboriginal people.
Accurate recording of Aboriginal people on population health administrative data collections is essential to correctly measure the health gap between Aboriginal and non-Aboriginal people and to monitor and evaluate programs that aim to reduce health disparities. The Australian National Best Practice Guidelines for Collecting Indigenous Status in Health Data Sets require patients to be asked a standard question at every health system encounter, allowing individuals to respond differently at each contact. Although the quality of health information on Aboriginal people has improved in Australia over time, administrative data collections continue to underestimate the true number of Aboriginal people who utilise health services. In 2011–12, 80% of Aboriginal patients were estimated to be correctly reported on NSW hospital records.
While efforts continue to improve reporting of Aboriginal people on administrative data collections in Australia, various approaches have been proposed or used to enhance reporting of Aboriginal people on administrative data collections using record linkage, including:
at least two hospitals: a person is recorded as Aboriginal at more than one hospital 
Enhanced reporting of Aboriginality (ERA) algorithm: if the person has 3 or more independent sources of information on the linked dataset, at least 2 must indicate that the person is Aboriginal; if the person has 1 or 2 independent sources of information, 1 is sufficient to consider the person to be Aboriginal [5, 12, 13, 14, 15]
Multi-stage median: each person is given a derived Aboriginal status for each data collection in the linked dataset, and the collection-derived Aboriginal status is combined into an overall derived Aboriginal status for each person [7, 11, 13, 14]
When comparing the performance of the different approaches, most studies report the percentage change in number of records reported as Aboriginal following enhancement compared to the original reported value. This comparison does not take into account the accuracy of the enhancement. To date, no published studies have assessed the accuracy of approaches to improve reporting of Aboriginal people using an independent reference standard.
In this study, we used a New South Wales (NSW) Patient Survey Program (PSP) dataset as an external reference standard for reporting of Aboriginal people. The PSP was chosen because participation is voluntary, the survey is completed in a person’s own time and in the privacy of their own home, and participants are advised that individual responses are not accessible to health care providers.
This study uses self-reported Aboriginality from the NSW Patient Survey Program to:
assess the accuracy of reporting of Aboriginal people on NSW hospital and emergency department data collections, and
compare the accuracy of a range of approaches to enhance reporting of Aboriginal people on NSW hospital and emergency department data collections.
Cross-sectional observational study using linked population health administrative data.
Patients who were admitted to hospital or attended an emergency department in 2013–2015, completed a relevant PSP survey and gave consent for record linkage.
We use the term “Aboriginal people” to refer to both Aboriginal and Torres Strait Islander peoples for the purpose of this study.
The PSP collects information on the experiences of people who have recently had contact with the NSW public health system to facilitate performance reporting on patient satisfaction with health services. The PSP is managed by the NSW Bureau of Health Information (BHI). Since 2013, the PSP has sought consent for the information to be used for research, including record linkage studies.
Ten PSP datasets were included in this study: Adult Admitted Patient Survey (2013, 2014 and 2015), Emergency Department Patient Survey (2013–14, 2014–15 and 2015–16), Admitted Children and the Young Patients Survey (2015), Small and Rural Hospitals Survey (2015), Small Hospital Emergency Care Survey (2015–16) and the Maternity Care Survey (2015). Of the 181,747 respondent records in the ten PSP datasets, 150,452 (83%) included consent for record linkage; there was no difference in the consent rates between Aboriginal and non-Aboriginal people. The number of records in each PSP dataset varied from 4128 to 35,962 and the proportion of respondents that consented varied from 79 to 90% (Table S1).
The ten PSP datasets were linked to the following administrative datasets: the NSW Perinatal Data Collection (PDC), NSW Admitted Patient Data Collection (APDC), NSW Emergency Department Data Collection (EDDC), Registry of Births, Deaths and Marriages birth registrations (RBDM Births) and the Cause of Death Unit Record File (CODURF).
Record linkage and dataset preparation
The 150,452 PSP records for consenting participants were linked by the NSW Centre for Health Record Linkage (CHeReL) to records of the APDC, EDDC, RBDM Births (as mother, baby or other parent), PDC (as mother or baby) and CODURF, within 2 years of the PSP survey dates (2011 to 2016–17). Of the 150,452 linked PSP records, 2996 (2.2%) were excluded due to missing information (n = 2890) or conflicting information on Aboriginality across PSP datasets (n = 106).
The final linked dataset for analysis comprised 1,265,799 linked records relating to 130,514 persons: 75,041 records for Aboriginal people (5102 Aboriginal people) and 1,190,758 records for non-Aboriginal people (125,412 non-Aboriginal people). The contribution of source dataset records to the linked dataset was: APDC, n = 714,704 (56.5%); EDDC, n = 515,907 (40.8%); RBDM birth registration records (babies and parents), n = 16,387 (1.3%); PDC records (babies and mothers), n = 14,377 (1.1%); and CODURF, n = 4424 (0.3%).
We estimated the level of reporting of Aboriginal people on the APDC and EDDC by comparing Aboriginality reported to the PSP with Aboriginality as recorded on the APDC or EDDC record that was originally sampled for the PSP. In addition to this “As-recorded” measure, we compared four approaches to enhance reporting of Aboriginality using linked records of the PDC, APDC, EDDC, RBDM Births and CODURF datasets for all events for the person:
“Most recent”: Aboriginality reported at the most recent admission/presentation
“ERA algorithm”: The ERA algorithm is a weight of evidence approach that relies on independent sources of information. Each independent report is counted as a “unit of information” that contributes to the weight of evidence as to whether a person is reported as Aboriginal:
if the person has 3 or more units of information, at least 2 must indicate that the person is Aboriginal or Torres Strait Islander; or
if the person has 1 or 2 units of information, 1 is sufficient to report the person as Aboriginal or Torres Strait Islander.
“Multi-stage median” (MSM): The MSM is a weight of evidence approach that applies the ERA algorithm within each data collection and then applies the ERA algorithm a second time using the results from each data collection as the unit of information.
“Ever reported”: A single linked record from any dataset is sufficient to report a person as Aboriginal or Torres Strait Islander.
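The four enhancement approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's implementation: it treats every linked record as one independent unit of information (the text does not spell out how independence is determined), assumes records with missing Aboriginality have already been dropped, and assumes each record carries an event date. The names `Record`, `era_rule`, `era`, `msm`, `ever_reported` and `most_recent` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    collection: str   # label of the source data collection, e.g. "APDC"
    event_date: date  # admission/presentation date (assumed to be available)
    aboriginal: bool  # Aboriginality as recorded on this linked record


def era_rule(flags):
    """ERA threshold: 3+ units need at least 2 positive; 1-2 units need 1."""
    flags = list(flags)
    needed = 2 if len(flags) >= 3 else 1
    return sum(flags) >= needed


def era(records):
    # Assumption: each linked record is one unit of information.
    return era_rule(r.aboriginal for r in records)


def msm(records):
    # Stage 1: apply the ERA rule within each data collection.
    by_collection = defaultdict(list)
    for r in records:
        by_collection[r.collection].append(r.aboriginal)
    stage1 = [era_rule(flags) for flags in by_collection.values()]
    # Stage 2: apply the ERA rule again, one unit per data collection.
    return era_rule(stage1)


def ever_reported(records):
    # A single record reporting the person as Aboriginal is sufficient.
    return any(r.aboriginal for r in records)


def most_recent(records):
    # Status on the latest record; requires at least one record.
    return max(records, key=lambda r: r.event_date).aboriginal


# Invented example person, not study data.
person = [
    Record("APDC", date(2013, 3, 1), True),
    Record("APDC", date(2013, 9, 1), True),
    Record("APDC", date(2014, 2, 1), False),
    Record("APDC", date(2014, 8, 1), False),
    Record("EDDC", date(2015, 1, 1), False),
    Record("PDC", date(2015, 6, 1), False),
]

results = {
    "ERA": era(person),
    "MSM": msm(person),
    "Ever reported": ever_reported(person),
    "Most recent": most_recent(person),
}
print(results)
```

For this invented person, the pooled ERA rule and "Ever reported" classify the person as Aboriginal, while MSM and "Most recent" do not: the two Aboriginal reports both come from one collection, so they count as only one of three units at the second MSM stage.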
We calculated measures of validity, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F score for “As-recorded” Aboriginality and for each of the 4 enhancement methods using the PSP as the standard. Measures of validity were calculated for the APDC and EDDC separately. For each data source, the accuracy of the “As-recorded” approach and the enhancement algorithms is described overall and by age.
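These measures all derive from a two-by-two table of reference status against recorded (or enhanced) status. The sketch below shows the standard definitions; it assumes that the "F score" is the F1 score, the harmonic mean of PPV and sensitivity, which the text does not state explicitly. The function name `validity_measures` and the example counts are invented for illustration and are not study data.

```python
def validity_measures(tp, fp, fn, tn):
    """Validity of a recorded or enhanced status against a reference standard.

    tp: Aboriginal on the reference and on the data collection
    fp: non-Aboriginal on the reference, Aboriginal on the data collection
    fn: Aboriginal on the reference, not Aboriginal on the data collection
    tn: non-Aboriginal on both
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    # Assumption: F score taken as F1, the harmonic mean of PPV and sensitivity.
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
    }


# Invented counts for illustration only.
measures = validity_measures(tp=84, fp=4, fn=16, tn=1896)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Because Aboriginal people are a small share of the study population, specificity and NPV stay high even when many Aboriginal people are missed, which is why sensitivity, PPV and the F score carry most of the comparison.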
Of the 130,514 people in the PSP study population, 5102 (3.9%) reported themselves to be Aboriginal. Compared to non-Aboriginal people, Aboriginal people in the study population had a similar sex distribution, were substantially younger and more likely to live in a non-metropolitan area (Table 1).
Of the 5102 Aboriginal people reported on the PSP: 272 (5%) had 1 linked record; 448 (9%) had 2 linked records; and 4382 (86%) had 3 or more linked records. Of the 5102, 838 (16%) had no linked record that recorded the person as Aboriginal; these 838 people related to 3314 linked APDC records and 3440 linked EDDC records for Aboriginal people (8 and 11% respectively).
Of the 125,412 non-Aboriginal people reported on the PSP: 12,458 (10%) had 1 linked record; 15,427 (12%) had 2 linked records; and 97,527 (78%) had 3 or more linked records. Of the 125,412 non-Aboriginal people reported on the PSP: 677 (0.5%) were recorded as Aboriginal on one or more of their linked records; 120 (0.1%) were consistently reported as Aboriginal (44 had 1 linked record; and 633 had 2 or more linked records, of which 76 were consistently reported as Aboriginal); 508 (0.05%) were considered Aboriginal by the ERA approach; and 497 (0.05%) were considered Aboriginal by MSM approach.
Using the PSP as the reference, the overall sensitivity, PPV and F score of Aboriginal people “As-recorded” on the APDC were 84%, 95% and 0.90 respectively, and on the EDDC were 77%, 95% and 0.85 respectively (Table 2). Specificities and NPVs for both the APDC and EDDC were above 98%. When age groups were compared, the sensitivity, PPV and F scores of Aboriginal people “As-recorded” were highest among 40–64 year olds for both the APDC and EDDC, and lowest among 20–39 year olds in the APDC and 0–19 year olds in the EDDC. Specificities and NPVs were generally over 98%.
Using the PSP as the reference and comparing to the “As-recorded” approach, the “ERA Algorithm”, “MSM” and “Ever reported” enhancement methods produced overall increases in sensitivity though at the cost of a decrease in PPV. The “Most recent” method produced a lower sensitivity and PPV for the APDC, and a slightly higher sensitivity and equivalent PPV for the EDDC. Specificities and NPVs were generally over 98%. In terms of a balance between maximising sensitivity and minimising the accompanying reduction in PPV: for the APDC, F scores were similar across all enhancement methods, no enhanced method resulted in a higher F score than “As-recorded”, though the “MSM” and “ERA algorithm” methods produced an equivalent F score of 0.90; and for the EDDC, all enhanced methods produced a higher F score, with “MSM” and “ERA algorithm” methods producing the highest F score of 0.89. When age groups were examined, similar patterns were observed with the possible exception of the youngest age group (0–19 years), where the “Most recent” enhancement approach produced a very marginally better F score associated with a relatively higher PPV.
This is the first published study to quantify the level of reporting of Aboriginal people on administrative health data collections using an external reference standard and to validate a range of methods using linked data to enhance reporting. Using the PSP as a reference standard, we found that the sensitivity and PPV of reporting of Aboriginal people on the APDC were 84% and 95% respectively, and on the EDDC were 77% and 95% respectively. Specificities and NPVs were generally over 98% while F scores were generally above 0.85. Importantly, we found that 16% of people who were reported as Aboriginal on the PSP were not reported as Aboriginal on any of their linked records. This is similar to the results of a recent Queensland study that examined ED presentations in a single facility.
Of the four enhancement methods examined in this study, the “ERA algorithm” and “MSM” approaches had the overall highest F scores (APDC: 0.90, EDDC: 0.89) and improved the sensitivity of reporting of Aboriginal people compared to an “As-recorded” approach at the cost of decreased PPV for both the APDC and EDDC, with overall sensitivities of 91% and PPVs of 88–89% for the APDC, and sensitivities of 88% and PPVs of 90% for the EDDC.
The “ERA algorithm” and “MSM” approaches take account of the weight of evidence that a person is Aboriginal and offset the possibility of incorrect enhancement due to administrative health records being incorrectly reported as relating to an Aboriginal person, or to incorrectly linked records. The CHeReL has a range of approaches to record linkage. The probabilistic linkage procedure used by the CHeReL for this project was designed to achieve a false positive rate of no more than 0.5%.
The “MSM” approach takes into account the possibility of systematic differences in patterns of reporting of Aboriginality in different data collections, and that data collections may vary greatly in the number of records that they hold. It is argued that if these issues are not taken into account, then an enhancement approach may ultimately only reflect Aboriginality as reported in whichever collection has the most records about a person. In this study, the “MSM” and “ERA algorithm” approaches had the same overall F scores, and similar PPVs and sensitivities for both the APDC and EDDC.
The “Ever reported” approach had the highest sensitivity and lowest PPV of all enhancement methods. The relatively low PPV demonstrates the vulnerability of the method to incorrect reporting of non-Aboriginal people as Aboriginal. The “Most recent” method had the lowest sensitivity of all the approaches, and in the case of the APDC, lower sensitivity than the “As-recorded” approach.
There were 677 PSP respondents who were reported as non-Aboriginal on the PSP and reported as Aboriginal on at least one linked administrative health record, with 508 of these meeting one of the weight of evidence criteria for reporting a person as Aboriginal and 120 consistently reported as Aboriginal on all their linked records. Under-reporting and inconsistencies in reporting of Aboriginal people on administrative health data collections may be due to health staff not asking patients about their Aboriginality or Aboriginal people choosing not to self-report in a particular context. Non-Aboriginal people may be incorrectly reported as Aboriginal on administrative health data collections by health staff mistakenly reporting indigenous peoples from other countries as Aboriginal.
The strengths of our study are that it is population-based, and that the PSP datasets are representative samples from relevant public hospital and emergency department datasets for each patient survey and are independent sources of information on self-reported Aboriginality. The limitations of the study are:
The PSP is not a perfect reference standard. The PSP was endorsed as the reference standard by the project Aboriginal Reference Group due to the exclusive self-report approach and the safe context, that is, the voluntary nature of participation, completion of the survey in a person’s own time and in the privacy of their own home, and advice to participants that individual responses are not accessible to health care providers. Our finding that 16% (n = 838) of people who were reported as Aboriginal on the PSP were not reported as Aboriginal on any of their linked records tends to favour the use of the PSP as a reference standard. The small proportion of PSP records (2.2%) excluded due to missing information on Aboriginality or conflicting information across PSP datasets indicates that the PSP is not a perfect reference standard. The approximately 500 (0.05%) people who were reported as non-Aboriginal on the PSP and met one of the weight of evidence criteria on their linked administrative records also suggest that the PSP is not a perfect reference standard; however, this finding could also represent incorrect or inconsistent reporting of Aboriginal people across the APDC and EDDC datasets, or false positive links. A less than perfect sensitivity of the PSP would impact on the results; in particular, the PPVs and F scores of the APDC and EDDC would be underestimated in this study.
Incorrect links may also contribute to inconsistent reporting across linked records. Incorrect links are more likely within families or households where names or addresses are similar. Where families and households comprise a mix of Aboriginal and non-Aboriginal people, incorrect links may result in apparent inconsistency in reporting of a person’s Aboriginality.
Data linkage was limited to a window of two years before and after the PSP in accordance with patient consent; a longer time period would have increased the number of linked records and increased the potential for further enhancement of reporting of Aboriginality.
The combined PSP surveys included in the study population represent a sampling frame for the study population that may not be representative of APDC and EDDC records generally. The sampling frame for nine of the 10 surveys was adults, with only one survey (2% of total PSP records) targeted at children and young people. Also, this study was based on a linked dataset derived from a cohort of people attending public hospitals and emergency departments; private hospital admissions account for 39% of hospital activity in NSW.
Previous comparisons of weight of evidence approaches [4, 6] have shown that using information from linked records to enhance reporting of Aboriginality reduces the number of records with missing data, improves consistency within records for individuals and increases the overall number of records classified as Aboriginal. By using the PSP as a reference, we found that where an enhancement approach increases sensitivity, that is, increases the proportion of records correctly classified as relating to Aboriginal people, PPV is decreased by increasing the proportion of records incorrectly classified as relating to Aboriginal people, with no change in the overall F score for the APDC and an increase in F score for the EDDC.
We found that 16% of people who were reported as Aboriginal on the PSP were not reported as Aboriginal on any of their linked records. These 16% of people represent 8% of linked APDC and 11% of linked EDDC records for Aboriginal people in the study. This creates an absolute limit on the potential for record linkage to enhance reporting of Aboriginal people on these datasets. Of the approaches tested, we found that the weight of evidence approaches, “ERA algorithm” and “MSM”, performed best. Inclusion of more years of data in the linkage is likely to improve the enhancement. Consideration of family linkages may improve the reporting of Aboriginal children. Inclusion of a greater range of administrative datasets in the linkage may also improve the enhancement; however, it is important to bear in mind that contributing data sources must collect information on Aboriginality independently of each other in order to contribute to the weight of evidence.
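The limit described above can be made concrete with simple arithmetic. The sketch below assumes that a linkage-based approach can only report a person as Aboriginal if at least one of their linked records does so, so people never reported as Aboriginal cap the achievable sensitivity. The function name `sensitivity_ceiling` is hypothetical, and the inputs are the percentages quoted in the paragraph above.

```python
def sensitivity_ceiling(pct_never_reported):
    """Upper bound on sensitivity (%) when some Aboriginal people are
    never reported as Aboriginal on any linked record."""
    return 100 - pct_never_reported


ceilings = {
    "people": sensitivity_ceiling(16),        # 16% of Aboriginal people
    "APDC records": sensitivity_ceiling(8),   # 8% of linked APDC records
    "EDDC records": sensitivity_ceiling(11),  # 11% of linked EDDC records
}
print(ceilings)  # {'people': 84, 'APDC records': 92, 'EDDC records': 89}
```

Under this assumption, no amount of refinement to the enhancement rule can lift sensitivity above these values without additional data sources or better recording at the point of collection.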
Enhanced reporting of Aboriginal people using record linkage does not define a person as Aboriginal. It is a statistical construct that results in improved information about Aboriginal people for the purposes of planning and managing health services. Weight of evidence approaches are preferred when record linkage is used to improve reporting of Aboriginality on administrative health data collections. However, even the most accurate enhancement approaches substantially under-report Aboriginal people on administrative datasets and this should be taken into account in the interpretation of results of any analyses. These results highlight the importance of continued efforts to improve recording of Aboriginal people on administrative data at the point of data collection and addressing barriers to self-identification for Aboriginal people.
Availability of data and materials
The datasets generated during the study are not publicly available as they contain information that could potentially re-identify individuals; however, datasets are available from the corresponding author on reasonable request and with relevant ethical approval.
Abbreviations
APDC: Admitted Patient Data Collection
BHI: Bureau of Health Information
CHeReL: Centre for Health Record Linkage
CODURF: Cause of Death Unit Record File
EDDC: Emergency Department Data Collection
ERA: Enhanced Reporting of Aboriginality
NPV: Negative Predictive Value
NSW: New South Wales
PDC: Perinatal Data Collection
PPV: Positive Predictive Value
PSP: Patient Survey Program
RBDM: Registry of Births, Deaths and Marriages
Australian Institute of Health and Welfare. National best practice guidelines for collecting Indigenous status in health data sets. Cat. no. IHW 29. Canberra: AIHW, 2010. https://www.aihw.gov.au/reports/indigenous-australians/national-guidelines-collecting-health-data-sets accessed July 2020.
Australian Institute of Health and Welfare. The health and welfare of Australia’s Aboriginal and Torres Strait Islander peoples. Cat. no. IHW 147. Canberra: AIHW, 2015. https://www.aihw.gov.au/reports/indigenous-health-welfare/indigenous-health-welfare-2015 accessed July 2020.
Australian Institute of Health and Welfare. Indigenous identification in hospital separations data: quality report. Cat. no. IHW 90. Canberra: AIHW, 2013. https://www.aihw.gov.au/reports/indigenous-australians/indigenous-identification-in-hospital-separations accessed October 2020.
Australian Institute of Health and Welfare. Towards better Indigenous health data. Cat. no. IHW 93. Canberra: AIHW, 2013. https://www.aihw.gov.au/reports/indigenous-australians/towards-better-indigenous-health-data/ accessed July 2020.
Population and Public Health Division. Improved reporting of Aboriginal and Torres Strait Islander peoples on population datasets in New South Wales using record linkage–a feasibility study. Sydney: NSW Ministry of Health, 2012. https://www.health.nsw.gov.au/hsnsw/Pages/atsi-data-linkage-report.aspx accessed July 2020.
Australian Institute of Health and Welfare. An enhanced mortality database for estimating Indigenous life expectancy: A feasibility study. Cat. no. IHW 75. Canberra: AIHW, 2012. https://www.aihw.gov.au/reports/indigenous-australians/mortality-database-indigenous-life-expectancy accessed July 2020.
Christensen D, Davis G, Draper G, Mitrou F, Mckeown S, Lawrence D, McAullay D, Pearson G, Rikkers W, Zubrick S. Evidence for the use of an algorithm in resolving inconsistent and missing indigenous status in administrative data collections. AJSI. 2014;49:423–43. https://doi.org/10.1002/j.1839-4655.2014.tb00322.x.
Badgery-Parker T. Majority rule for assigning aboriginality in linked hospital data. Aust N Z J Public Health. 2012;36:488–9. https://doi.org/10.1111/j.1753-6405.2012.00918.x.
Briffa TG, Sanfilippo FM, Hobbs MS, Ridout SC, Katzenellenbogen JM, Thompson PL, Thompson SC. Under-ascertainment of aboriginality in records of cardiovascular disease in hospital morbidity and mortality data in Western Australia: a record linkage study. BMC Med Res Methodol. 2010;10:111. https://doi.org/10.1186/1471-2288-10-111.
Gialamas A, Pilkington R, Berry J, Scalzi D, Gibson O, Brown A, Lynch J. Identification of Aboriginal children using linked administrative data: consequences for measuring inequalities. J Paediatr Child Health. 2016;52:534–40. https://doi.org/10.1111/jpc.13132.
Gibberd AJ, Simpson JM, Eades SJ. Use of family relationships improved consistency of identification of Aboriginal people in linked administrative data. J Clin Epidemiol. 2017;90:144–55. https://doi.org/10.1016/j.jclinepi.2017.06.021.
Randall DA, Lujic S, Leyland AH, Jorm LR. Statistical methods to enhance reporting of Aboriginal Australians in routine hospital records using data linkage affect estimates of health disparities. Aust N Z J Public Health. 2013;37:442–9. https://doi.org/10.1111/1753-6405.12114.
Tervonen HE, Purdie S, Creighton N. Using data linkage to enhance the reporting of cancer outcomes of Aboriginal and Torres Strait islander people in NSW, Australia. BMC Med Res Methodol. 2019;19:245. https://doi.org/10.1186/s12874-019-0884-8.
McNamara BJ, Jones J, Shepherd CCJ, Gubhaju L, McAullay D, Preen DB, Eades SJ, Jorm L. Identifying young Aboriginal and Torres Strait Islander children in linked administrative data: A comparison of methods. Int J Popul Data Sci. 2020;5(1):11. https://doi.org/10.23889/ijpds.v5i1.1100.
Taylor LK, Bentley J, Hunt J, Madden R, McKeown S, Brandt P, Baker D. Enhanced reporting of deaths among Aboriginal and Torres Strait islander peoples using linked administrative health datasets. BMC Med Res Methodol. 2012;12:91. https://doi.org/10.1186/1471-2288-12-91.
Bureau of Health Information. http://www.bhi.nsw.gov.au. Accessed 05 February 2020.
O’Loughlin M, Harriss L, Mills J, Thompson F, McDermott R. Validating Indigenous status in a regional Queensland hospital emergency department dataset with patient linked data. Med J Aust. 2020;212(5):230–1. https://doi.org/10.5694/mja2.50401.
Irvine K, Hall R, Taylor L. A profile of the Centre for Health Record Linkage. Int J Popul Data Sci. 2019;4(2):07. https://doi.org/10.23889/ijpds.v4i2.1142.
Centre for Health Record Linkage. http://www.cherel.org.au/quality-assurance. Accessed 05 February 2020.
This project was supported by an Aboriginal reference group, comprising Aboriginal members from Aboriginal Controlled Health organisations and NSW Local Health Districts. Members included Tim Croft, Kristy Glanville, Adam Stuart and Aimee Smith.
We would like to acknowledge and thank the NSW Ministry of Health and Bureau of Health Information for granting access to the population health data and the NSW Centre for Health Record Linkage for linking the datasets. We would like to acknowledge Angus Liu for input into an early draft of the manuscript.
Ethics approval and consent to participate
Ethical approval was obtained from the NSW Population and Health Services Research Ethics Committee (2016/HRE1209) and the Aboriginal Health and Medical Research Council (AH&MRC) Ethics Committee (1297/17).
The PSP seeks consent for the information to be used for research, including record linkage studies.
Consent for publication
The authors declare that they have no competing interests.
Cite this article
Nelson, M.A., Lim, K., Boyd, J. et al. Accuracy of reporting of Aboriginality on administrative health data collections using linked data in NSW, Australia. BMC Med Res Methodol 20, 267 (2020). https://doi.org/10.1186/s12874-020-01152-2
- Aboriginal health
- Indigenous health
- Administrative data
- Linked data
- Data linkage
- Record linkage