Concordance between administrative health data and medical records for diabetes status in coronary heart disease patients: a retrospective linked data study

Background Administrative data are a valuable source of estimates of diabetes prevalence for groups such as coronary heart disease (CHD) patients. The primary aim of this study was to measure concordance between medical records and linked administrative health data for recording diabetes in CHD patients, and to assess temporal differences in concordance. Secondary aims were to determine the optimal lookback period for identifying diabetes in this patient group, whether concordance differed for Indigenous people, and to identify predictors of false positives and negatives in administrative data.
Methods A population-representative sample of 3943 CHD patients hospitalized in Western Australia in 1998 and 2002–04 was selected, and designated according to the International Classification of Diseases (ICD) version in use at the time (ICD-9 and ICD-10 respectively). Crude prevalence and concordance were compared for the two samples. Concordance measures were estimated from administrative data comparing diabetes status recorded on the selected CHD admission (‘index admission’) and on any hospitalization in the previous 1, 2, 5, 10 or 15 years, against hospital medical records. Potential modifiers of agreement were determined using chi-square tests and multivariable logistic regression models.
Results Identification of diabetes on the index CHD admission was underestimated more in the ICD-10 than ICD-9 sample (sensitivity 81.5% versus 91.1%, underestimation 15.1% versus 4.4% respectively). Sensitivity increased to 89.6% in the ICD-10 period using at least 10 years of hospitalization history. Sensitivity was higher and specificity lower in Indigenous patients, and followed a similar pattern of improving concordance with increasing lookback period. Characteristics associated with false negatives for diabetes on the index CHD hospital admission were elective admission, in-hospital death, principal diagnosis, and in the ICD-10 period only, fewer recorded comorbidities.
Conclusions The accuracy of identifying diabetes status in CHD patients is improved in linked administrative health data by using at least 10 years of hospitalization history. Use of this method would reduce bias when measuring temporal trends in diabetes prevalence in this patient group. Concordance measures are as reliable in Indigenous as non-Indigenous patients.


Background
Linked administrative health data provide a unique resource for investigating whole-population diabetes mellitus ('diabetes') prevalence in different patient groups. Administrative data systems are commonly designed to collect resource utilisation data rather than as repositories for research purposes, with recording of comorbid conditions often not required at every hospital admission [1]. An understanding of the reliability of hospital data can assist in accurately estimating the impact of diabetes in the high-risk coronary heart disease (CHD) patient population [2,3]. In unlinked administrative databases where comorbidity information is obtained from a single admission, it is important to understand the reliability of coding and which patient groups may be under- or overestimated from this data source. Use of information from a single admission only could underestimate diabetes prevalence and inaccurately identify diabetic patients. Applying a lookback period to identify prior admissions in which diabetes was recorded can increase detection of diabetes status in linked datasets [4]. Many studies identify diabetes using a lookback of less than two years [5,6] but information is limited on the optimal length of hospitalization history required and whether this method is consistent over time.
Changes in International Classification of Diseases (ICD) versions could potentially impact the recording of conditions such as diabetes. Significant changes in coding directives for diabetes were introduced in Australia in 2000 [7], with a subsequent 20% increase in the number of diabetes-related admissions to 2003-04 [8]. Accordingly, the national health statistics body in Australia does not measure trends spanning the ICD-9 and ICD-10 periods for overall diabetes-related hospitalizations [8]. Whether these coding changes have impacted in the same manner on CHD hospitalizations is unknown.
There is limited available data on the accuracy of recording of diabetes in population sub-groups, including in Indigenous Australians. The risk of diabetes in Indigenous people is known to be many times higher than in the general population [9], and because a high proportion of Indigenous people reside in rural and remote areas, they are more likely to be admitted to a non-metropolitan hospital. These factors could impact on identification of diabetes from administrative data for these patients. With high and increasing incidence of diabetes in this group [9,10] it is imperative that the utility of administrative data for identifying diabetes status is investigated.
The primary aim of this study was to measure the concordance of administrative data and medical records for the recording of diabetes in a sample of CHD patients, and determine whether this has changed over time. Secondary aims were to determine the optimal lookback period for identifying diabetes in this patient group, whether concordance differed for Indigenous people with CHD, and to identify predictors of false negatives and false positives in administrative data.

Study setting
The current study was performed in the state of Western Australia (WA) which is representative of the major sociodemographic and health economic indicators for Australia [11]. The population of WA in 2004, the latter period of the study, was 1.99 million, with 75% residing in the capital city, Perth [12]. Indigenous people comprise 3.5% of the WA population [13], with around 65% living in regional, rural or remote areas [14]. Data were sourced from the population-based electronic linked health database (WA Data Linkage System) which is managed by the Department of Health WA and has been used extensively for health-related research [15]. The current study used two of the system's core databases: the Hospital Morbidity Data Collection (HMDC) and the Mortality register. Statutory requirements mean that all hospitalizations and deaths in WA are recorded within these collections. The datasets are linked by probabilistic matching based on name, date of birth, gender and address, with manual clerical checking of uncertain links, and are regularly audited for quality [16]. Hospital discharge diagnoses are coded in the HMDC by trained coders using the prevailing ICD version and relevant modifications (ICD-9 from 1978, and ICD-10 from July 1, 1999).

Study sample
The study sample was selected from two existing projects: Monitoring CHD in the Modern Era (Study 1), and More Informed Action to Improve Aboriginal Heart Health in WA (Study 2). The sampling frames for these studies have been described elsewhere [17]. For Study 1, a stratified sample of patients aged 35-79 years with a hospital discharge diagnosis of any cardiac condition or chest pain in 1998 or 2003, admitted to a major public or private metropolitan hospital, was identified from a linked dataset containing all cardiovascular (CVD) morbidity and mortality records. The second study similarly identified all Indigenous patients and a sample of non-Indigenous patients, aged 25-79 years, admitted to any metropolitan or rural hospital in 2002-04. Hospital record review for these samples was undertaken and information collected and stored in a medical records database. Because of the overlap in time period between the 2003 sample in Study 1 and the Study 2 sample (2002-04), 134 patients appeared in both sampling frames. These were included only once in the medical records database.
Patients in the medical records database were included in the current study if they had a principal discharge diagnosis of CHD recorded in the HMDC (ICD-9-CM 410-414, ICD-10-AM I20-I25), because of the high recording accuracy of CHD in the principal compared with secondary diagnosis fields [18]. The first CHD admission for each patient in each time period was defined as the 'index admission' and selected for inclusion in the study. The administrative data for the CHD patients were linked to the medical records database via a unique identification number assigned to every hospital admission in WA. Because of known underestimation of identification of Indigenous status in hospital discharge data [19], a patient was included as Indigenous if 25% or more of all of their HMDC records since 1980 were recorded as Indigenous.
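The 25% rule for assigning Indigenous status can be expressed in a few lines. The following is a minimal Python sketch, not the study's own code (the analyses were run in SAS); the function name `is_indigenous` and the representation of a patient's HMDC records as a list of booleans are assumptions made for illustration.

```python
def is_indigenous(record_flags, threshold=0.25):
    """Classify a patient as Indigenous from their hospital records.

    record_flags: one boolean per HMDC record since 1980 (hypothetical
    representation), True where the record was coded as Indigenous.
    threshold: minimum proportion of records coded as Indigenous;
    0.25 corresponds to the "25% or more" rule described in the text.
    """
    flags = list(record_flags)
    if not flags:
        # No records to judge from; treated as not Indigenous in this sketch.
        return False
    return sum(flags) / len(flags) >= threshold
```

A patient with one Indigenous-coded record out of four meets the threshold, whereas one out of five does not.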

Medical record review
Trained research assistants collected data from medical records. Thirty-nine admissions could not be reviewed due to missing medical notes. Data were obtained from admission notes from the emergency department and inpatient medical records, and each comorbidity documented as present, absent, or not recorded. Treatment of diabetes with insulin or oral hypoglycaemic drugs was identified from inpatient and discharge drug records for the admission under investigation. Patients were classified as having diabetes if it was documented as 'present' in the medical notes or if drug treatment for diabetes was identified. Patients with 'not recorded' as their diabetes status and no diabetic drugs recorded (n = 66) were classified in the no diabetes group, and a sensitivity analysis with these patients removed showed minimal difference in all concordance measures across the two samples.
Data quality was initially assessed by review of three medical records by all research assistants within two weeks of commencing data collection. Medical records of a total of 11 patients were subsequently assessed by all data collectors in each study. The observed agreement between data collectors in Study 1 was high for selected medical history (92%) and drugs (100%) and similarly for Study 2 (93% and 87% respectively).

Identification of diabetes status from hospital discharge data
ICD-9-CM was in use in WA at the time of the 1998 admissions, and ICD-10-AM at the time of the 2002-04 sample. Diabetes (Type 1, Type 2, other specified or unspecified diabetes mellitus) was identified in hospital discharge data for the CHD sample if coded in any of 21 diagnosis fields (ICD-9/ICD-9-CM 250, ICD-10-AM E10-E14), using a range of lookback periods for each individual patient: index CHD admission only, and 1, 2, 5, 10 and 15 years prior to the CHD admission.
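The lookback logic described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the study's implementation: the record layout (a dict with `date` and `codes`), the function names, and the use of the admission date to identify the index admission are all hypothetical. Both ICD-9 and ICD-10 prefixes are checked because a lookback window from a 2002-04 admission can reach back across the 1999 coding change.

```python
from datetime import date

# Diabetes code prefixes from the text: ICD-9/ICD-9-CM 250, ICD-10-AM E10-E14.
DIABETES_PREFIXES = ("250", "E10", "E11", "E12", "E13", "E14")


def is_diabetes_code(code):
    """True if a diagnosis code falls in the diabetes ranges above."""
    normalized = code.strip().upper().replace(".", "")
    return normalized.startswith(DIABETES_PREFIXES)


def years_before(d, years):
    """Date `years` years before d (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def diabetes_flag(admissions, index_date, lookback_years=0):
    """True if diabetes is coded on the index admission or on any
    admission in the `lookback_years` before it.

    admissions: list of dicts with 'date' (datetime.date) and 'codes'
    (list of diagnosis code strings); this layout is an assumption.
    lookback_years=0 restricts the search to the index admission date.
    """
    window_start = years_before(index_date, lookback_years)
    for admission in admissions:
        in_window = window_start <= admission["date"] <= index_date
        if in_window and any(is_diabetes_code(c) for c in admission["codes"]):
            return True
    return False
```

For a patient whose only diabetes code appears on a 1996 admission, an index admission in March 2003 is flagged with a 10-year lookback but not with the index admission alone or a 5-year lookback.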
Approval for this study was obtained from the Ethics Committees of The University of Western Australia and the Department of Health WA, and from the Western Australian Aboriginal Health Ethics Committee.

Statistical analysis
The crude prevalence of diabetes in the CHD patient sample was calculated for the index admission and each lookback period using the administrative data, and from the medical records database. Observed agreement, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and Kappa were used as measures of concordance between administrative and medical records data, with the latter designated the reference standard. Concordance measures were calculated for each lookback period and also for sub-group and supplementary analyses. The percentage of under- or overestimation of diabetes recording in hospital discharge data was calculated as (Sensitivity/PPV − 1) × 100 [17]. All analyses were stratified by period, with the 1998 sample corresponding to ICD-9, and the 2002-04 sample to ICD-10. Differences in prevalence between medical records and administrative data were tested using McNemar's test, and concordance measures between the two ICD samples were tested using a Pearson chi-square or z-test (for under/overestimation).
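The concordance measures listed above all derive from a single 2×2 table of administrative data against medical records. The following Python sketch shows one way to compute them; the function name and the example counts are hypothetical and are not taken from the study's tables. Note that Sensitivity/PPV simplifies algebraically to the number of administrative-data positives divided by the number of medical-record positives, so a negative result indicates underestimation by the administrative data.

```python
def concordance(tp, fp, fn, tn):
    """Concordance measures with medical records as the reference standard.

    tp: diabetes in both sources        fp: administrative data only
    fn: medical records only            tn: diabetes in neither source
    Assumes every row and column total is non-zero (no zero-division guard).
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals (Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    # Under- (negative) or overestimation (positive), as defined in the text.
    estimation_pct = (sensitivity / ppv - 1) * 100
    return {
        "observed_agreement": observed,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
        "estimation_pct": estimation_pct,
    }


# Hypothetical counts for illustration only.
result = concordance(tp=80, fp=5, fn=20, tn=95)
```

With these illustrative counts, the administrative data identify 85 patients against 100 in the medical records, so `estimation_pct` is approximately -15, that is, a 15% underestimation.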
Because of the possible impact of sociodemographic and clinical factors on concordance, and the potential for changing criteria for admission to hospital for CHD, variables derived from the administrative data on the index CHD admission were examined for their association with false positives and false negatives, separately for the ICD-9 and ICD-10 samples. Univariable associations were analysed using the Pearson chi-square test (or Fisher's exact test where cell counts were small). The variables tested were length of stay (1-2, 3-5, ≥6 days), age group (25-50, 51-65, 66-79 years), gender, admission type (elective versus emergency), Indigenous status, Charlson Comorbidity Index (excluding diabetes; 0, 1-4, ≥5), number of comorbidities on the index admission (excluding diabetes; 0-3, 4-7, ≥8), hospitalization in the previous 90 days, hospital location (metropolitan versus rural), hospital type (public versus private), transfer in or out on index CHD admission, in-hospital death, principal diagnosis (myocardial infarction, unstable angina, other CHD), and first-ever versus recurrent CHD admission. First-ever CHD admission was identified where there was no CHD admission in the previous 15 years. Variables were tested using binary values unless otherwise indicated and all analyses undertaken separately for each ICD period. Significant univariable variables in each period were entered into a multivariable logistic regression model and odds ratios with 95% confidence intervals (CI) were calculated. All data analyses were undertaken using SAS (version 9.3, Cary, NC, USA) and statistical significance for all analyses was set at p < 0.05.

Results
The final study sample comprised 3943 index CHD admissions, with 24 patients having an admission in both periods. Table 1 shows the clinical and demographic characteristics of the sample. There were 1685 patients in the ICD-9 sample, and 2258 in the ICD-10 sample, with Indigenous patients comprising 23.2% of the latter sample. The majority of cases (94.1%) were admitted for acute coronary syndromes (myocardial infarction or unstable angina).

Prevalence of diabetes in CHD patients
In the ICD-9 sample, there was a small but significant difference between diabetes prevalence from medical records (22.5%) and index administrative data (21.5%, p < 0.0001) (Table 2). The difference was also significant (p < 0.0001) in the ICD-10 sample (34.9% from medical records compared with 29.7% from the administrative data), but prevalence increased to 34.2% using all previous hospital admissions to 15 years. There was a similar pattern in the Indigenous sample, although absolute prevalence levels were higher in this patient group.

Concordance measures
Observed agreement was high and Kappa very good in both samples and across all lookback periods for the recording of diabetes in the two data sources (Table 3). Sensitivity was significantly lower in the ICD-10 compared with ICD-9 sample from the index admission (81.5% versus 91.1%, p < 0.0001), but improved to 89.6% using 10 years of hospitalization history. NPV was significantly higher in the ICD-9 than ICD-10 sample (p < 0.0001 for all lookback periods), although the difference diminished with increasing lookback. Specificity was high in both samples, with a small decrease as lookback period increased. PPV declined with use of an increased lookback period, but there was no statistical difference between the two time periods. Diabetes status was underestimated by 15% from the hospital discharge index admission in the ICD-10 sample, which reduced to 2.2% using a 10-year lookback period (Table 3).
Because of the oversampling of Indigenous patients in Study 2, the ICD-10 sample was stratified to compare concordance for the ICD-10 patients from Study 1 only with the total ICD-10 sample (Study 1 plus Study 2) (see Additional file 1). There was little difference between the restricted and full sample for all concordance measures, with a similar pattern of increasing sensitivity and a small drop in PPV with increasing lookback period.

Concordance measures in Indigenous patients
Sensitivity was higher in Indigenous compared with non-Indigenous people for every lookback period, although the difference was only significant for lookback periods of two years or more (Table 4). Maximal sensitivity was achieved with 10 years lookback in Indigenous patients (93.6%). Specificity was lower in Indigenous than non-Indigenous patients, with an increasing differential with increasing lookback period (p < 0.05). Diabetes status was underestimated from the hospital discharge index admission in Indigenous people by 13.3% reducing to 0.3% using a 10-year lookback period.

False negatives and false positives
Significant univariable predictors in both periods for false negatives were elective admission, in-hospital death, and non-acute CHD principal diagnosis (Table 5). These remained significant after multivariable adjustment in both periods. A lower number of comorbidities (0-3) was associated with higher odds of a false negative in the ICD-10 but not the ICD-9 period. The level of false positives was low in both the ICD-9 (n = 22, 1.3%) and ICD-10 (n = 40, 1.8%) samples, with no significant univariable association with any of the variables tested, and therefore no multivariable analyses were undertaken.

Discussion
This study found that identification of diabetes status from linked administrative health data using the index CHD admission underestimated the prevalence of diabetes in CHD patients to a greater degree in the ICD-10 period, with a correspondingly lower sensitivity and NPV. Sensitivity improved from 81.5% to 89.6% using a 10-year lookback in the ICD-10 period, with marginal improvement in any measures with a longer lookback period out to 15 years. Sensitivity was higher and specificity lower for Indigenous compared with non-Indigenous CHD patients at the index admission and with an increasing lookback period. In-hospital death, elective admission type, and a non-acute CHD principal diagnosis were significantly associated with false negatives on the index CHD admission in both periods. The level of false positives was low in both periods.
Our results highlight the potential for changes in the accuracy of administrative data over time. Although specificity was high in both periods, sensitivity was significantly lower in the ICD-10 period. A study of myocardial infarction patients also found that following the change from ICD-9 to ICD-10, sensitivity reduced from 80% to 66% for diabetes with complications using hospitalization data [20]. In contrast, Chen et al. [21] found no impact of this change on the validity of diabetes and other comorbidities, possibly because of the use of multiple data sources (hospitalization and physician claims data). Our results suggest that although diabetes is reasonably accurately recorded in administrative data compared with other comorbidities [6,22-24], use of data from the index admission only would attenuate likely upwards trends in diabetes prevalence in this population of CHD patients because of the lower sensitivity and NPV in the more recent ICD-10 period. Use of prior hospitalization history would reduce this difference.
For example, a 10-year lookback period would increase sensitivity from 81.5% to 89.6% and NPV from 90.8% to 94.5%, with little loss of specificity and still maintaining a high PPV (91.9%) in the ICD-10 sample, and provide similar levels of concordance to that of the ICD-9 sample.
Published comparisons for the accuracy of recording of diabetes in administrative data for Indigenous people are limited. A Canadian study found that sensitivity was higher (91.1% versus 86%) and specificity lower (92.8% versus 97%) for identifying prevalent diabetes cases in the Aboriginal compared with non-Aboriginal population [25]. Our results are consistent with this pattern. The likelihood of diabetes being recorded at index admission may be higher in Indigenous patients because diabetes is more actively diagnosed and treated during hospitalization due to the known high burden in this population. The significantly lower NPV in Indigenous people at index admission may result from the higher prevalence of diabetes in this group. However, despite an increased risk of CHD recurrence in Indigenous people [26], similar length of lookback periods (five to 10 years) optimized concordance measures between Indigenous and non-Indigenous people. Because the use of hospitalization history draws on all hospital admissions, not just those for CHD, this potentially reflects the higher hospitalization rates in all diabetics compared with the general population [27].
Whilst hospital morbidity data may underestimate population-level diabetes prevalence [28,29], our results demonstrate that use of a lookback period can provide an accurate measure of diabetes prevalence in a defined population such as hospitalized CHD patients. Comorbidities used in the Charlson Comorbidity Index identified from the index hospitalization are underestimated by 46% during the ICD-9 period [22], although there is some evidence that the use of the index admission only provides optimal model discrimination in mortality outcome studies [24,30]. Use of additional data sources such as claims data, where available, to identify diabetic status in this sample of patients may reduce the need for longer lookback periods. However, our results suggest that ICD-9 administrative data are reasonably accurate for identifying diabetic and non-diabetic cases but that the index admission alone may not correctly identify prevalent diabetic patients in the ICD-10 era, which is important information for jurisdictions where multiple data sources are not available.
The period differences shown in this study may relate to changes in coding practices. In many administrative health databases, conditions secondary to the principal reasons for admission to hospital are only coded if actively treated or investigated during the hospital stay [1]. However, during the 1990s in Australia, diabetes was required to be coded irrespective of documented intervention [1] which would contribute to the high levels of concordance between the two data sources in the ICD-9 period. Coding standards implemented with the introduction of ICD-10 reversed this requirement [7]. Further directives regarding coding of diabetic complications have apparently led to a marked increase in hospitalizations for complications of diabetes. This highlights the need to understand local coding directives and changes to standards which are relevant to the condition being investigated. Our results show that despite the incongruent effect of these coding changes, the use of specified lookback periods would allow for continuity in trends of diabetes prevalence in CHD patients.
In contrast to other studies, we found no significant association of increasing age or recent hospitalization with false negatives, and also no association of sex [31,32]. Differing and potentially changing impacts of age and sex mean that they are important variables for stratification in epidemiological studies of CHD trends [33] and our results show that such studies would not be biased across age and sex groupings. The only difference ascertained between time periods was an association of fewer comorbidities being recorded in the ICD-10 sample. This has important implications, as diabetes is more likely to be coded as a secondary than primary diagnosis [7]. It is unlikely that this finding is due to the number of available coding fields [21], as up to 21 diagnosis fields are available to researchers. Additional analysis of all CHD and CVD admissions in WA showed a small significant decrease in the number of comorbidities coded on admissions during the period of this study (data not shown), indicating a trend towards recording lower numbers of comorbidities during the more recent time period. A sensitivity analysis was undertaken where variables reaching significance at the p < 0.1 level in univariable analyses were included in the multivariable models. There were no differences in the significance levels of the existing variables in the models, which confirmed their significant association with false negatives as shown in the main analysis.

Limitations
The generalizability of our results to other hospitalized conditions, particularly non-cardiac conditions, is uncertain because concordance has been specifically measured in a sample of CHD patients. However, within a restricted population hospitalized with CHD, administrative data appear to reliably detect diabetes status. Although we have used recording of diabetes status in the medical records as a reference standard, there are potential limitations in this data source. Patients with less severe diabetes who are treated with diet alone may be less likely to be recorded in medical records as diabetic. There is also the possibility of inaccurate transfer of comorbid conditions to the discharge summary, but review of the whole medical record including drug charts limits the impact of this on identifying diabetic patients. Our results may not be generalizable to patients aged over 80 years due to the age range selected in our sample. However, the lack of significant association between increasing age and under-recording of diabetes suggests that any difference in concordance measures in the very elderly may be small. The differences between the study samples could conceivably contribute to the differences in concordance measures demonstrated in our results; however, stratification by Indigenous status and by study source clearly demonstrates a similar pattern to that seen in the main results. We were unable to demonstrate whether concordance measures have remained consistent since the latter period of our study.

Conclusion
This study has identified a temporal difference in concordance between medical records and administrative health data for the identification of diabetes in CHD patients. In linked administrative data, using up to ten years of hospitalization history to identify diabetes status reduces the temporal difference, improving concordance levels in the later ICD-10 period to those of the ICD-9 period. The use of unlinked administrative data to identify diabetes status would still provide reasonably high levels of accuracy; however, trends over time would be biased and the prevalence of diabetes underestimated in the later period. Importantly, the level of concordance was as high in Indigenous as non-Indigenous patients in this setting, supporting the use of administrative data to identify diabetic status in this population group where diabetes and CHD impose a significant burden.