Research article | Open Access | Open Peer Review
The predictive value of ICD-10 diagnostic coding used to assess Charlson comorbidity index conditions in the population-based Danish National Registry of Patients
BMC Medical Research Methodology, volume 11, Article number: 83 (2011)
The Charlson comorbidity index is often used to control for confounding in research based on medical databases. There are few studies of the accuracy of the codes obtained from these databases.
We examined the positive predictive value (PPV) of the ICD-10 diagnostic coding in the Danish National Registry of Patients (NRP) for the 19 Charlson conditions.
Among all hospitalizations in Northern Denmark between 1 January 1998 and 31 December 2007 with a first-listed diagnosis of a Charlson condition in the NRP, we selected 50 hospital contacts for each condition. We reviewed discharge summaries and medical records to verify the NRP diagnoses, and computed the PPV as the proportion of confirmed diagnoses.
A total of 950 records were reviewed. The overall PPV for the 19 Charlson conditions was 98.0% (95% CI; 96.9%, 98.8%). The PPVs ranged from 82.0% (95% CI; 68.6%, 91.4%) for diabetes with diabetic complications to 100% (one-sided 97.5% CI; 92.9%, 100%) for congestive heart failure, peripheral vascular disease, chronic pulmonary disease, mild and severe liver disease, hemiplegia, renal disease, leukaemia, lymphoma, metastatic tumour, and AIDS.
The PPV of NRP coding of the Charlson conditions was consistently high.
Comorbidities are diseases that coexist with a disease of interest, or index disease. Comorbidities may affect the prognosis of the disease of interest directly, or indirectly by influencing the choice of treatment [1–3].
An index of comorbidity has the advantage of combining several comorbidities into a single numeric score, thereby reducing the number of candidate variables to a manageable set of proxy variables; this is especially useful when working with administrative databases or medical registries [4, 5]. The Charlson comorbidity index is the most widely used comorbidity index [3, 6]. It was developed to predict 1-year mortality among 604 medical patients, using comorbidity data obtained from chart review in a single US hospital in 1984. The 19 Charlson conditions were selected and weighted according to their potential influence on mortality, and the index was validated for predicting mortality in a cohort of 685 breast cancer patients. Since then, the Charlson comorbidity index has been adapted for use with data from administrative databases and medical registries that record medical conditions using the International Classification of Diseases, 9th revision (ICD-9), its Clinical Modification (ICD-9-CM) [7, 8], and, more recently, the International Classification of Diseases, 10th revision (ICD-10) [9–11].
Only two studies have examined the accuracy of diagnosis coding of the Charlson comorbidity conditions in administrative hospital registries, using diagnoses obtained from medical records as the reference. A study by Quan et al. was carried out in 1996-1997 in Calgary, Canada, using ICD-9 diagnosis codes; its results may not hold today for European registries using ICD-10 codes. Henderson et al. assessed the quality of coding in Victoria, Australia, soon after the implementation of ICD-10 in 1998-2001, compared it with ICD-9 coding in the earlier years, and found high coding quality in both periods.
Medical registries and administrative databases are an important resource for studies of public health issues. Scandinavian population-based medical registries can be linked using unique personal identifiers and are therefore used extensively for epidemiologic research [15, 16]. Data collection and coding procedures may vary across countries, and no Nordic study has validated the coding of all conditions included in the Charlson comorbidity index. Measuring comorbidity accurately is important, because the validity of such epidemiologic studies depends on adequate control for confounding by comorbidity [3, 18], and sufficient control for confounding requires high data quality.
We therefore conducted this study to assess the positive predictive value (PPV) of the ICD-10 coding of each of the 19 Charlson comorbidity conditions in a population-based medical registry.
This study was conducted in the North Jutland Region, Denmark, among patients with diagnoses registered in the Danish National Registry of Patients (NRP) between 1 January 1998 and 31 December 2007. The region has approximately 500,000 inhabitants, about 11% of the total Danish population. The Danish population receives tax-supported health care free of charge.
The National Registry of Patients
The Danish NRP includes data on all non-psychiatric hospital admissions in Denmark since 1977 and on outpatient clinic and emergency room visits since 1995. It records dates of admission and discharge, surgical procedures performed, and up to 20 diagnoses, classified according to the International Classification of Diseases, 8th revision (ICD-8) until the end of 1993 and 10th revision (ICD-10) thereafter. The physician who discharges the patient reviews the medical record and writes a discharge summary, including discharge diagnoses coded using ICD codes. A medical secretary then enters the ICD codes into the hospital registry, from which the data are electronically transmitted to the NRP at the National Board of Health (Figure 1) [16, 20].
We used the NRP to identify all hospital contacts (both in- and outpatients) in the study population with one of the Charlson comorbidity conditions as the first-listed diagnosis. No age restriction was applied. Outpatients included emergency room contacts and patients followed in outpatient clinics. In Danish practice, the first-listed diagnosis in the discharge record is the main reason for the hospital contact. For each of the 19 Charlson comorbidity conditions, we randomly selected five hospital contacts per year over the ten-year study period, yielding a total of 950 hospital contacts (19 × 10 × 5).
We used the patient's unique personal identification number and the dates of admission and discharge to identify the discharge summary for each of the 950 hospital contacts. The discharge summary describes the most important events during the hospital contact, including reason for admission, diagnostic work-up, treatment, prescribed medications, and plan after discharge. All discharge summaries were reviewed by the same physician (SKT).
The review of the discharge summary and medical record began by confirming the personal identification number and the dates of admission and discharge retrieved from the NRP. We then proceeded to confirm the diagnosis. A diagnosis was considered confirmed if the discharge summary described the exact diagnosis or a diagnosis within the same Charlson comorbidity condition. For example, if a discharge summary or medical record indicated non-insulin dependent diabetes mellitus and the NRP code was insulin dependent diabetes mellitus, the diagnosis was considered confirmed, because the Charlson comorbidity index does not distinguish between non-insulin dependent and insulin dependent diabetes mellitus. If the diagnosis was not described in the discharge summary, or if the discharge summary was not available, the medical record was reviewed to determine whether the diagnosis code could be confirmed. Discharge summaries from outpatient clinic contacts may describe only the treatment, and in these cases the medical record was retrieved. Whenever the reviewing physician was in doubt about whether the discharge summary or medical record agreed with the NRP ICD-10 code, a second physician (CFC) reviewed the material, and the two physicians reached a consensus. SKT conducted the review twice for all patients, the second time to check for typing errors.
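The sampling scheme described above is a stratified random sample with one stratum per condition and calendar year. A minimal Python sketch, assuming a hypothetical in-memory list of contact records with `condition` and `year` fields (the NRP extraction itself is not shown, and the population below is synthetic):

```python
import random
from collections import Counter, defaultdict


def stratified_sample(contacts, per_stratum=5, seed=None):
    """Randomly draw `per_stratum` contacts from every (condition, year) stratum.

    `contacts` is a hypothetical list of dicts with 'condition' and 'year' keys.
    random.sample raises ValueError if a stratum holds fewer than `per_stratum` contacts.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for contact in contacts:
        strata[(contact["condition"], contact["year"])].append(contact)
    sample = []
    for key in sorted(strata):
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample


# Synthetic population for illustration: 19 placeholder condition labels,
# calendar years 1998-2007, 40 contacts per condition-year stratum.
population = []
for c in range(1, 20):
    for year in range(1998, 2008):
        for _ in range(40):
            population.append({"id": len(population), "condition": f"condition_{c:02d}", "year": year})

sample = stratified_sample(population, per_stratum=5, seed=1)
per_stratum_counts = Counter((s["condition"], s["year"]) for s in sample)
print(len(sample), len(per_stratum_counts))  # 950 190
```

Fixing the number per year, rather than sampling across the whole period, guarantees that every calendar year contributes equally to each condition's PPV.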
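The confirmation rule compares Charlson categories rather than exact codes. A minimal sketch, assuming a hypothetical three-entry excerpt of an ICD-10-to-Charlson mapping (not the code list used in the study) and a simplified view in which the reviewer's finding is expressed as an ICD-10 code:

```python
# Hypothetical excerpt of an ICD-10 -> Charlson category mapping, for illustration only;
# it is not the code list used in the study.
CHARLSON_CATEGORY = {
    "E10.9": "diabetes without complications",  # insulin-dependent DM without complications
    "E11.9": "diabetes without complications",  # non-insulin-dependent DM without complications
    "I50.0": "congestive heart failure",
}


def is_confirmed(registry_code, record_code):
    """Registry code counts as confirmed if the diagnosis found in the discharge summary or
    medical record (record_code, or None if absent) maps to the same Charlson category."""
    if record_code is None:
        return False
    registry_category = CHARLSON_CATEGORY.get(registry_code)
    return registry_category is not None and registry_category == CHARLSON_CATEGORY.get(record_code)


print(is_confirmed("E10.9", "E11.9"))  # True: same Charlson category
print(is_confirmed("E10.9", "I50.0"))  # False: different category
print(is_confirmed("E10.9", None))     # False: diagnosis not found in the record
```

The first call mirrors the diabetes example in the text: the two codes differ, but both fall in the same Charlson category, so the registry code is treated as confirmed.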
We assessed the accuracy of the ICD-10 diagnostic codes in the NRP by comparison with the discharge summary or medical record, which served as the reference standard. We quantified accuracy as the positive predictive value with its 95% confidence interval (CI), calculated using the exact Clopper-Pearson method for binomial proportions. The PPV was the proportion of sampled NRP hospital contacts with a Charlson comorbidity condition for which the diagnosis could be confirmed in the discharge summary or the medical record.
We stratified the analyses by age, sex, and in- versus outpatient status, analysing in- and outpatients both separately and together, to examine differences in the PPV. We also report the proportion of contacts for which the medical record was retrieved, separately for inpatients and outpatients.
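The PPV and its exact interval can be reproduced from counts alone. The study used Stata; the sketch below is an independent standard-library Python illustration (not the authors' code) that finds the Clopper-Pearson limits by bisection on the binomial tail probabilities. The counts 41 of 50 and 50 of 50 are those implied by the reported PPVs of 82.0% and 100% with 50 contacts per condition.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_comb + i * log_p + (n - i) * log_q)
    return total


def _bisect_increasing(f, iterations=60):
    """Root of an increasing function f on (0, 1) by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) 1 - alpha confidence interval for x successes out of n."""
    # Lower limit solves P(X >= x | p) = alpha / 2; this tail probability increases with p.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha / 2; the CDF decreases with p, so flip the sign.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def ppv_with_ci(confirmed, total, alpha=0.05):
    """PPV = confirmed / total, with its exact binomial confidence limits."""
    lower, upper = clopper_pearson(confirmed, total, alpha)
    return confirmed / total, lower, upper


for label, x, n in [("41 of 50 confirmed", 41, 50), ("50 of 50 confirmed", 50, 50)]:
    ppv, lo, hi = ppv_with_ci(x, n)
    print(f"{label}: PPV {ppv:.1%} (95% CI {lo:.1%}, {hi:.1%})")
```

When every contact is confirmed (x = n), the upper limit is fixed at 1 and only the lower limit carries information, which is why such results are reported as one-sided 97.5% intervals.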
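The stratified analysis amounts to grouping the reviewed contacts and computing the PPV within each group. A minimal sketch, assuming hypothetical record dicts with `age`, `sex`, `setting`, and a boolean `confirmed` field; the age bands are those used in the Results, and the four records are toy values, not study data:

```python
from collections import defaultdict


def age_group(age):
    """Age bands used for the stratified results: <18, 18-49, 50-64, 65-74, 75+."""
    if age < 18:
        return "<18"
    if age < 50:
        return "18-49"
    if age < 65:
        return "50-64"
    if age < 75:
        return "65-74"
    return "75+"


def ppv_by_stratum(records, key):
    """PPV (confirmed / total) within each stratum defined by key(record)."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [confirmed, total]
    for record in records:
        stratum = counts[key(record)]
        stratum[0] += int(record["confirmed"])
        stratum[1] += 1
    return {s: confirmed / total for s, (confirmed, total) in counts.items()}


# Toy records for illustration only; these are not study data.
records = [
    {"age": 12, "sex": "F", "setting": "inpatient", "confirmed": True},
    {"age": 45, "sex": "M", "setting": "outpatient", "confirmed": False},
    {"age": 70, "sex": "F", "setting": "outpatient", "confirmed": True},
    {"age": 81, "sex": "M", "setting": "inpatient", "confirmed": True},
]
print(ppv_by_stratum(records, key=lambda r: r["setting"]))  # {'inpatient': 1.0, 'outpatient': 0.5}
print(ppv_by_stratum(records, key=lambda r: age_group(r["age"])))  # {'<18': 1.0, '18-49': 0.0, '65-74': 1.0, '75+': 1.0}
```

Passing the stratifier as a function keeps one routine for age, sex, and setting; each stratum's counts could then be passed to an exact binomial interval.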
Data were entered in EpiData (EpiData Association, Odense, Denmark, http://www.epidata.dk) and analysed with STATA version 9.2 (StataCorp, College Station, Texas, USA). The study was approved by the Danish Data Protection Agency (record number: 2006-53-1396).
All 950 hospital contacts sampled from the NRP were identified. Of these, 588 (61.9%) were inpatient and 362 (38.1%) were outpatient contacts. Sixty-five (6.8%) patients were younger than 18 years; most of these had leukaemia (20 patients) or hemiplegia (16 patients), and the remainder had diabetes mellitus, chronic pulmonary disease, connective tissue disease, moderate to severe renal disease, AIDS, lymphoma, or any tumour. For 238 (25.1%) contacts, the discharge summary did not contain enough information to confirm the registry diagnosis, and the medical record was reviewed.
The overall PPV for first-listed diagnoses of the 19 Charlson comorbidity conditions was 98.0% (95% CI; 96.9%, 98.8%). The PPVs for the individual conditions ranged from 82.0% (95% CI; 68.6%, 91.4%) for diabetes mellitus with diabetic complications to 100% (one-sided 97.5% CI; 92.9%, 100%) for congestive heart failure, peripheral vascular disease, chronic pulmonary disease, mild and severe liver disease, hemiplegia, renal disease, leukaemia, lymphoma, metastatic tumour and AIDS (Table 1). Stratifying each Charlson comorbidity condition by in- and outpatient status revealed virtually no differences.
In stratified analyses, the PPV was 98.6% (95% CI; 97.3%, 99.4%) for inpatients and 97.0% (95% CI; 94.6%, 98.5%) for outpatients. In females the PPV was 98.2% (95% CI; 96.5%, 99.2%) and in males 97.8% (95% CI; 96.1%, 98.9%). The PPVs were 100% (one-sided 97.5% CI; 94.5%, 100%) for patients aged below 18 years, 97.4% (95% CI; 94.0%, 99.1%) for patients aged 18 to 49 years, 97.3% (95% CI; 94.4%, 98.9%) for patients aged 50 to 64 years, 99.0% (95% CI; 96.5%, 99.9%) for patients aged 65 to 74 years, and 97.9% (95% CI; 95.1%, 99.3%) for patients aged 75 years or older.
Review of the medical record was required to confirm the diagnosis for 36.2% of outpatients but only 18.2% of inpatients.
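When all n sampled diagnoses in a group are confirmed, the exact Clopper-Pearson interval has upper limit 1, and its lower limit p_L solves a single equation. As a worked check of the one-sided bounds for the 50 contacts per condition and for the 65 patients younger than 18 years:

```latex
\Pr(X \ge n \mid p_L) = p_L^{\,n} = 0.025
\quad\Longrightarrow\quad
p_L = 0.025^{1/n},
\qquad
0.025^{1/50} \approx 0.929,
\qquad
0.025^{1/65} \approx 0.945
```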
We found a PPV greater than 90% for almost all ICD-10 diagnostic codes used to ascertain the Charlson comorbidity conditions in the NRP. This accuracy is higher than that reported in earlier studies [12, 13].
Our study was conducted in an area with virtually complete population-based administrative registry data on all hospitalized patients during the study period. We examined patients admitted to hospitals in one Danish region, but we do not expect coding error rates to differ across regions. We sampled the same number of hospital contacts from each of the ten years of the study period, and the NRP data were of excellent quality throughout. We validated the 19 conditions included in the Charlson comorbidity index, which Charlson selected because they were important predictors of one-year mortality. This set of conditions therefore consists of serious diseases that are readily diagnosed, and the excellent PPV of these diagnostic codes may not apply to less severe conditions recorded in the NRP. Furthermore, miscoded diagnoses within the same Charlson category were considered confirmed, as errors of this type do not affect the Charlson comorbidity index score. Finally, we verified the discharge physicians' coding against the description in the discharge summary or medical record (Figure 1), but did not examine whether the diagnostic criteria were actually fulfilled.
The proportion of contacts requiring retrieval of the full medical record to confirm the registry diagnosis was almost twice as high for outpatients as for inpatients. The reason was that physicians often continued writing in the record from a previous outpatient visit, so the entry for a given contact was frequently a short note or a list of blood test results that did not mention the diagnosis. Sometimes a note merely recorded that the patient had missed an appointment.
The reviewing physician knew the diagnosis code before reviewing the discharge summary. In doubtful cases, this knowledge may have tipped the judgement towards confirmation, and we cannot rule this out as a partial explanation of the high PPVs. We have no information on patients with a Charlson comorbidity condition that was not diagnosed at a hospital; this is unlikely to influence our results, however, because patients with these serious diseases are likely to have had a previous hospital contact. Because we sampled only coded contacts, we have no data on uncoded patients (for example, a patient with diabetes hospitalized with pneumonia may not be coded with diabetes). We are therefore unable to estimate the negative predictive value, sensitivity, or specificity, which are also important measures of the validity of administrative hospital discharge databases.
Diabetes mellitus with diabetic complications had the lowest PPV, mainly because the diabetic complications could not be confirmed in the discharge summary or the medical record. For example, poorly regulated diabetes was typically coded as diabetes with an unspecified complication. A previous Danish study found higher PPVs for diabetes registration (96.3% (95% CI; 95.5%, 97.2%) for insulin dependent and 97.9% (95% CI; 97.2%, 98.5%) for non-insulin dependent diabetes mellitus), but it did not examine misclassification of the complications of diabetes mellitus.
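The limitation follows from the 2×2 table that underlies these measures: a design that samples only coded contacts fills the cells for true and false positives but leaves the false-negative and true-negative cells empty. A minimal sketch, using a hypothetical `ValidationCounts` class (not part of the study) in which unknown cells are `None`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidationCounts:
    """2x2 cross-classification of registry coding against the reference standard.

    tp: coded and confirmed; fp: coded but not confirmed;
    fn: not coded but condition present; tn: not coded and condition absent.
    fn and tn stay None when only coded contacts were sampled.
    """
    tp: int
    fp: int
    fn: Optional[int] = None
    tn: Optional[int] = None

    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> Optional[float]:
        return None if self.fn is None else self.tp / (self.tp + self.fn)

    def specificity(self) -> Optional[float]:
        return None if self.tn is None else self.tn / (self.tn + self.fp)

    def npv(self) -> Optional[float]:
        if self.tn is None or self.fn is None:
            return None
        return self.tn / (self.tn + self.fn)


# Only coded contacts sampled: counts implied by the 82.0% PPV (41 of 50).
coded_only = ValidationCounts(tp=41, fp=9)
print(coded_only.ppv(), coded_only.sensitivity())  # 0.82 None
```

Estimating sensitivity or the negative predictive value would require an additional sample of uncoded patients, which this design did not include.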
Two earlier studies have validated the quality of diagnostic coding in administrative databases used to ascertain Charlson comorbidities [12, 13], one of them for ICD-10-AM coding. Henderson et al. assessed the quality of coding in routinely collected hospital discharge data in Australia based on ICD-10-AM for 1998-1999 (n = 7,004) and 2000-2001 (n = 7,631). Their PPVs for 2000-2001 ranged from 62% (95% CI; 48%, 76%) for peripheral vascular disease to 94% (95% CI; 91%, 97%) for metastatic cancer. No PPV was reported for HIV, as there were no prevalent cases. Their validation study was completed shortly after the implementation of ICD-10 to examine whether data quality was maintained or improved. Our study period began 4 years after the implementation of ICD-10 in Denmark, which may explain our consistently higher PPVs.
The Canadian study validated the coding of the Charlson comorbidity conditions in administrative data using ICD-9-CM diagnoses. It was conducted in 1996-1997 and included 1,200 inpatients in Calgary, Alberta. PPVs ranged from 44.0% to 96.3%, and some Charlson conditions were subject to considerable coding errors: PPVs were below 50% for liver disease (both mild and moderate to severe) and rheumatologic disease. Moreover, that study was based on ICD-9-CM, which is less widely used than ICD-10.
Humphries et al. validated 7 of the Charlson comorbidities in 817 patients undergoing percutaneous coronary intervention at a single hospital in 1994-1995. The study was based on ICD-9-CM, and PPVs ranged from 50.6% to 93.3% with patient chart review as the reference standard.
Several Danish studies have estimated the PPV in the NRP of selected diagnoses included in the Charlson comorbidity index (e.g. acute myocardial infarction, cerebrovascular disease [28, 29], dementia, rheumatoid arthritis, liver cirrhosis, diabetes mellitus, cancer, haematological malignancies and HIV) and generally report lower PPVs than ours. These studies validated diagnoses against strict diagnostic criteria, for example requiring specific clinical investigations. If these criteria were not satisfied, the patient was classified as not having the disease, even if the physician had diagnosed and treated the patient for it. These studies therefore validated the diagnoses themselves, whereas we validated ICD-10 codes against the diagnosis assigned by the treating physician. This difference in objective may explain the lower accuracy reported in the other Danish studies.
The PPV of diagnosis coding in the Danish NRP for conditions included in the Charlson comorbidity index is very high. Our results indicate that ICD-10 codes in the NRP accurately reflect the discharge physician's diagnosis, although we could not confirm whether the diagnoses themselves were correct. The high accuracy supports the use of NRP ICD-10 codes in future research to control for confounding by comorbidity as measured by the Charlson comorbidity index.
ICD-8: International Classification of Diseases, 8th revision
ICD-9: International Classification of Diseases, 9th revision
ICD-10: International Classification of Diseases, 10th revision
NRP: National Registry of Patients
PPV: Positive predictive value
Sandra Kruchov Thygesen
Christian Fynbo Christiansen
Hall SF: A user's guide to selecting a comorbidity index for clinical research. J Clin Epidemiol. 2006, 59: 849-855. 10.1016/j.jclinepi.2005.11.013.
Yancik R, Ershler W, Satariano W, Hazzard W, Cohen HJ, Ferrucci L: Report of the national institute on aging task force on comorbidity. J Gerontol A Biol Sci Med Sci. 2007, 62: 275-280.
de Groot V, Beckerman H, Lankhorst GJ, Bouter LM: How to measure comorbidity. a critical review of available methods. J Clin Epidemiol. 2003, 56: 221-229. 10.1016/S0895-4356(02)00585-1.
Lash TL, Mor V, Wieland D, Ferrucci L, Satariano W, Silliman RA: Methodology, design, and analytic techniques to address measurement of comorbid disease. J Gerontol A Biol Sci Med Sci. 2007, 62: 281-285.
Schneeweiss S, Maclure M: Use of comorbidity scores for control of confounding in studies using administrative databases. Int J Epidemiol. 2000, 29: 891-898. 10.1093/ije/29.5.891.
Charlson ME, Pompei P, Ales KL, MacKenzie CR: A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987, 40: 373-383. 10.1016/0021-9681(87)90171-8.
Deyo RA, Cherkin DC, Ciol MA: Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol. 1992, 45: 613-619. 10.1016/0895-4356(92)90133-8.
Needham DM, Scales DC, Laupacis A, Pronovost PJ: A systematic review of the Charlson comorbidity index using Canadian administrative databases: a perspective on risk adjustment in critical care research. J Crit Care. 2005, 20: 12-19. 10.1016/j.jcrc.2004.09.007.
Halfon P, Eggli Y, van Melle G, Chevalier J, Wasserfallen JB, Burnand B: Measuring potentially avoidable hospital readmissions. J Clin Epidemiol. 2002, 55: 573-587. 10.1016/S0895-4356(01)00521-2.
Quan H, Sundararajan V, Halfon P, Fong A, Burnand B, Luthi JC, Saunders LD, Beck CA, Feasby TE, Ghali WA: Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care. 2005, 43: 1130-1139. 10.1097/01.mlr.0000182534.19832.83.
Sundararajan V, Henderson T, Perry C, Muggivan A, Quan H, Ghali WA: New ICD-10 version of the Charlson comorbidity index predicted in-hospital mortality. J Clin Epidemiol. 2004, 57: 1288-1294. 10.1016/j.jclinepi.2004.03.012.
Quan H, Parsons GA, Ghali WA: Validity of information on comorbidity derived from ICD-9-CCM administrative data. Med Care. 2002, 40: 675-685. 10.1097/00005650-200208000-00007.
Henderson T, Shepheard J, Sundararajan V: Quality of diagnosis and procedure coding in ICD-10 administrative data. Med Care. 2006, 44: 1011-1019. 10.1097/01.mlr.0000228018.48783.34.
Jutte DP, Roos LL, Brownell MD: Administrative record linkage as a tool for public health research. Annu Rev Public Health. 2011, 32: 91-108. 10.1146/annurev-publhealth-031210-100700.
Frank L: Epidemiology. When an entire country is a cohort. Science. 2000, 287: 2398-2399. 10.1126/science.287.5462.2398.
Sørensen HT: Regional administrative health registries as a resource in clinical epidemiology. International Journal of Risk and Safety in Medicine. 1997, 10: 1-22.
Leal JR, Laupland KB: Validity of ascertainment of co-morbid illness using administrative databases: a systematic review. Clin Microbiol Infect. 2009, 16: 715-721.
Schneeweiss S, Seeger JD, Maclure M, Wang PS, Avorn J, Glynn RJ: Performance of comorbidity scores to control for confounding in epidemiologic studies using claims data. Am J Epidemiol. 2001, 154: 854-864. 10.1093/aje/154.9.854.
Iezzoni LI, Foley SM, Daley J, Hughes J, Fisher ES, Heeren T: Comorbidities, complications, and coding bias. Does the number of diagnosis codes matter in predicting in-hospital mortality?. JAMA. 1992, 267: 2197-2203. 10.1001/jama.267.16.2197.
Andersen TF, Madsen M, Jorgensen J, Mellemkjoer L, Olsen JH: The Danish National Hospital Register. A valuable source of data for modern health sciences. Dan Med Bull. 1999, 46: 263-268.
Thomsen RW: Diabetes mellitus and community-acquired bacteremia: Risk and prognosis. Thesis/Dissertation. 2004, Aarhus University Hospital, Department of Clinical Epidemiology
Rasmussen HH, Pedersen B, Sorensen HT, Freund KS: Epicrises from a department of medical gastroenterology. Ugeskr Laeger. 1991, 153: 1868-1870.
Clopper CJ, Pearson ES: The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934, 26: 404-413. 10.1093/biomet/26.4.404.
Sorensen HT, Sabroe S, Olsen J: A framework for evaluation of secondary data sources for epidemiological research. Int J Epidemiol. 1996, 25: 435-442. 10.1093/ije/25.2.435.
Nielsen GL, Sorensen HT, Pedersen AB, Sabroe S: Analyses of data quality in registries concerning diabetes mellitus--a comparison between a population based hospital discharge and an insulin prescription registry. J Med Syst. 1996, 20: 1-10. 10.1007/BF02260869.
Humphries KH, Rankin JM, Carere RG, Buller CE, Kiely FM, Spinelli JJ: Co-morbidity data in outcomes research: are clinical data derived from administrative databases a reliable alternative to chart review?. J Clin Epidemiol. 2000, 53: 343-349. 10.1016/S0895-4356(99)00188-2.
Madsen M, Davidsen M, Rasmussen S, Abildstrom SZ, Osler M: The validity of the diagnosis of acute myocardial infarction in routine statistics: a comparison of mortality and hospital discharge data with the Danish MONICA registry. J Clin Epidemiol. 2003, 56: 124-130. 10.1016/S0895-4356(02)00591-7.
Krarup LH, Boysen G, Janjua H, Prescott E, Truelsen T: Validity of stroke diagnoses in a National Register of Patients. Neuroepidemiology. 2007, 28: 150-154. 10.1159/000102143.
Johnsen SP, Overvad K, Sorensen HT, Tjonneland A, Husted SE: Predictive value of stroke and transient ischemic attack discharge diagnoses in The Danish National Registry of Patients. J Clin Epidemiol. 2002, 55: 602-607. 10.1016/S0895-4356(02)00391-8.
Phung TK, Andersen BB, Hogh P, Kessing LV, Mortensen PB, Waldemar G: Validity of dementia diagnoses in the Danish hospital registers. Dement Geriatr Cogn Disord. 2007, 24: 220-228. 10.1159/000107084.
Pedersen M, Klarlund M, Jacobsen S, Svendsen AJ, Frisch M: Validity of rheumatoid arthritis diagnoses in the Danish National Patient Registry. Eur J Epidemiol. 2004, 19: 1097-1103. 10.1007/s10654-004-1025-0.
Vestberg K, Thulstrup AM, Sorensen HT, Ottesen P, Sabroe S, Vilstrup H: Data quality of administratively collected hospital discharge data for liver cirrhosis epidemiology. J Med Syst. 1997, 21: 11-20. 10.1023/A:1022835207287.
Osterlind A, Jensen OM: Evaluation of cancer registration in Denmark in 1977. Preliminary evaluation of cancer registration by the Cancer Register and the National Patient Register. Ugeskr Laeger. 1985, 147: 2483-2488.
Norgaard M, Skriver MV, Gregersen H, Pedersen G, Schonheyder HC, Sorensen HT: The data quality of haematological malignancy ICD-10 diagnoses in a population-based hospital discharge registry. Eur J Cancer Prev. 2005, 14: 201-206. 10.1097/00008469-200506000-00002.
Obel N, Reinholdt H, Omland LH, Engsig F, Sorensen HT, Hansen AB: Retrivability in The Danish National Hospital Registry of HIV and hepatitis B and C coinfection diagnoses of patients managed in HIV centers 1995-2004. BMC Med Res Methodol. 2008, 8: 25. 10.1186/1471-2288-8-25.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/83/prepub
Declaration of funding interests: This study was funded by the Clinical Epidemiological Research Foundation ('Klinisk Epidemiologisk Forskningsfond'), Denmark. The Department of Clinical Epidemiology, Aarhus University Hospital, receives funding for other studies from companies in the form of research grants to (and administered by) Aarhus University. None of these studies has any relation to the present study.
Declaration of personal interests: The authors have no conflicts of interest.
Study concept and design: SKT, CFC and HTS. Analysis and interpretation of data: SKT, CFC, SC, TLL and HTS. Drafting of the manuscript: SKT, CFC and SC. Statistical analysis: SKT. Critical revision of the manuscript for important intellectual content: CFC, SC, TLL and HTS. Study supervision: CFC, SC and HTS. All authors approved the final version.