The SEARCH for Diabetes in Youth study has conducted population-based incidence and prevalence ascertainment of non-gestational diabetes in youth since 2001 [1,2,3,4,5]. SEARCH identifies youth diagnosed under the age of 20 years at the following locations: health plan members in seven counties in Southern California, the state of Colorado, Native American reservations in Arizona and New Mexico, eight counties in Ohio, the state of South Carolina, and five counties in Washington. Cases are also identified by a variety of sources that include referrals from physicians and other health care providers, community health systems, and diabetes registries. In SEARCH, a diabetes case is determined by physician diagnosis. This determination can be made by provider report, medical record review, or self-report.
Three hospital systems that are part of the SEARCH case ascertainment network participated in this study: Cincinnati Children’s Hospital, Cincinnati, OH; Seattle Children’s Hospital, Seattle, WA; and Children’s Hospital Colorado, Denver, CO. The study was approved by the SEARCH coordinating center (Wake Forest University Health Sciences Institutional Review Board; IRB00015926) with waivers of informed consent and Health Insurance Portability and Accountability Act authorization. This study was also approved by the local Institutional Review Boards of the participating sites. Methods were carried out in accordance with the Declaration of Helsinki and all other relevant guidelines and regulations. Two of these study sites use electronic health records (EHRs) developed by Epic (Verona, WI), while the other site employs an EHR developed by Cerner (Kansas City, MO).
This work originates from a project designed to explore detection of diabetes status, diabetes type, and date of diagnosis within a cohort of possible 2017 prevalent cases. All potential cases of youth with diabetes aged less than 20 years in 2017 were extracted from the EHR at three hospital systems through the use of a highly sensitive algorithm. The sensitive algorithm required at least one inpatient or outpatient clinical encounter in 2017 and at least one of the following criteria: a diabetes-related International Classification of Diseases, 10th Revision (ICD-10) diagnosis code, a glycated hemoglobin A1c ≥ 6.5%, a fasting glucose value ≥ 126 mg/dl, a random glucose value ≥ 200 mg/dl, or a diabetes-related medication.
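The screening rule above can be sketched in a few lines of Python. This is an illustration only, not the code used at the study sites: the `PatientSummary` record, its field names, and the `is_potential_case` function are hypothetical, and the sketch assumes that each patient's data have already been summarized into one row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientSummary:
    # Hypothetical per-patient summary; all field names are assumptions.
    age_in_2017: int
    had_encounter_2017: bool  # any inpatient or outpatient encounter in 2017
    has_diabetes_icd10: bool = False
    max_hba1c_pct: Optional[float] = None
    max_fasting_glucose_mgdl: Optional[float] = None
    max_random_glucose_mgdl: Optional[float] = None
    has_diabetes_medication: bool = False


def at_least(value: Optional[float], threshold: float) -> bool:
    """True when a measurement exists and meets the threshold."""
    return value is not None and value >= threshold


def is_potential_case(p: PatientSummary) -> bool:
    """Sensitive screen: a 2017 encounter plus at least one diabetes criterion."""
    if p.age_in_2017 >= 20 or not p.had_encounter_2017:
        return False
    return (
        p.has_diabetes_icd10
        or at_least(p.max_hba1c_pct, 6.5)
        or at_least(p.max_fasting_glucose_mgdl, 126)
        or at_least(p.max_random_glucose_mgdl, 200)
        or p.has_diabetes_medication
    )
```

Because any single criterion is sufficient, the rule favors sensitivity over specificity, which is why the flagged patients are treated only as potential cases.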
Gold standard for date of diagnosis
Potential diabetes cases were matched to the SEARCH registry, which included date of diagnosis and diabetes type from previous medical record review. Diabetes cases identified by the sensitive algorithm that were not already in the registry underwent the same review process. Calendar month and year of date of diabetes diagnosis were recorded for each subject and used as the gold standard to which date of diagnosis algorithms were compared.
Electronic health record data
Structured outpatient and inpatient EHR data were extracted for patients within the following domains: demographics, laboratory measurements, diagnosis codes, medications, and vital signs. Dates were recorded as calendar month and year. Each site removed protected health information prior to transmitting the data to the coordinating center for harmonization and analysis.
Diabetes status and type
Previous research in SEARCH demonstrated that structured EHR data yield respectable performance metrics for determining diabetes status and type [6,7,8,16]. The presence of at least two ICD-10 codes (E08-E13.x, P70.2, O24.0x, and O24.1x) determines status well, and a preponderance of type 1 diabetes, type 2 diabetes, and other diabetes (non-type 1 or type 2) codes can accurately determine type when paired with limited manual chart review of type 2 diabetes and other type cases. Given the excellent metrics in determining diabetes status and the similarity to other algorithms in the literature, we applied this rule-based ICD-10 approach to all accumulated diagnosis data from the point of EHR entry through 12/31/2017 to identify probable diabetes cases. We tested date of diagnosis algorithms within this population.
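As a minimal sketch of the status rule, the code list above can be expressed as a pattern match followed by a count. The function names and the regular expression are assumptions made for illustration, and the sketch assumes that "at least two codes" means two recorded code occurrences rather than two distinct codes, which the text does not specify.

```python
import re
from typing import Iterable

# Assumed encoding of the code list: E08-E13.x, P70.2, O24.0x, and O24.1x.
DIABETES_ICD10 = re.compile(
    r"^(E(08|09|10|11|12|13)(\.[0-9A-Z]+)?|P70\.2|O24\.[01][0-9A-Z]*)$"
)


def is_diabetes_code(code: str) -> bool:
    """True if the ICD-10 code falls within the diabetes code list."""
    return bool(DIABETES_ICD10.match(code.strip().upper()))


def is_probable_case(recorded_codes: Iterable[str]) -> bool:
    """Rule-based status: at least two qualifying code occurrences."""
    return sum(is_diabetes_code(c) for c in recorded_codes) >= 2
```

For example, `is_probable_case(["E10.9", "E10.65"])` is true, while a single qualifying code, or codes outside the list such as O24.4x, do not meet the rule.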
Probable cases according to the rule-based ICD-10 algorithm were included in the analysis. Eligibility criteria were intended to mimic a real-world application in which one would not know true diabetes status or date of diagnosis and would therefore be unable to subset the population according to either of these parameters. Patients were restricted to those first detected in the EHR from 1/1/2009 through 12/31/2017 as EHR systems were limited prior to 2009. This reduced the number of cases with incomplete data at the time of diagnosis. We considered limiting cases to those with a pre-defined period of time in the EHR without evidence of diabetes. We did not pursue this approach as this would have substantially reduced the size of the analytic cohort, and we found that performance metrics remained strong without the additional requirement.
Date of diagnosis algorithms
We considered two algorithms to determine date of diagnosis: an ICD code algorithm and a multiple-criteria algorithm. These algorithms were developed by a multidisciplinary team of clinicians, epidemiologists, and informaticians who participated in SEARCH and have extensive experience with childhood diabetes. The ICD code algorithm was defined as the time of occurrence of the second diabetes diagnosis code (ICD-9: 249–250.x, 357.2, 362.0x, 366.41, 648.0x, 775.1 and/or ICD-10: E08-E13.x, P70.2, O24.0x, O24.1x) and was based upon previous success in the identification of prevalent diabetes cases. Both ICD-9 and ICD-10 codes were used because the span of potential diagnosis dates preceded the implementation of ICD-10 in October of 2015. The multiple-criteria algorithm was defined as the time of the first occurrence of any of the following: a diabetes-related diagnosis code, an elevated glycated hemoglobin (≥ 6.5%), an elevated glucose (≥ 126 mg/dl fasting, ≥ 200 mg/dl random), or a diabetes-related medication (Alpha Glucosidase Inhibitors, Dipeptidyl Peptidase-4 (DPP4) Inhibitors, Glucagon-like Peptide-1 (GLP-1) Receptor Agonists, Insulin, Meglitinides, Sodium-glucose co-transporter-2 (SGLT2) inhibitors, Sulfonylureas, Thiazolidinediones, and other medications identified by clinicians). This combination of variables was based upon strong association with diabetes status, presence in the literature [6,8,11,14,15], and adequate data availability in the EHR.
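The two definitions can be illustrated with a short Python sketch. It is not the authors' implementation: the function names and input shapes are hypothetical, dates are represented as (year, month) tuples to mirror the month and year granularity of the extracted data, and the inputs are assumed to be already restricted to diabetes-related codes and medications.

```python
from typing import List, Optional, Tuple

YearMonth = Tuple[int, int]  # (year, month), compared chronologically


def icd_code_date(code_dates: List[YearMonth]) -> Optional[YearMonth]:
    """ICD code algorithm: date of the second diabetes diagnosis code."""
    ordered = sorted(code_dates)
    return ordered[1] if len(ordered) >= 2 else None


def multiple_criteria_date(
    code_dates: List[YearMonth],
    hba1c: List[Tuple[YearMonth, float]],          # (date, percent)
    glucose: List[Tuple[YearMonth, float, bool]],  # (date, mg/dl, is_fasting)
    medication_dates: List[YearMonth],
) -> Optional[YearMonth]:
    """Multiple-criteria algorithm: earliest date on which any criterion is met."""
    candidates = list(code_dates) + list(medication_dates)
    candidates += [d for d, value in hba1c if value >= 6.5]
    candidates += [
        d for d, value, fasting in glucose
        if value >= (126 if fasting else 200)
    ]
    return min(candidates) if candidates else None
```

By construction, the multiple-criteria date can never be later than the ICD code date for the same patient, since the first diagnosis code is itself one of the candidate events.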
All analyses were conducted using R version 3.6.2 (R Foundation for Statistical Computing). We assessed the performance of the rule-based ICD-10 status algorithm for diabetes status with accuracy, sensitivity, and specificity. Performance of each date of diagnosis algorithm relative to the gold standard calendar year of diagnosis was quantified by percent agreement (number of observations where predicted calendar year matched the gold standard year divided by the total number of probable diabetes cases identified by the rule-based ICD-10 status algorithm) and Cohen’s Kappa for interrater reliability. McNemar’s test assessed whether the marginal proportions differed between algorithms overall and within each diabetes type. A two-proportion z-test assessed differences in proportions correctly classified between type 1 and type 2 diabetes cases within each algorithm. We deemed results of all tests statistically significant at P < 0.05. We examined concordance between predicted and gold standard calendar month and year for the ICD code algorithm by scatterplot with underlying distribution by gold standard diagnosis year. We inspected performance over time visually by line graph with 95% confidence intervals for each year of diagnosis. Visualizations were limited to 2009–2017 due to lack of EHR data prior to 2009; type 2 cases were limited to 2012–2017 due to the small number of cases (n = 12) with gold standard date from 2009 to 2011. While year alone is most relevant for surveillance purposes, month and year of diagnosis are important for a variety of other reasons beyond the scope of this paper. Therefore, we also report overall percent agreement, Kappa, and McNemar’s test for the algorithms compared to the gold standard calendar month and year (plus or minus 1 month) and visually examined performance over time.
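For readers who want to see how these agreement statistics are computed, the following is a minimal sketch in plain Python. It is not the authors' R code: the function names are hypothetical, every case is assumed to have a predicted year, and the continuity correction in McNemar's test is an assumption (it is the default of R's `mcnemar.test`, but the text does not state which variant was used).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def percent_agreement(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Share of cases whose predicted year equals the gold standard year."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def cohens_kappa(pred: Sequence[int], gold: Sequence[int]) -> float:
    """Cohen's kappa, treating each calendar year as a category."""
    n = len(gold)
    observed = percent_agreement(pred, gold)
    pred_counts, gold_counts = Counter(pred), Counter(gold)
    # Agreement expected by chance from the two marginal distributions.
    expected = sum(pred_counts[y] * gold_counts[y] for y in gold_counts) / n**2
    if expected == 1.0:
        return 1.0  # single shared category: agreement is trivially perfect
    return (observed - expected) / (1.0 - expected)


def mcnemar_test(
    correct_a: List[bool], correct_b: List[bool], correction: bool = True
) -> Tuple[float, float]:
    """McNemar's chi-square statistic and p-value for paired correctness."""
    b = sum(x and not y for x, y in zip(correct_a, correct_b))
    c = sum(y and not x for x, y in zip(correct_a, correct_b))
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs
    diff = abs(b - c) - (1 if correction else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Here `correct_a` and `correct_b` would flag, case by case, whether each algorithm's predicted year matched the gold standard, so only cases on which the two algorithms disagree contribute to the test.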