Development and validation of algorithms to classify type 1 and 2 diabetes according to age at diagnosis using electronic health records

Background Validated algorithms to classify type 1 and 2 diabetes (T1D, T2D) are mostly limited to white pediatric populations. We conducted a large study in Hong Kong among children and adults with diabetes to develop and validate algorithms using electronic health records (EHRs) to classify diabetes type against clinical assessment as the reference standard, and to evaluate performance by age at diagnosis. Methods We included all people with diabetes (age at diagnosis 1.5–100 years during 2002–15) in the Hong Kong Diabetes Register and randomized them to derivation and validation cohorts. We developed candidate algorithms to identify diabetes types using encounter codes, prescriptions, and combinations of these criteria (“combination algorithms”). We identified 3 algorithms with the highest sensitivity, positive predictive value (PPV), and kappa coefficient, and evaluated performance by age at diagnosis in the validation cohort. Results There were 10,196 (T1D n = 60, T2D n = 10,136) and 5101 (T1D n = 43, T2D n = 5058) people in the derivation and validation cohorts (mean age at diagnosis 22.7, 55.9 years; 53.3, 43.9% female; for T1D and T2D respectively). Algorithms using codes or prescriptions classified T1D well for age at diagnosis < 20 years, but sensitivity and PPV dropped for older ages at diagnosis. Combination algorithms maximized sensitivity or PPV, but not both. 
The “high sensitivity for type 1” algorithm (ratio of type 1 to type 2 codes ≥ 4, or at least 1 insulin prescription within 90 days) had a sensitivity of 95.3% (95% confidence interval 84.2–99.4%; PPV 12.8%, 9.3–16.9%), while the “high PPV for type 1” algorithm (ratio of type 1 to type 2 codes ≥ 4, and multiple daily injections with no other glucose-lowering medication prescription) had a PPV of 100.0% (79.4–100.0%; sensitivity 37.2%, 23.0–53.3%), and the “optimized” algorithm (ratio of type 1 to type 2 codes ≥ 4, and at least 1 insulin prescription within 90 days) had a sensitivity of 65.1% (49.1–79.0%) and PPV of 75.7% (58.8–88.2%) across all ages. Accuracy of T2D classification was high for all algorithms. Conclusions Our validated set of algorithms accurately classifies T1D and T2D using EHRs for Hong Kong residents enrolled in a diabetes register. The choice of algorithm should be tailored to the unique requirements of each study question.


Background
Administrative health databases are an important resource for population-based diabetes research [1]. Using routinely collected data such as billing codes and hospitalization records, various algorithms have been developed to identify diabetes [2,3]. While these algorithms capture diabetes diagnoses, they cannot accurately identify diabetes type [2][3][4][5]. Type 1 diabetes (T1D) is an autoimmune disease that classically occurs in children, but may rarely occur in older adults [6]. In T1D, autoantibodies destroy the insulin-producing pancreatic beta cells, causing insulin deficiency and hyperglycemia. Type 2 diabetes (T2D), which typically occurs in adulthood, is caused by genetic and other risk factors such as obesity that lead to insulin resistance and hyperglycemia, although lean individuals may also develop T2D due to insulin deficiency [6]. While T1D must be treated with insulin, T2D may be treated with lifestyle modification, insulin, or other glucose-lowering medications [6].
Many epidemiological studies apply the untested assumption that findings in adults with diabetes are representative of T2D [7,8]. However, the prognoses of T1D and T2D are markedly different [9], especially among adults aged < 40 years, where both types commonly occur and may be difficult to distinguish clinically [1,9]. In this age group, it has been shown that T2D is associated with a 15-fold elevation in the risk of cardiovascular complications versus T1D [9]. Yet, diabetes types are poorly documented in administrative databases, which were not originally designed for research purposes. Specific diagnostic codes for T1D and T2D may be erroneously entered [10] or unavailable in some billing systems [2]. Furthermore, classification of diabetes type is particularly important in Asia because disaggregated population-level T1D and T2D incidence and prevalence have never been measured [11].
Considering the lifelong and immediate need for insulin treatment in T1D, novel algorithms have been developed to identify T1D using prescriptions and laboratory data from electronic health records (EHRs) [12]. However, previous validation studies had small sample sizes and were mostly limited to children in white populations [13][14][15][16]. One study developed and validated a complex algorithm to detect T1D in a US population with 65% (36-100%) sensitivity and 88% (78-98%) positive predictive value (PPV) using EHRs [12]. However, algorithms developed for white populations may have a poorer PPV when applied to Asian populations, as the prevalence of T1D in Asians appears to be much lower than in white people [17]. The proportion of diabetes cases classified as T1D and T2D also varies enormously by age at diagnosis; yet, the effect of age at diagnosis on the performance of classification algorithms has never been specifically studied. To address these gaps, we conducted a large study among Hong Kong residents with diabetes to develop and validate algorithms using EHRs to classify T1D and T2D against clinical assessment as the reference standard, and to evaluate performance by age at diagnosis.

Setting and data sources
Hong Kong is a special administrative region of China with a population of 7.3 million and an estimated diabetes prevalence of 10.3% (2014) [18]. All residents are entitled to universal inpatient and outpatient health services operated by the governmental Hong Kong Hospital Authority (HA), which is modeled after the National Health Service of Britain. Given the wide public-private healthcare cost differential, HA hospitals account for about 95% of all bed-days [19].
The Hong Kong Diabetes Surveillance Database (HKDSD) includes all Hong Kong residents with diabetes as identified using the HA's territory-wide EHR, which includes routinely-collected data on laboratory tests, prescriptions, and hospital visits for the entire population. We defined diabetes onset as the first occurrence of glycated haemoglobin A1c ≥ 6.5% [20], fasting plasma glucose ≥ 7 mmol/L [21], glucose-lowering medication prescription [3,4] excluding insulin, or long-term insulin prescription (≥ 28 days). To avoid detecting gestational diabetes [22], we excluded events occurring within 9 months prior to or 6 months after delivery (International Statistical Classification of Diseases and Related Health Problems version 9 (ICD-9) codes 72-75), or within 9 months of any pregnancy-related encounter (ICD-9 codes 630-676) outside these periods (in case of aborted pregnancies or delivery in a non-HA hospital). We also excluded in-patient glucose measurements to avoid misidentifying acute stress hyperglycemia as diabetes.
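The onset definition above can be read as "the earliest qualifying event". The following is only a minimal sketch of that reading, not the HKDSD implementation: the `Event` record, its field names, and the `kind` labels are hypothetical, and the pregnancy-related exclusion windows are deliberately left out for brevity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical EHR event; field names are illustrative only."""
    when: date
    kind: str                # "a1c", "fpg", "non_insulin_rx", or "insulin_rx"
    value: float = 0.0       # A1c in %, glucose in mmol/L, or insulin supply in days
    inpatient: bool = False  # only consulted for glucose measurements


def qualifies(e: Event) -> bool:
    """Return True if the event meets one of the onset criteria in the text."""
    if e.kind == "a1c":
        return e.value >= 6.5
    if e.kind == "fpg":
        # In-patient glucose is excluded to avoid acute stress hyperglycemia.
        return e.value >= 7.0 and not e.inpatient
    if e.kind == "non_insulin_rx":
        return True
    if e.kind == "insulin_rx":
        return e.value >= 28  # long-term insulin: at least 28 days
    return False


def diabetes_onset(events: Iterable[Event]) -> Optional[date]:
    """Earliest qualifying event date, or None if no event qualifies."""
    dates = [e.when for e in events if qualifies(e)]
    return min(dates) if dates else None
```

A fuller version would also drop events that fall inside the delivery and pregnancy-related windows described above before taking the minimum.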
A subset of those in the HKDSD is additionally enrolled in the multicentre Hong Kong Diabetes Register (HKDR, Supplementary Table 1, Additional File). This register was established in 1995 at the Diabetes and Endocrine Centre at the Prince of Wales Hospital, a tertiary care public hospital in the New Territories East region with a catchment of 1.3 million residents, and was later expanded to 2 additional hospitals [23,24]. Anyone with diabetes is eligible for enrolment in the HKDR. Referrals are self-initiated or from physicians located typically in community- or hospital-based clinics. All enrolled individuals undergo a comprehensive assessment including a detailed clinical history, fundoscopy and foot exams, and serum and urinary laboratory testing. This assessment yields detailed data including diabetes type, which is otherwise unavailable in the HKDSD. The research was approved by the Chinese University of Hong Kong-New Territories East Cluster Clinical Research Ethics Committee.

Study population
Because the reference standard (clinical assessment) was only established for the subset of those enrolled in the HKDR, we restricted the study to this subpopulation. To ensure at least 1 year of follow-up data, we included all people with diabetes diagnosed at ages 1.5 (to exclude neonatal diabetes) to 100 years from 1 January 2002 through 31 December 2015, defined using the HKDSD criteria. The maximum follow-up date was 31 December 2016. We excluded individuals with monogenic or secondary diabetes and those with missing diabetes type in the HKDR (Fig. 1). We randomized the remaining individuals into the derivation (two thirds) and validation (one third) cohorts.
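As an illustration of the two-thirds/one-third randomization, the sketch below shuffles person identifiers with a seeded generator and cuts the list. The function name and the seed are assumptions, and the text does not say how the randomization was actually carried out, so the cohort sizes it produces need not match those reported.

```python
import random
from typing import Hashable, Iterable, List, Tuple


def split_cohorts(ids: Iterable[Hashable], seed: int = 0) -> Tuple[List[Hashable], List[Hashable]]:
    """Randomly split identifiers into derivation (2/3) and validation (1/3) lists."""
    rng = random.Random(seed)          # seeded so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3     # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed matters in practice because any algorithm selected on the derivation cohort must later be evaluated on exactly the same held-out validation cohort.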

Reference standard
C-peptide and autoantibody testing are not routinely available to confirm T1D diagnosis in the public setting, and self-funded tests are rarely performed. Therefore, we applied the standard clinical definition of T1D adopted by the HKDR [25], which strictly defines T1D as diabetic ketoacidosis, unprovoked heavy ketones in urine or requirement of insulin within the first year of diagnosis. An endocrinologist reviewed all charts initially marked as T1D in the HKDR to ensure accuracy.
Fig. 1 Flow diagram depicting creation of the study cohorts using the sub-population of people in the Hong Kong Diabetes Surveillance Database who were also enrolled in the Hong Kong Diabetes Register (HKDR). Diabetes type classification consisted of 2 steps: (1) comprehensive assessment, and (2) chart review of cases initially flagged as type 1 diabetes

Algorithm development and validation
We applied clinical knowledge (based on the experience of endocrinologists with expertise in diabetes management: CK, BRS, AL, JCNC) and reviewed previous validation studies [12-16, 26, 27] to develop candidate algorithms to identify T1D using either ICD-9 encounter codes ("code algorithms"; type 1 codes: 250.x1, 250.x3; type 2 codes: 250.x0, 250.x2) or prescriptions ("prescription algorithms"; Supplementary Tables 2-3, Additional File). We varied the number, ratio, and types of codes required, as well as the duration of time allowed between the diagnosis date and the initial insulin prescription. Positive cases were automatically classified as T1D and negative as T2D. Using the derivation cohort, we selected algorithms based on the sensitivity and PPV of identifying T1D, as these are the most important characteristics for public health [28]. Since the most sensitive algorithms had poor PPV and vice versa, we chose the best algorithms with the highest sensitivity and PPV separately, among both code and prescription algorithms (total: 4 algorithms, labelled A-D). We resolved ties by selecting the algorithm with the greatest sum of sensitivity and PPV. Then, we paired the 2 best code algorithms with the 2 best prescription algorithms using 2 methods in an effort to further improve accuracy [29,30]. These methods were: combining using "or" (for example, "A or B") to improve sensitivity, and combining using "and" (for example, "A and B") to improve PPV. We then tested all 8 "combination algorithms" in the derivation cohort. Of the 12 code, prescription, and combination algorithms, we identified the 3 algorithms with the highest sensitivity, highest PPV, and highest kappa coefficient ("optimized" algorithm) across all ages. Using the validation cohort, we evaluated the performance of these 3 algorithms in classifying T1D and T2D by age at diagnosis.
We repeated the entire procedure using additional laboratory data (estimated glomerular filtration rate) to determine whether requiring normal renal function with insulin prescriptions would improve the performance of prescription algorithms.
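The single and combination algorithms described in this section can be sketched as boolean rules over a per-person summary. This is an illustrative sketch only: the `Person` fields are hypothetical, the thresholds for A-D are taken from the Results, and treating a person with type 1 codes but no type 2 codes as meeting any ratio threshold is an assumption that the text does not state.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical per-person summary of codes and prescriptions."""
    type1_codes: int                      # count of 250.x1 / 250.x3 codes
    type2_codes: int                      # count of 250.x0 / 250.x2 codes
    days_to_first_insulin: Optional[int]  # None if insulin was never prescribed
    multiple_daily_injections: bool
    other_glucose_lowering_rx: bool       # any non-insulin glucose-lowering drug


Rule = Callable[[Person], bool]


def code_ratio_at_least(threshold: float) -> Rule:
    def rule(p: Person) -> bool:
        if p.type1_codes == 0:
            return False
        if p.type2_codes == 0:
            return True  # ASSUMPTION: ratio treated as unbounded when no type 2 codes
        return p.type1_codes / p.type2_codes >= threshold
    return rule


def insulin_within(days: int) -> Rule:
    return lambda p: p.days_to_first_insulin is not None and p.days_to_first_insulin <= days


def mdi_only(p: Person) -> bool:
    return p.multiple_daily_injections and not p.other_glucose_lowering_rx


SINGLE: Dict[str, Rule] = {
    "A": code_ratio_at_least(0.5),
    "B": code_ratio_at_least(4),
    "C": insulin_within(90),
    "D": mdi_only,
}


def combine(left: Rule, right: Rule, how: str) -> Rule:
    """'and' favours PPV; 'or' favours sensitivity."""
    if how == "and":
        return lambda p: left(p) and right(p)
    return lambda p: left(p) or right(p)


# Each code algorithm (A, B) paired with each prescription algorithm (C, D), both ways.
COMBINATIONS: Dict[str, Rule] = {
    f"{code} {how} {rx}": combine(SINGLE[code], SINGLE[rx], how)
    for code, rx, how in product("AB", "CD", ("and", "or"))
}


def classify(p: Person, rule: Rule) -> str:
    """Positive cases are classified as T1D and negative cases as T2D."""
    return "T1D" if rule(p) else "T2D"
```

Pairing the two code rules with the two prescription rules under both operators yields the 8 combination algorithms, which together with A-D give the 12 algorithms compared in the text.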

Statistical analysis
We calculated the sensitivity, specificity, PPV, and negative predictive value (NPV) with 95% exact confidence intervals of each selected algorithm for classifying T1D and T2D in the derivation and validation cohorts. We also calculated Cohen's kappa coefficient, which represents agreement after agreement due to chance is removed [31]. A perfect algorithm would have sensitivity, specificity, PPV, and NPV values of 100%, and a kappa value of 1.0. Missing data were minimal (missing diabetes type: n = 357, 2.3%) and handled by complete case analysis. All analyses were performed using the "FREQ" procedure in SAS version 9.4 (Cary, NC).
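For readers without SAS, the measures above can be sketched in plain Python. This is not the authors' code: it assumes the "exact" interval is the Clopper-Pearson interval, which it finds by bisection on the binomial distribution, and the function names are illustrative. The counts in the usage note are inferred from the reported validation percentages, not taken from a published table.

```python
from math import exp, lgamma, log, log1p
from typing import Callable, Dict, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), 0 < p < 1, summed in log space."""
    log_p, log_q = log(p), log1p(-p)
    return sum(
        exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        for i in range(k + 1)
    )


def _bisect_increasing(g: Callable[[float], float], target: float, iters: int = 60) -> float:
    """Find p in (0, 1) with g(p) = target, for g increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval for x successes out of n (assumed 'exact' method)."""
    # Lower bound: P(X >= x | p) = alpha / 2; upper bound: P(X <= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else _bisect_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _bisect_increasing(lambda p: -binom_cdf(x, n, p), -alpha / 2)
    return lower, upper


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Test characteristics with T1D as the positive class (all cells assumed non-empty margins)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column margins.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (observed - expected) / (1 - expected),
    }
```

As a check on the assumed method, `exact_ci(16, 16)` has a lower bound equal to 0.025 raised to the power 1/16, about 0.794, which is consistent with the 79.4-100.0% interval reported for a PPV of 100.0%.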

Results
There were 15,300 individuals with complete data and diabetes diagnosed during 2002-15 (Fig. 1). Of these cases, 121 were initially classified as T1D. After chart review, 3 were excluded as monogenic or secondary diabetes and 15 were re-classified as T2D, leaving 103 T1D cases remaining. The final cohorts consisted of 10,196 (derivation) and 5101 (validation) individuals. Tables 1 and 2 show the baseline demographic characteristics of the study cohorts. The distribution of baseline characteristics was highly similar across the derivation and validation cohorts and across the HKDR and HKDSD, although the HKDR population had more prescriptions for insulin and other glucose-lowering medications. The average age at diagnosis was 22.7 years for T1D and 55.9 years for T2D (Table 2; see Supplementary Figure 1, Additional File). More men (56.1%) had T2D, but for T1D the sex ratio was more balanced. People with T1D had a median of 3.0 type 1 codes, including 2.0 from the primary diagnosis on the hospital discharge abstract. People with T2D had a median of 1 type 2 code. Although most people with T1D had at least 1 type 1 code (83.3% sensitivity), the PPV for this algorithm was only 26.0%. Most people with T1D also had at least 1 type 2 code (70.0%). Code algorithms using a ratio of type 1 to type 2 codes had a higher PPV and similar sensitivity compared to those using the number of type 1 or type 2 codes. Two algorithms had the highest sensitivity (83.3%), but "ratio of type 1 to type 2 codes ≥ 0.5" (algorithm A) was chosen because it had a higher PPV (34.0%) than "at least 1 type 1 code." "Ratio of type 1 to type 2 codes ≥ 4" (algorithm B) was chosen for having the highest PPV (57.3%, sensitivity 71.7%).
Among the prescription algorithms, those specifying "at least 1 insulin prescription" were the most sensitive but lacked PPV for classifying T1D. Nearly everyone with T1D received an insulin prescription at any time (59 of 60 people, 98.3% sensitivity), and almost all received it within 90 days of diabetes diagnosis (58 of these 59 people, 96.7% sensitivity). As these 2 prescription algorithms had the highest sensitivity values and classified everyone identically except for 1 case, we applied the tiebreaker criteria to choose "insulin prescription within 90 days" (algorithm C) based on its greater PPV (8.6%, versus 1.7% for "insulin prescription at any time"). Adding criteria for other types of medications improved the PPV of insulin-based prescription algorithms at the expense of sensitivity. In the T1D cohort, 36.7% received at least 1 metformin prescription (versus 88.6% in the T2D cohort), and 16.7% received a glucose-lowering medication prescription other than insulin and metformin (versus 75.7% in the T2D cohort). Of the algorithms that added a condition for no other glucose-lowering medication prescriptions in addition to an insulin prescription, the algorithm "at least 1 insulin prescription with no other glucose-lowering medication prescriptions except for metformin" had the highest PPV (31.0%; sensitivity 60.0%).
Abbreviations: A1C glycated haemoglobin A1c, LDL-C low-density lipoprotein cholesterol, HDL-C high-density lipoprotein cholesterol, IQR interquartile range, eGFR estimated glomerular filtration rate, DPP-4 dipeptidyl peptidase-4, GLP-1 glucagon-like peptide-1, SGLT2 sodium-glucose transport protein 2, RAS renin-angiotensin system
Table 2 Baseline characteristics and performance of candidate algorithms among people in the derivation cohort, stratified by diabetes type. Candidate algorithms developed using encounter codes ("code algorithms") or prescriptions ("prescription algorithms") are also shown. For each algorithm, values in the Type 1 and 2 columns indicate the number and percentage of individuals satisfying the algorithm (sensitivity). Positive predictive values for classifying type 1 diabetes are shown in the right column. The best 4 algorithms are indicated by the letters in parentheses (A-D; see text for selection criteria)
Specifying the type of insulin as multiple daily injections further improved the PPV. "Multiple daily injections with no other glucose-lowering medication prescription" (algorithm D) had a 78.0% PPV (sensitivity 53.3%), which was the highest of the prescription algorithms. Algorithms A-D classified T1D well for age at diagnosis < 20 years in the derivation cohort, but as the proportion of diabetes cases classified as T1D dropped with age, the precision and estimates of sensitivity and PPV also dropped (Fig. 2). For age at diagnosis < 20 years, algorithm B had the highest kappa coefficient (sensitivity: 91.3%, 95% confidence interval 72.0-98.9%; PPV: 80.8%, 60.6-93.4%; Table 3). For age at diagnosis ≥ 20 years, algorithm C was the most sensitive but lacked PPV, while algorithm D had the highest PPV and kappa coefficient, despite a low sensitivity (age at diagnosis 20-39 years: sensitivity 50.0%, 29.9-70.1%, PPV 81.3%, 54.4-96.0%; ≥ 40 years: sensitivity 27.3%, 6.0-61.0%, PPV 50.0%, 11.8-88.2%).
As with algorithms A-D, performance of the combination algorithms also generally dropped at older ages at diagnosis (Fig. 3). For ages at diagnosis < 20 years, 4 combinations had 100.0% (85.2-100.0%; Table 3) sensitivity; among these algorithms, combination "A and C" had the highest PPV (74.2%, 55.4-88.1%). Among adults aged ≥ 20 years, sensitivity and PPV differed depending on the type of combination. "And" combinations had the highest PPV. "A and D" had the highest PPV among adults (age at diagnosis 20-39 years: 90.9%, 58.7-99.8%; ≥ 40 years: 50.0%, 11.8-88.2%), but the sensitivity was low (age at diagnosis 20-39 years: 38.5%, 20.2-59.4%; ≥ 40 years: 27.3%, 6.0-61.0%). Among the "or" combinations, "A or C" and "B or C" had identically the highest sensitivity for classifying T1D (age at diagnosis 20-39 years: 100.0%, 86.8-100.0%; ≥ 40 years: 90.9%, 58.7-99.8%). However, these algorithms had low PPV (age at diagnosis 20-39 years: 19.1-22.8%; ≥ 40 years: 1.8-1.9%), with "B or C" relatively higher (age at diagnosis 20-39 years: 22.8%, 15.5-31.6%; ≥ 40 years: 1.9%, 0.9-3.4%).
Among the 12 algorithms we tested, "B or C," "B and D," and "B and C" had the best sensitivity ("high sensitivity for type 1" algorithm), PPV ("high PPV for type 1" algorithm), and kappa coefficient ("optimized" algorithm) respectively across all ages in the derivation cohort. Table 4 displays the performance characteristics of these algorithms in the validation cohort. The "high sensitivity for type 1" algorithm had a sensitivity of 95.3% (84.2-99.4%; PPV 12.8%, 9.3-16.9%), while the "high PPV for type 1" algorithm had a PPV of 100.0% (79.4-100.0%; sensitivity 37.2%, 23.0-53.3%) across all ages. The optimized algorithm had a sensitivity of 65.1% (49.1-79.0%) and PPV of 75.7% (58.8-88.2%) across all ages. These algorithms produced distinctive estimates of the proportion of cases classified as T1D among all diabetes cases according to age at diagnosis (Fig. 4). The "high PPV for type 1" algorithm yielded conservative estimates, while the "high sensitivity for type 1" algorithm inflated estimates. Estimates from the "optimized" algorithm closely matched the reference standard across age at diagnosis.
Modifying algorithms with renal function criteria resulted in similar PPV with the same or lower sensitivity, and ultimately did not improve performance (Supplementary Tables 4-6, Additional File). All selected algorithms had high sensitivity and PPV in classifying T2D across all ages at diagnosis (sensitivity range 93.5-100.0%, PPV range 99.7-100.0%, Supplementary Table 7, Additional File). As all cases were classified as T1D or T2D in a binary fashion, the "high sensitivity for type 1" algorithm was equivalent to a "high PPV for type 2" algorithm, while the "high PPV for type 1" algorithm was equivalent to a "high sensitivity for type 2" algorithm (Supplementary Table 8, Additional File).
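The equivalence between the type 1 and type 2 labels of these algorithms follows from the binary classification. As a short worked derivation, with TP, FP, FN, and TN counted with T1D as the positive class (so that a T1D negative is a T2D positive):

```latex
\begin{align*}
\text{Sensitivity}_{\text{T2D}} &= \frac{TN}{TN + FP} = \text{Specificity}_{\text{T1D}}, \\
\text{PPV}_{\text{T2D}} &= \frac{TN}{TN + FN} = \text{NPV}_{\text{T1D}}.
\end{align*}
```

An algorithm with high sensitivity for T1D has few false negatives (FN), which makes the PPV for T2D high; an algorithm with high PPV for T1D has few false positives (FP), which makes the sensitivity for T2D high.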

Discussion
This is one of the largest validation studies of algorithms using EHRs to classify T1D and T2D among children and adults, and the only validation study in an Asian population. Using a systematic approach to generate a set of algorithms maximizing sensitivity and PPV, we revealed that classification performance is best at lower ages at diagnosis and drops as age at diagnosis increases, a finding that has not previously been demonstrated. We developed a "high sensitivity for type 1" algorithm (ratio of type 1 to type 2 codes ≥ 4, or at least 1 insulin prescription within 90 days) with > 90% sensitivity across age at diagnosis at the expense of lower PPV, and a "high PPV for type 1" algorithm (ratio of type 1 to type 2 codes ≥ 4, and multiple daily injections with no other glucose-lowering medication prescription) with perfect PPV across age at diagnosis at the expense of lower sensitivity. Our optimized algorithm (ratio of type 1 to type 2 codes ≥ 4, and at least 1 insulin prescription within 90 days) produced the most accurate estimates of the proportion of T1D cases across all ages at diagnosis. The complementary performance characteristics of these algorithms can inform their application to future studies, and the choice of algorithm should be tailored to the unique requirements of each study question.
Table 3 Test characteristics of single (A-D) and combination algorithms for classifying type 1 diabetes compared to the reference standard in the derivation cohort, stratified by age at diagnosis. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) are percentages with 95% confidence intervals. Cohen's kappa coefficient represents agreement after agreement due to chance is removed (1.0 indicates perfect agreement) [31]. The "Type 1 Proportion" columns refer to the percentage of people in the cohort with diabetes classified as having type 1 using each algorithm ("Calculated") and the reference standard ("True"). The best overall algorithms are marked (* = highest sensitivity, † = highest PPV, ‡ = highest kappa coefficient)
Fig. 3 Sensitivity and positive predictive value of the 8 combination algorithms for classifying type 1 diabetes in the derivation cohort by age at diagnosis.* We paired single algorithms using "and" to maximize positive predictive value (panels a-d) and "or" to maximize sensitivity (panels e-h). See Fig. 2 for algorithm descriptions. *Smoothed using 15-year moving averages
Table 4 Test characteristics of the high sensitivity, high positive predictive value (PPV), and balanced algorithms for classifying type 1 diabetes compared to the reference standard in the validation cohort, stratified by age at diagnosis. Sensitivity, specificity, PPV and negative predictive value (NPV) are percentages with 95% confidence intervals. Cohen's kappa coefficient represents agreement after agreement due to chance is removed (1.0 indicates perfect agreement) [31]. The "Type 1 Proportion" columns refer to the percentage of people in the cohort with diabetes classified as type 1 using each algorithm ("Calculated") and the reference standard ("True"). See Supplementary Table 6 for algorithms using renal function criteria
Fig. 4 Proportion of all diabetes cases classified as type 1 by age at diagnosis in the validation cohort.* This proportion is calculated as the percentage of people in the cohort with diabetes classified as type 1 using the reference standard (dashed line), as well as high sensitivity for type 1, optimized, and high positive predictive value for type 1 algorithms (see Table 4 for descriptions). *Smoothed using 15-year moving averages
Among children and adolescents, our diabetes classification algorithms performed similarly to others developed in white populations. Using Canadian administrative and prescription data, Vanderloo et al. [14] validated 4 algorithms using a combination of "Status Indian" registration, age < 10 years, and prescriptions to classify diabetes types. Although the sensitivity and PPV for classifying T1D were high (range: 96.9-99.2%), performance for identifying T2D was worse (sensitivity range: 55.4-84.2%; PPV range: 54.7-73.7%) and relied on ethnicity criteria that are not applicable in other populations. In a post-hoc analysis, we modified these algorithms by excluding inapplicable criteria and applied them to our data (Supplementary Tables 9-10, Additional File). These modified algorithms performed identically to our "high sensitivity for type 1" algorithm in classifying T1D (sensitivity 100.0%, 76.8-100.0%; PPV 70.0%, 45.7-88.1%) and T2D (sensitivity 77.8%, 57.7-91.4%; PPV 100.0%, 83.9-100.0%). In the large United States SEARCH for Diabetes in Youth Study (SEARCH), several algorithms were developed to identify diabetes type [13,15,16].
The "at least 1 outpatient T1D code" (sensitivity 94.8%, PPV 98.0% in SEARCH) [13] had 100.0% sensitivity (76.8-100.0%) and a better PPV (87.5%, 61.7-98.4%) than our "high sensitivity for type 1" algorithm. Other published SEARCH algorithms requiring the ratio of type 1 to total codes > 0.5 [15] and 0.6 [16] performed identically to our optimized algorithm (sensitivity 85.7-100.0%, PPV 87.5-100.0% for identifying T1D), although the latter algorithm required manual review to assess diabetes type for over a third of cases. The reasonable performance of these other algorithms confirms that T1D can be identified among children and adolescents using administrative and EHR data across different settings. Our results extend the literature with an expanded set of algorithms with optimal, maximally sensitive, or maximally predictive characteristics without the use of manual review, which would be unfeasible for large population-based studies. By contrast, classification accuracy of the algorithms was lower among adults versus children. Previous validation studies including adults are limited. Klompas et al. [12] used a large EHR including primary and specialty care providers to develop and validate a complex algorithm (type 1 to type 2 codes > 0.5 and prescription for glucagon, type 1 to type 2 codes > 0.5 with no oral hypoglycemic other than metformin, C-peptide negative, autoantibodies positive, or prescription for urine acetone test strips) that reported a 65% (36-100%) sensitivity and 88% (78-98%) PPV for T1D and 100% (99-100%) sensitivity and 95% (88-100%) PPV for T2D. A modified version of this algorithm excluding urine acetone test strips was later tested separately [27]. However, these studies are limited by the lack of "and" combinations, and the use of a weighted sampling strategy that could have inflated estimates of PPV [12,27]. 
Although algorithm performance in adults was not specifically reported, our post-hoc analysis showed that the algorithm proposed by Klompas et al. [12] (adapted to fit our data; see Supplementary Tables 9-10, Additional File) had decreased sensitivity (62.5%, 24.5-91.5%) and PPV (26.3%, 9.1-51.2%) among adults aged ≥ 40 years at diagnosis versus people aged < 20 years at diagnosis (sensitivity 100.0%, 76.8-100.0%, PPV 93.3%, 68.1-99.8%). The performance of another algorithm developed within a general practice EHR in the UK [26] showed a similar pattern using our data, although the overall performance was worse than our algorithms (sensitivity 39.5%, 25.0-55.6%; PPV 40.5%, 25.6-56.7% at all ages). While these results may be expected based on the rarity of T1D in adulthood, our large study adds a new approach to maximize sensitivity, PPV, or overall accuracy across all ages using different types of combinations. Moreover, we confirmed that renal function does not improve algorithm performance in adults, and this may reflect the growing variety of non-insulin agents available for people with diabetes and impaired renal function.
Our study yielded 3 complementary algorithms, the choice of which can be tailored to different study contexts depending on diabetes type, sensitivity, and PPV requirements. The optimized algorithm (ratio of type 1 to type 2 codes ≥ 4, and at least 1 insulin prescription within 90 days) performed highly accurately at ages at diagnosis < 20 years, but it also generated close estimates of the proportion of T1D among adults, as misclassified T1D and T2D cases were approximately balanced. Thus, the optimized algorithm could be applied to diabetes incidence and prevalence studies. Other algorithms may be better suited for cohort studies or other designs. For example, an adult-onset T1D cohort study could use the "high PPV for type 1" algorithm (ratio of type 1 to type 2 codes ≥ 4, and multiple daily injections with no other glucose-lowering medication prescription) to maximize PPV. Alternatively, a case-finding study designed to identify as many people with T1D as possible might apply the "high sensitivity for type 1" algorithm (ratio of type 1 to type 2 codes ≥ 4, or at least 1 insulin prescription within 90 days). A cohort study of T2D among adults could apply the "high PPV for type 2" (equivalent to "high sensitivity for type 1") algorithm, although all 3 algorithms performed well considering the relatively high T2D prevalence in adults.
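The point about balanced misclassification can be made concrete with a small sketch. The estimated T1D proportion counts every algorithm-positive case (true and false positives), while the true proportion counts every reference-standard T1D case (true positives and false negatives), so the two agree when false positives and false negatives are equal in number. The function name is illustrative, and the counts in the usage note are inferred from the reported validation percentages rather than taken from a published table.

```python
from typing import Tuple


def type1_proportions(tp: int, fp: int, fn: int, total: int) -> Tuple[float, float]:
    """Return (calculated, true) proportions of T1D among all diabetes cases."""
    calculated = (tp + fp) / total  # everyone the algorithm labels T1D
    true = (tp + fn) / total        # everyone the reference standard labels T1D
    return calculated, true
```

For instance, counts consistent with the reported all-ages validation results for the optimized algorithm (28 true positives, 9 false positives, 15 false negatives among 5101 people) give 37 calculated versus 43 true T1D cases, whereas counts consistent with the "high sensitivity for type 1" results (41 true positives, 279 false positives, 2 false negatives) give 320 calculated cases.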
Our large register-based validation study is the first to specifically distinguish T1D and T2D in Asians, using routinely available encounter codes and prescriptions in a population-wide EHR within a public universal healthcare context. Unlike previous studies, we demonstrated the critical importance of age at diagnosis, defining separate derivation and validation cohorts to avoid overfitting. However, there are some limitations to note. As in other public healthcare settings, we did not have access to routine autoantibody or C-peptide testing to verify diagnoses of T1D. We could not include the entire HKDSD or externally validate because full chart access was only authorized for the HKDR. However, the HKDR represents a large geographic region of Hong Kong, which has a single publicly administered healthcare system serving its entire population. Although socioeconomic status variables were not captured in our databases, other baseline characteristics were highly similar between the HKDR and HKDSD, supporting the generalizability of our algorithms. Research platforms such as the HA's Data Collaboration Lab should allow more comprehensive use of EHR data to improve diabetes classification using more complex methodologies and to enhance population research [32][33][34].

Conclusions
In summary, we developed and validated a set of algorithms to accurately classify diabetes type for different ages at diagnosis using population-level health data. As EHRs become increasingly available, our approach may be applied to generate similar algorithms in other settings. These algorithms can be applied to future studies to characterize incidence, prevalence, and other statistics separately for T1D and T2D, especially in China and other populations where these statistics have never been measured [11].