Suitability of administrative claims databases for bariatric surgery research – is the glass half-full or half-empty?

Background: Claims databases are generally considered inadequate for obesity research due to suboptimal capture of body mass index (BMI) measurements. This might not be true for bariatric surgery because of reimbursement requirements and changes in coding systems. We assessed the availability and validity of claims-based weight-related diagnosis codes among bariatric surgery patients.
Methods: We identified three nested retrospective cohorts of adult bariatric surgery patients who underwent adjustable gastric banding, Roux-en-Y gastric bypass, or sleeve gastrectomy between January 1, 2011 and June 30, 2018 using different components of OptumLabs® Data Warehouse, which contains linked de-identified claims and electronic health records (EHRs). We measured the availability of claims-based weight-related diagnosis codes in the 6-month preoperative and 1-year postoperative periods in the main cohort identified in the claims data. We created two claims-based algorithms to classify the presence of severe obesity (a commonly used cohort selection criterion) and categorize BMI (a commonly used baseline confounder or postoperative outcome). We evaluated their performance by estimating sensitivity, specificity, positive predictive value, negative predictive value, and weighted kappa in two sub-cohorts using EHR-based BMI measurements as the reference.
Results: Among the 29,357 eligible patients identified using claims only, 28,828 (98.2%) had preoperative weight-related diagnosis codes, either granular codes indicating BMI ranges or nonspecific codes denoting obesity status. Among the 27,407 patients with granular preoperative codes, 12,346 (45.0%) had granular codes and 9355 (34.1%) had nonspecific codes in the 1-year postoperative period. Among the 3045 patients with both preoperative claims-based diagnosis codes and EHR-based BMI measurements, the severe obesity classification algorithm had a sensitivity of 100%, specificity of 71%, positive predictive value of 100%, and negative predictive value of 78%. The BMI categorization algorithm had good validity in categorizing the last available preoperative or postoperative BMI measurements (weighted kappa [95% confidence interval]: preoperative 0.78 [0.76, 0.79]; postoperative 0.84 [0.80, 0.87]).
Conclusions: Claims-based weight-related diagnosis codes had excellent validity before and after bariatric surgical operation but suboptimal availability after operation. Claims databases can be used for bariatric surgery studies of non-weight-related effectiveness and safety outcomes that are well-captured.


Keywords: Bariatric surgery, Body mass index, Healthcare administrative claims, Predictive value of tests, Sensitivity and specificity, Validation study

Background
Bariatric surgery is the most effective treatment for severe obesity, a risk factor for many health conditions including cardiovascular diseases and death [1]. Patients who undergo bariatric surgery can achieve effective weight loss and remission of many comorbidities [2,3]. However, between 2011 and 2018, only 1% of adults with severe obesity in the United States received bariatric surgery in a given year [4,5]. With the persistent increase in the prevalence of obesity and considerable shift in the type of bariatric surgical operations performed over the last decade [5], it is important to evaluate the long-term comparative effectiveness and safety of different operations.
Administrative claims databases are an important real-world data source in comparative effectiveness and safety research. These databases often provide large and demographically diverse study populations at a fraction of the cost compared to other data sources [6]. Claims databases also capture most, if not all, medically attended events including hospitalizations and procedures performed. However, claims databases are generally considered inadequate for obesity-related research due to the lack of body mass index (BMI) measurements and the underuse and poor validity of weight-related diagnosis codes [7][8][9][10]. This limitation may not necessarily apply to bariatric surgery research because most health insurers in the United States require surgical facilities to receive approval to perform a given bariatric operation (also known as "prior authorization"). This process involves documentation of eligibility, including having a BMI measurement ≥40 kg/m², or a BMI measurement ≥35 kg/m² with at least 1 obesity-related comorbidity, which are typically converted into diagnosis codes in the patient's medical record and reimbursement claims [11][12][13]. In addition, the specific International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) weight-related diagnosis codes denoting BMI ranges became available in 2006, with a subset of diagnosis codes indicating BMI ≥40 kg/m² becoming effective in January 2011. The more granular ICD-10-CM codes became effective in October 2015. These coding changes and the prior authorization requirements may considerably improve the availability and validity of weight-related diagnosis codes in claims databases among bariatric surgery patients.
In this study, we evaluated the availability and validity of weight-related diagnosis codes before and after bariatric surgical operations in a large claims database linked to an electronic health record (EHR) database with actual BMI measurements.

Data source
This study used data from the OptumLabs® Data Warehouse (OLDW), which contains linked de-identified administrative claims data for commercially insured and Medicare Advantage enrollees, and de-identified EHR data that has been normalized and standardized into a single database. As of May 2019, the database contains longitudinal health information on over 200 million lives, 137 million in claims, 88 million in the EHR, and 26 million in the linked component since 2007 from a diverse mixture of ages, ethnicities, and geographical regions across the United States [14]. The claims data component includes physician, pharmacy, and facility claims submitted for reimbursement for covered members. Both paid and denied claims are included in the database and analysis, except for pharmacy claims where only paid claims are included in the analysis. The EHR component includes clinical diagnoses, procedures, prescriptions, clinical notes, laboratory results, and vital signs (including BMI) recorded as part of routine clinical practice.

Study populations
We created 3 nested study cohorts using different components of OLDW to evaluate the availability (Cohort 1) and validity (Cohorts 2 and 3) of claims-based weight-related diagnosis codes before and after the bariatric surgical operation (Additional file 1 eFigure 1). The study was approved by the Harvard Pilgrim Health Care institutional review board with an exemption and waiver of individual patient consent.

Cohort 1
Using the claims data, we identified a retrospective cohort of patients aged 18 years or older who underwent adjustable gastric banding (AGB), Roux-en-Y gastric bypass (RYGB), or sleeve gastrectomy (SG) between January 1, 2011 and June 30, 2018. Eligible patients had continuous health plan enrollment with medical and pharmacy benefits during the 6-month period preceding the index bariatric operation, which could occur in an inpatient or ambulatory care setting. To minimize the inclusion of patients with non-obesity indications, we excluded patients who had any major bariatric operation, revisional procedures, or gastrointestinal malignancy in the 6-month preoperative period, as well as patients who had an emergency department encounter or a diagnosis of gastrointestinal ulcers on the day of the index operation. We further excluded patients who had multiple conflicting bariatric operation procedure codes on the day of the index operation. The cohort was identified using ICD-9-CM (prior to October 1, 2015) and ICD-10-CM (on or after October 1, 2015) diagnosis and procedure codes; Current Procedural Terminology, Fourth Edition (CPT-4®); and the Healthcare Common Procedure Coding System. We used this cohort to evaluate the availability of claims-based weight-related diagnosis codes before and after the bariatric operation.

Cohorts 2 and 3
Cohort 2 consisted of the subset of patients in Cohort 1 who had ≥1 preoperative claims-based weight-related diagnosis code with the last available code being granular (e.g., V85.30 or Z68.30 indicating BMI between 30.0-30.9 kg/m²) and ≥1 EHR-based BMI measurement recorded within ±30 days of the granular code during the 6-month preoperative period (including the index operation day). We used this cohort to evaluate the performance of our claims-based severe obesity and BMI categorization algorithms (defined below) in the preoperative period. Cohort 3 consisted of the subset of patients in Cohort 2 whose last available claims-based postoperative weight-related diagnosis was a granular code with ≥1 EHR-based BMI measurement recorded within ±30 days of this diagnosis code during the 1-year postoperative period. We used Cohort 3 to evaluate the performance of our claims-based algorithms in the postoperative period.

Development of claims-based algorithms for severe obesity and BMI categorization
We created 2 claims-based algorithms using weight-related diagnosis codes (Additional file 1 eTable 1): a severe obesity classification algorithm and a BMI categorization algorithm. The severe obesity classification algorithm classified patients as having "severe obesity" if they had ≥1 claims-based weight-related diagnosis code indicating BMI ≥35 kg/m² any time during the 6-month preoperative period. In bariatric surgery research, this algorithm can be used as an important cohort selection criterion to identify patients with severe obesity as the treatment indication.
The BMI categorization algorithm classified a patient's BMI into 1 of 10 levels as indicated by their last available weight-related diagnosis code, separately during the 6-month preoperative and 1-year postoperative periods (BMI levels, kg/m²: ≤19.9, 20.0-24.9, 25.0-29.9, 30.0-34.9, 35.0-39.9, 40.0-44.9, 45.0-49.9, 50.0-59.9, 60.0-69.9, and ≥70.0). This algorithm can be used to measure the last available preoperative BMI, which is an important covariate for comparative effectiveness research on bariatric surgery as preoperative BMI may be associated both with operation choice and risks of many health outcomes. The algorithm can also measure the last available BMI measurement within a defined postoperative follow-up period (e.g., 1 year in this study) for weight-related outcome assessment.
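As an illustration only, the mapping from a granular diagnosis code to one of the 10 BMI levels could be sketched as follows. The actual code lists are in Additional file 1 eTable 1, which is not reproduced here, so this sketch assumes the standard adult ICD-10-CM Z68 codes only (ICD-9-CM V85 codes are omitted), and the function names `z68_to_level` and `last_code_level` are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# The 10 BMI levels (kg/m2) used by the BMI categorization algorithm.
BMI_LEVELS = ["<=19.9", "20.0-24.9", "25.0-29.9", "30.0-34.9", "35.0-39.9",
              "40.0-44.9", "45.0-49.9", "50.0-59.9", "60.0-69.9", ">=70.0"]

# Assumption: adult ICD-10-CM codes for BMI >= 40 (Z68.41 to Z68.45).
_Z68_40_PLUS = {"Z68.41": 5, "Z68.42": 6, "Z68.43": 7, "Z68.44": 8, "Z68.45": 9}


def z68_to_level(code: str) -> Optional[int]:
    """Return the index into BMI_LEVELS for a granular Z68 code, else None."""
    code = code.strip().upper()
    if code == "Z68.1":                      # BMI 19.9 or less
        return 0
    if code in _Z68_40_PLUS:
        return _Z68_40_PLUS[code]
    match = re.fullmatch(r"Z68\.([23])(\d)", code)
    if match:                                # Z68.20 to Z68.39: one code per BMI unit
        tens, units = int(match.group(1)), int(match.group(2))
        return 1 + 2 * (tens - 2) + (1 if units >= 5 else 0)
    return None                              # nonspecific or unrelated code


def last_code_level(dated_codes: Iterable[Tuple[str, str]]) -> Optional[int]:
    """Level of the most recent granular code among (ISO date, code) pairs."""
    granular = [(d, z68_to_level(c)) for d, c in dated_codes]
    granular = [(d, lvl) for d, lvl in granular if lvl is not None]
    if not granular:
        return None
    return max(granular, key=lambda pair: pair[0])[1]
```

Note that this sketch takes the latest granular code and ignores nonspecific ones, which matches Cohorts 2 and 3 (where the last available code is granular by construction) but is a simplification for the wider Cohort 1.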

Validation of claims-based algorithms for severe obesity and BMI categorization
We used the EHR-based BMI measurements recorded during an encounter to validate the claims-based algorithms. We classified patients as having severe obesity if they had ≥1 EHR-based BMI measurement ≥35 kg/m² any time during the 6-month preoperative period. For BMI categorization, we classified a patient's most proximate EHR-based BMI measurement recorded within ±30 days of the last available claims-based diagnosis code in the 6-month preoperative period (for preoperative analyses) and the last available EHR-based BMI measurement in the 1-year postoperative period (for postoperative analyses), separately, into 1 of the 10 levels described above.
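The reference-standard step described above can be sketched in a few lines: pick the EHR measurement closest in time to the claims code within the ±30-day window, then bin it into the same 10 levels. This is a minimal sketch, not the study's implementation; the function names, the tie-breaking rule (earlier measurement wins), and the handling of values between level boundaries are assumptions.

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

# Lower bounds (kg/m2) of levels 1..9; anything below 20.0 falls in level 0.
_CUTS = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 60.0, 70.0]


def bmi_level(bmi: float) -> int:
    """Bin a measured BMI into one of the 10 levels (0 = <=19.9, 9 = >=70.0)."""
    return bisect_right(_CUTS, bmi)


def nearest_bmi(code_date: date,
                measurements: List[Tuple[date, float]],
                window_days: int = 30) -> Optional[float]:
    """Most proximate BMI measurement within +/- window_days of the code date."""
    candidates = [
        (abs((measured_on - code_date).days), measured_on, bmi)
        for measured_on, bmi in measurements
        if abs((measured_on - code_date).days) <= window_days
    ]
    if not candidates:
        return None
    # Smallest gap wins; ties go to the earlier measurement date (an assumption).
    return min(candidates)[2]
```

For example, with a code dated 2016-03-01 and measurements on 2016-01-15, 2016-02-20, and 2016-03-20, the first falls outside the window and the second (10 days away) is selected over the third (19 days away).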

Statistical analyses
Availability and predictors of weight-related diagnosis codes during the preoperative and postoperative periods
We described the presence of weight-related ICD-9-CM and ICD-10-CM diagnosis codes occurring any time in the 6-month preoperative period and the 1-year postoperative period, separately, in Cohort 1. We also performed the analysis by operation type, calendar year, and coding era (before October 1, 2015 for the ICD-9-CM era; October 1, 2015 and later for the ICD-10-CM era). We assessed factors associated with the presence of preoperative and postoperative claims-based weight-related diagnosis codes, separately, using logistic regression models. Factors selected a priori included demographic characteristics, region of residence, calendar year, coding era, type of index bariatric operation, care setting of index operation, and medical history measured in the 6-month preoperative period (including the Charlson-Elixhauser comorbidity index score [15], individual comorbid conditions, and prior hospital admissions). The Charlson-Elixhauser comorbidity index score was originally developed to predict mortality risk in older patients [15]; we used the score as a proxy for general health status.

Performance of the severe obesity classification algorithm during the preoperative period
We assessed the performance of the severe obesity classification algorithm using sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) within Cohort 2. The sensitivity was calculated as the proportion of patients accurately classified as having severe obesity based on claims-based diagnosis code (i.e., true positives) among those classified as such based on their EHR-based BMI measurement. The specificity was calculated as the proportion of patients accurately classified as not having severe obesity based on claims-based diagnosis codes (i.e., true negatives) among those whose EHR-based BMI measurement indicated as such. The PPV was calculated as the proportion of true positives among patients classified as having severe obesity based on their claims-based diagnosis code. The NPV was calculated as the proportion of true negatives among patients classified as not having severe obesity based on diagnosis code.
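The four measures defined above follow directly from the 2x2 cross-classification of the claims-based flag against the EHR-based reference. A minimal sketch is given below; the function name and the choice to return `None` for an undefined ratio are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional


def diagnostic_metrics(claims_flags: Iterable[bool],
                       ehr_flags: Iterable[bool]) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity, PPV, and NPV with the EHR flag as reference."""
    tp = fp = fn = tn = 0
    for claims_positive, ehr_positive in zip(claims_flags, ehr_flags):
        if claims_positive and ehr_positive:
            tp += 1          # severe obesity in both sources
        elif claims_positive and not ehr_positive:
            fp += 1          # flagged in claims only
        elif ehr_positive:
            fn += 1          # missed by claims
        else:
            tn += 1          # not severe in either source

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

In a bariatric surgery cohort almost every patient is a true positive, so specificity and NPV rest on a small number of reference-negative patients and are estimated with correspondingly less precision than sensitivity and PPV.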

Performance of the BMI categorization algorithm during the preoperative and postoperative periods
We evaluated the performance of the BMI categorization algorithm separately in the 6-month preoperative period using Cohort 2 and in the 1-year postoperative period using Cohort 3. In both preoperative and postoperative periods, we assessed the concordance between the last available claims-based weight-related diagnosis code and its most proximate EHR-based BMI measurement recorded within ±30 days of the claims-based diagnosis code by estimating the weighted Cohen's kappa. As a variation of Cohen's kappa, a measure of the degree of agreement, the weighted kappa assigns weights for partial agreement according to the distance from perfect agreement [16]. The weighted kappa ranges from −1 to 1, with negative values possible but unlikely in practice. In general, kappa values >0.75 are considered excellent, 0.40-0.75 are considered fair to good, and <0.40 are considered poor agreement [17]. In both preoperative and postoperative periods, we also estimated the sensitivity, specificity, PPV, and NPV within each level of the algorithm.
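For readers unfamiliar with the statistic, a weighted kappa for ordered categories can be computed as sketched below. The weighting scheme used in the study is not stated in this section, so linear weights are assumed as the default here, with quadratic weights offered as an alternative; the function name is hypothetical and confidence intervals are not computed.

```python
from typing import Sequence


def weighted_kappa(claims_levels: Sequence[int],
                   ehr_levels: Sequence[int],
                   n_levels: int,
                   scheme: str = "linear") -> float:
    """Weighted Cohen's kappa for two ratings coded 0..n_levels-1."""
    n = len(claims_levels)
    if n == 0 or n != len(ehr_levels) or n_levels < 2:
        raise ValueError("need two equal-length, non-empty ratings and >= 2 levels")

    # Observed cross-classification counts.
    observed = [[0] * n_levels for _ in range(n_levels)]
    for i, j in zip(claims_levels, ehr_levels):
        observed[i][j] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    def weight(i: int, j: int) -> float:
        # 1 for perfect agreement, decreasing with distance between levels.
        distance = abs(i - j) / (n_levels - 1)
        return 1 - distance if scheme == "linear" else 1 - distance ** 2

    levels = range(n_levels)
    p_observed = sum(weight(i, j) * observed[i][j] for i in levels for j in levels) / n
    p_expected = sum(weight(i, j) * row_totals[i] * col_totals[j]
                     for i in levels for j in levels) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when all ratings fall in one level")
    return (p_observed - p_expected) / (1 - p_expected)
```

With 10 ordered BMI levels, a claims code that is one level away from the EHR measurement is penalized far less than one that is five levels away, which is why the weighted form is preferred over the unweighted kappa for this comparison.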

Sensitivity analyses
We examined a different severe obesity classification algorithm using BMI ≥40 kg/m² as the cutoff. We also varied the BMI categorization algorithm by (1) using larger BMI intervals (5-level BMI categories, kg/m²: ≤29.9, 30.0-39.9, 40.0-49.9, 50.0-59.9, ≥60.0; 4-level categories: underweight ≤19.9, normal 20.0-24.9, overweight 25.0-29.9, obese ≥30.0), and (2) including nonspecific weight-related diagnosis codes, and assessed their performance during the preoperative and postoperative periods (Additional file 1 eTable 2). In addition, we examined the impact of the proximity restriction between the claims-based weight-related diagnosis code and the EHR-based BMI measurement on their concordance in the preoperative and postoperative periods. We also separately evaluated the performance of the BMI categorization algorithm for the last available BMI during the 6-month and 2-year postoperative periods. We performed all analyses with SAS Enterprise Guide 7.13 for Windows (SAS Institute, Cary, NC).
Cohort 2 included 3045 patients from Cohort 1 who had both claims-based weight-related diagnosis codes (with the last preoperative code being granular) and EHR-based BMI measurements in the 6-month preoperative period; 196 (6.4%) had AGB, 1055 (34.6%) had RYGB, and 1794 (58.9%) had SG. Compared to Cohort 1, the average age was slightly higher (47.6 years), and slightly more patients had hypertension (69.5%) and dyslipidemia (56.4%) in Cohort 2. On the index operation day, 77.6% had both claims-based diagnosis codes and EHR-based BMI measurements.
Cohort 3 included 511 patients from Cohort 2 who had granular last available claims-based weight-related diagnosis codes in the 1-year postoperative period with ≥1 EHR-based BMI measurement in the ±30 days of the diagnosis code, with 31 (6.1%) having AGB, 190 (37.2%) having RYGB, and 290 (56.8%) having SG. Compared to Cohorts 1 and 2, the average age was higher (48.9 years) in Cohort 3, more patients had hypertension (71.8%) and dyslipidemia (58.1%), and fewer had non-alcoholic fatty liver disease (22.5%) or diagnosis codes indicating smoking (1.8%). On average, patients had their first weight-related diagnosis code around 57 days after index operation and last available diagnosis code 159 days before the end of 1-year follow-up.
Presence of weight-related diagnosis codes
6-month preoperative period
Most of the patients in Cohort 1 had ≥1 claims-based weight-related diagnosis code, with 27,407 (93.4%) having granular codes (Figure 1). The granular diagnosis codes were more prevalent in the ICD-10-CM era than the ICD-9-CM era (96.8% versus 91.1%). Similar increasing trends were observed across operation types, with higher prevalence of granular diagnosis codes observed in SG patients (Additional file 1 eFigures 2 & 3).

1-year postoperative period
Among the 27,407 patients with granular weight-related diagnosis codes in the 6-month preoperative period in Cohort 1, 12,346 (45.0%) had granular codes, 9355 (34.1%) had nonspecific codes, and 5706 (20.8%) did not have any codes in the first postoperative year (Fig. 2). The distribution of diagnosis codes was similar among patients receiving different types of operation.
Factors associated with the presence of weight-related diagnosis codes
6-month preoperative period
Compared to patients with claims-based weight-related diagnosis codes, those without codes were more likely to be male, Asian, older, have more hospital stays before operation, or receive the operation in an ambulatory care setting in Cohort 1 (Table 2). Among patients who had weight-related diagnosis codes, those with granular codes (e.g., V85.30) were more likely to have SG, be covered by Medicare Advantage plans, or have the operation in an inpatient setting or recent years (Additional file 1 eTable 3).

1-year postoperative period
Compared to patients with claims-based weight-related diagnosis codes, those without codes were more likely to receive AGB, be younger, be male, be commercially insured, or lack preoperative weight-related diagnosis codes in Cohort 1 (Additional file 1 eTable 4). Among patients who had weight-related diagnosis codes in the postoperative year, those having granular codes were more likely to be older, be covered by Medicare Advantage plans, have comorbid conditions, receive SG, or have the operation in an inpatient setting or recent years (Additional file 1 eTable 5).

Abbreviations: BMI body mass index, GERD gastroesophageal reflux disease, IQR interquartile range, NAFLD non-alcoholic fatty liver disease, PCOS polycystic ovarian syndrome, SD standard deviation
a The last BMI before operation was obtained in the electronic health records (EHR) component of the OptumLabs Data Warehouse (OLDW) for those patients who had linkage. Patients who did not have EHR linkage in the OLDW were coded as missing

1-year postoperative period
In Cohort 3, the BMI categorization algorithm had a weighted kappa of 0.84 (95% confidence interval 0.80, 0.87). The specificity and NPV were high for all BMI levels while the sensitivity was above 70% and the PPV was above 60% for most BMI levels (Table 3).

Sensitivity analyses
When varying the severe obesity classification algorithm to detect the presence of BMI ≥40 kg/m² during the 6-month preoperative period, both the specificity and NPV increased (75 and 83%, respectively) while sensitivity and PPV dropped slightly (98 and 96%, respectively). Expanding the algorithms to include nonspecific weight-related diagnosis codes (e.g., 278.01) resulted in a meaningful decrease in specificity (Additional file 1 eTable 6). The 5-level BMI categorization algorithm had similar concordance compared to the 10-level categorization, while the 4-level BMI categorization algorithm had great concordance with a weighted kappa above 0.90 for both the preoperative and postoperative periods (Table 3). Expanding the algorithms to include nonspecific weight-related diagnosis codes had minimal impact on their performance (Additional file 1 eTable 7). Relaxing the proximity requirement between the timing of the claims-based weight-related diagnosis codes and the EHR-based BMI measurements increased the size of the validation sample; this did not change their concordance during the 6-month preoperative period but reduced their concordance in the 1-year postoperative period (Additional file 1 eFigure 4). The BMI categorization algorithm for the last available BMI performed well in the 6-month and 2-year postoperative periods (Additional file 1 eTable 8).

Discussion
In a large administrative claims database, we found that nearly all bariatric surgery patients had preoperative weight-related diagnosis codes, while the presence of granular weight-related diagnosis codes increased substantially in both the preoperative and postoperative periods between 2011 and 2018. The claims-based algorithm for severe obesity, which classified patients as having severe obesity if they had a diagnosis code indicating BMI ≥35 kg/m², had high sensitivity and PPV but only reasonable specificity and NPV. The BMI categorization algorithm that categorized weight-related diagnosis codes into BMI levels had excellent concordance with the EHR-based BMI measurement, with high specificity, PPV, and NPV across all levels and higher sensitivity among higher levels of BMI.
The persistently high prevalence of claims-based weight-related diagnosis codes, including granular and nonspecific codes, in the preoperative period across the study years reflects the high adherence to the insurance reimbursement requirement [11][12][13]. The observed higher prevalence of weight-related diagnosis codes in the ICD-10-CM era than the ICD-9-CM era is consistent with previous data that focused on the claims-based diagnosis codes in the general population [10].
The BMI categorization algorithm had different sensitivities for BMI level 30.0-34.9 kg/m² in the preoperative and postoperative periods (30% versus 84%). In the 6 months before having a bariatric operation, 70% of patients with an EHR-based BMI measurement between 30.0 and 34.9 kg/m² had a granular weight-related diagnosis code indicating BMI ≥35 kg/m². During the first postoperative year, only 15% of those with a BMI measurement between 30.0 and 34.9 kg/m² had a diagnosis code indicating BMI ≥35 kg/m². These patients with borderline BMI levels immediately before having a bariatric operation might have undergone preoperative weight loss as required by their insurance or encouraged by their clinical programs, as half of them had 1 or more BMI measurements ≥35 kg/m² within the prior 30 days. These patients might also have been up-coded with a higher weight-related diagnosis code to meet the prior authorization requirement.
Claims databases for bariatric surgery research: a glass half-full or half-empty?
The high prevalence and validity of weight-related diagnosis codes before a bariatric operation in claims databases make it feasible to use these codes to capture a large proportion of eligible patients, especially when researchers impose additional eligibility criteria to exclude patients with non-obesity indications, as we did in our study. In addition, the high concordance between the claims-based BMI categorization algorithm and actual BMI measurement, along with its high validity, suggests that it is possible to use these preoperative weight-related diagnosis codes for baseline confounding control.
On the other hand, despite considerable increase across years and high validity, the presence of weight-related diagnosis codes remained low in the first postoperative year, with around 80% of patients having any codes and around 60% having granular codes in 2017 and 2018. The suboptimal presence of weight-related diagnosis codes in the postoperative period makes it more challenging to use claims databases for weight-related effectiveness research. In addition, there could be differential coding in the postoperative period because patients with granular weight-related diagnosis codes were older and had more comorbid conditions (Additional file 1 eTable 5). These patients with granular diagnosis codes in the postoperative period may not be representative of the overall study population. For example, some of them may be preparing for a second stage operation or having inadequate weight loss from their index operation. It is thus important to weigh the internal validity and generalizability when using the postoperative weight-related diagnosis codes for weight-related effectiveness outcome research. In situations when all relevant factors contributing to the presence of postoperative granular diagnosis codes are measured, results from patients with granular codes could be generalized to the overall study population using appropriate statistical approaches, such as inverse probability weighting [18].
Taken together, our findings support the use of administrative claims data for bariatric surgery research of non-weight-related outcomes that are generally well-captured, such as rehospitalization, reoperation, venous thromboembolism, or remission of certain comorbidities including type 2 diabetes [19][20][21][22].

The last available weight-related diagnosis code in claims in the 1-year postoperative period was compared with the last available BMI measurement in the EHR during the same period in Cohort 3 patients
c Cells with 10 or fewer patients have been suppressed to maintain the de-identification nature of the database
d Sensitivity was not calculated because no patients had relevant BMI measurement at this level in the EHR
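To make the inverse probability weighting idea concrete, the sketch below reweights patients who have a postoperative granular code so that they stand in for all patients in their stratum. This is a deliberately simplified, stratum-based illustration and not an analysis performed in this study: in practice the probability of having a code would usually be modeled (e.g., with logistic regression) on all measured predictors, and the function names and example strata are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def ipw_weights(strata: Sequence[Hashable], has_code: Sequence[bool]) -> List[float]:
    """Weight = 1 / P(granular code | stratum) for coded patients, 0 otherwise."""
    totals = defaultdict(int)
    coded = defaultdict(int)
    for stratum, flag in zip(strata, has_code):
        totals[stratum] += 1
        coded[stratum] += bool(flag)
    # totals / coded is the inverse of the observed coding probability.
    return [totals[s] / coded[s] if flag else 0.0
            for s, flag in zip(strata, has_code)]


def weighted_mean(values: Sequence[Optional[float]], weights: Sequence[float]) -> float:
    """Weighted mean over patients with a positive weight (outcome observed)."""
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0 and v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no patients with a positive weight")
    return sum(v * w for v, w in pairs) / total_weight
```

In a toy example with two older patients (both coded, BMI levels 30 and 32) and four younger patients (one coded, BMI 40), the unweighted mean among coded patients is 34, whereas the weighted mean is 37 because the single coded younger patient represents all four. The approach only removes bias if every factor driving both coding and the outcome is captured in the strata, and a stratum with no coded patients cannot be represented at all.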

Strengths and limitations
This study used contemporary data from a large administrative claims database linked with EHR to validate two claims-based weight-related algorithms. Prior studies focused on either claims-based algorithms in the general population [8,10] or the broad four-level obesity classification algorithm for bariatric surgery patients in the preoperative period [23]. We evaluated the validity of these diagnosis codes during both the preoperative and postoperative periods, providing information for researchers who are interested in using administrative claims databases to study weight-related effectiveness outcomes. Our findings add to the knowledge base of the quality and suitability of administrative claims data, a real-world data source, for generation of real-world evidence in bariatric surgery research [24]. One limitation of our study is the small sample size for the postoperative period resulting from the proximity requirement on the EHR-based BMI measurement, which may limit the generalizability of our results. In sensitivity analyses where we relaxed the proximity requirement, the size of the validation sample increased but no substantial change was observed in the validity of postoperative weight-related diagnosis codes. Moreover, the linked EHR data were only available on a small subset of patients identified in claims who received care at healthcare service systems that contribute EHR data to OLDW, raising the possibility of unmeasured factors affecting our analyses and limiting the generalizability of our results.

Conclusions
Among bariatric surgery patients identified within administrative claims databases, the validity of weight-related diagnosis codes was excellent during the preoperative and postoperative periods. These findings support the use of administrative claims databases for bariatric surgery research in the absence of BMI measurements for non-weight-related effectiveness and safety outcomes that are generally well-captured in these databases. However, the availability of weight-related diagnosis codes was suboptimal during the postoperative period, making it more challenging to use claims databases for weight-related effectiveness research.