Selection of the study cohort
We constructed an incident user cohort [27] to examine UGI complication in RA and OA patients initiating celecoxib or a tNSAID, specifically ibuprofen or diclofenac. All individuals with a first dispensing between 1 July 2003 and 30 September 2004 of celecoxib, ibuprofen, or diclofenac were drawn from the Truven Health Analytics MarketScan® Commercial Claims and Encounters [28]. MarketScan is a longitudinal healthcare claims database which captures patient demographics, inpatient and outpatient diagnoses and procedures, and medications from a selection of large private employers, health plans, government agencies and other public organizations. We selected patients who were age 18–65 years, belonged to a health insurance plan with full medical and pharmacy benefits, and had at least 6 months of enrollment history as of the date of first dispensing of a study or referent drug (the “index date”). During the 6 months prior to the index date, patients must have had a diagnosis of RA (ICD-9 code 714.x) or OA (ICD-9 code 715.x, 721.x) but no NSAID dispensing (including aspirin); and no record of gastrointestinal ulcer disorders, gastrointestinal hemorrhage, active renal, hepatic, coagulation disorders, allergies, malignancy, esophageal or gastroduodenal ulceration.
The study outcome — UGI complication — was defined as an inpatient or outpatient diagnosis for either first peptic ulcer disease complications including perforation, an UGI hemorrhage (ICD-9 code 531.x, 532.x, 533.x, 534.x, 535.x, 578.0), or a physician service code for UGI hemorrhage (Current Procedure Terminology (CPT) code 43255 or ICD-9 procedure code 44.43). The complication must have occurred during the 60 days after initiation of the study drug. These outcome definitions have been validated for 1,762 patients in an independent hospital discharge database with a positive predictive value of 90% against medical chart review [29].
Aggregations of medications and medical diagnoses
Major U.S. administrative databases contain prescription medication information coded with non-hierarchical National Drug Codes (NDC) and generic drug names. The medications can be aggregated using the hierarchical Anatomical Therapeutic Chemical (ATC) drug classification developed by the World Health Organization (WHO) for drug utilization studies [30]. Similarly, medical diagnoses are represented by International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9) codes. ICD-9 has limited hierarchical relationships [31], but the Clinical Classification Software (CCS) developed by the Agency for Healthcare Research and Quality (AHRQ) can be used to aggregate diagnoses into clinically meaningful groupings [32].
We aggregated medications to five levels of the ATC classification [30]. This system classifies active substances into different groups based on their target organ or system and their therapeutic, pharmacological and chemical properties. Drugs are classified into fourteen main groups (1st level) with pharmacological or therapeutic subgroups (2nd level). The 3rd and 4th levels are chemical, pharmacological or therapeutic subgroups, and the 5th level is the chemical substance. Several ATC groups are subdivided into both chemical and pharmacological groups. The pharmacological group is often chosen if a new substance fits in both a chemical and pharmacological 4th level. Substances in the same 4th ATC level are not pharmacotherapeutically equivalent, as they may have different modes of action, therapeutic effects, drug interactions and adverse drug reaction profiles. New 4th levels are commonly established if at least two approved substances fit in the group. A new substance not clearly belonging to any existing group of related substances of ATC 4th level will often be placed in an X group (“other” group). There are very few new substances in each of the ATC 4th levels with the same ATC 3rd level for each ATC release. New substances belonging to different ATC 3rd levels will have different codes for “X” groups in ATC 4th level. Therefore, the mixed bag of “X” group is not an issue for the hd-PS algorithm.
We created several scenarios of code aggregation using the CCS groupings. In the base scenario, we applied the hd-PS with up to 5-digit granularity of ICD-9 for inpatient and outpatient diagnoses. Note that 3-digit ICD-9 codes are kept separate from 4- and 5-digit codes in the hd-PS despite the limited hierarchy between these levels. We transformed ICD-9 diagnoses into four-level CCS groupings via the cross-mapped ICD-9 to CCS multi-level diagnoses table [32]. There are 18, 134, 355 and 207 groupings in CCS levels 1, 2, 3 and 4, respectively. However, not all ICD-9 codes have a corresponding CCS code in all four levels. Therefore we created a “universal” CCS by using the most granular code available for each ICD-9 diagnosis code to obtain 355 groupings in CCS level 4. We separately investigated different levels of ICD-9 granularity by using the first 3- or 4-digit ICD-9 codes [16].
Sampling techniques to generate conditions of different size, outcome incidence and exposure prevalence
The full cohort consisted of 18,829 patients (7,197 prescribed celecoxib and 11,632 prescribed ibuprofen or diclofenac); 117 patients developed an UGI complication. For each aggregation scenario (including no aggregation), we created six conditions of 100 sampled cohorts. First, we varied the total size of the cohort by drawing simple random samples of 50% (condition 1) and 20% (condition 2), 100 times each, without replacement. Second, we varied the outcome incidence by retaining all noncases, but drawing 50% (condition 3) and 20% (condition 4) simple random samples without replacement, 100 times each, from the 117 cases and re-coding the unselected cases as noncases. Finally, we varied the exposure prevalence by retaining all unexposed patients, but drawing 50% (condition 5) and 20% (condition 6) simple random samples, 100 times each, without replacement, from the exposed subjects and replacing the unselected exposed subjects with the same number of randomly selected unexposed patients (Figure 1).
The hd-PS algorithm
We implemented the hd-PS algorithm with five data dimensions commonly available in automated healthcare databases: pharmacy claims; outpatient diagnoses; outpatient procedures; inpatient diagnoses; and inpatient procedures. The algorithm identifies the top 200 most prevalent covariates within each data dimension by creating binary variables for each diagnosis, procedure and medication. The prevalence of each variable depends on the granularity of the coding: the more aggregation, the higher the prevalence tends to be as subcodes are combined into aggregated codes. Each variable is assessed for 3 levels of its within-patient occurrence: (1) once; (2) sporadic (≥ median number of times); or (3) frequent (≥ 75th percentile number of times) with details described elsewhere [16]. With the default setting of 200 variables for each dimension, 3,000 indicator variables (200 × 3 levels × 5 dimensions) are then prioritized according to their potential for confounding control based on their prevalence and bivariate associations with the treatment and with the study outcome according to the Bross formula [19]. By default, the top 500 indicator variables are selected for the PS.
Statistical analysis
The hd-PS algorithm can augment the automatically-selected covariates with predefined covariates chosen by the investigator. For each condition-scenario combination, we fit 5 log binomial models: (1) an unadjusted crude model; (2) adjusted for basic covariates (age [continuous], gender, calendar year of drug initiation); (3) adjusted for basic plus extended, pre-selected covariates (hypertension, congestive heart failure, coronary artery disease, inflammatory bowel disease, prior dispensing of gastroprotective drugs, warfarin, antiplatelet drugs, and oral steroids) selected based on biological rationale in the literature [16, 18, 20, 33–35]; (4) adjusted for basic plus hd-PS selected covariates; and (5) adjusted for basic, extended, and hd-PS selected covariates. The hd-PS was adjusted for by including indicator variables for each decile of PS in the regression model.
In the base scenario, we used generic drugs and up to 5-digit granularity of ICD-9, CPT or Healthcare Common Procedure Coding System (HCPCS). We then re-fitted all models in six scenarios for aggregation of medications, eight scenarios for aggregation of diagnoses, and one scenario that combined the medication and diagnosis aggregations that appeared to perform best across the six conditions of cohort samples.
We applied hd-PS to the full study cohort to estimate the treatment effect and used it as the reference value for comparison with results from the generated cohort conditions. For the 100 samples in each of the cohort conditions, we calculated summary statistics for the estimated risk ratios (geometric mean, 25th and 75th percentiles), including: the mean percentage of covariates selected by hd-PS in the full cohort that were also selected by hd-PS in the samples; the median number of exposed and unexposed subjects; the median number of exposed and unexposed outcomes. We evaluated each aggregation scenario by estimating the amount of residual confounding, calculated as the difference in the natural logarithms of the estimated risk ratio and the natural logarithm of 0.50, representing the RCT findings [21–26]. To estimate the change in residual confounding resulting from each aggregation scenario, we calculated the proportional difference in absolute degree of estimated confounding between the scenario of interest and the base (no aggregation) scenario. For example, for the 20% exposure prevalence cohorts (condition 6), the unadjusted (confounded but otherwise presumptively unbiased) estimate is RRu = 0.97, and two confounded (but otherwise presumptively unbiased) estimates are RRc1 = 0.89 (base: basic, extended and hd-PS covariates; no aggregation) and RRc2 = 0.81 (combined diagnostic and medication aggregation). Assuming that the unconfounded (true) value is RRt = 0.50, estimated confounding in the base estimate = │ln(0.89) – ln(0.50)│ =0.577; estimated confounding in the combined aggregation estimate = │ln(0.81) – ln(0.50)│ = 0.482. Thus, the proportional difference in absolute degree of estimated confounding between the two estimates = (0.482 - 0.577)/0.577 = −16.3%. We would conclude that the combined aggregation estimate is 16.3% less confounded than the base estimate.
Because of limited data availability, and to mimic as closely as possible the intention-to-treat analyses in the trials, we used a single prescription reimbursement claim as the treatment measure. The current study was exempt by the Institutional Review Board of University of North Carolina at Chapel Hill.