Supplementing claims data with outpatient laboratory test results to improve confounding adjustment in effectiveness studies of lipid-lowering treatments

Background: Adjusting for laboratory test results may result in better confounding control when added to administrative claims data in the study of treatment effects. However, missing values can arise through several mechanisms.
Methods: We studied the relationship between availability of outpatient lab test results, lab values, and patient and system characteristics in a large healthcare database using LDL, HDL, and HbA1c in a cohort of initiators of statins or Vytorin (ezetimibe & simvastatin) as examples.
Results: Among 703,484 patients, 68% had at least one lab test performed in the 6 months before treatment. Performing an LDL test was negatively associated with several patient characteristics, including recent hospitalization (OR = 0.32, 95% CI: 0.29-0.34), MI (OR = 0.77, 95% CI: 0.69-0.85), or carotid revascularization (OR = 0.37, 95% CI: 0.25-0.53). Patient demographics, diagnoses, and procedures predicted well who would have a lab test performed (AUC = 0.89 to 0.93). Among those with test results available, claims data explained only 14% of variation.
Conclusions: In a claims database linked with outpatient lab test results, we found that lab tests are performed selectively corresponding to current treatment guidelines. Poor ability to predict lab values and the high proportion of missingness reduce the added value of lab tests for effectiveness research in this setting.


Background
Administrative health insurance claims databases provide comprehensive and longitudinal records of encounters with the health care system and of drug dispensing, but lack clinical detail. For example, while the performance of a lab test will generate a claim, the test result will not be available within the claims database. This shortcoming can be overcome by merging outpatient laboratory test results extracted from electronic medical records (EMR) systems with claims data. Adjusting for lab results may result in better confounding control when administrative claims data are used to study treatment effects of medical products.
A difficulty arises from the way in which lab tests are ordered and performed in the American health care system. EMR systems with outpatient lab results generally rely on major laboratory companies to supply lab results data; results for patients whose tests are conducted outside the large chains may go unrecorded in EMRs. In conducting comparative effectiveness research in claims data, pharmacoepidemiologists generally interpret the absence of a claim as the absence of that service/diagnosis, which can result in a covariate misclassification problem but not a missing data problem [1]. However, missing lab test results do not mean the test was not performed or the results are normal, and thus must be handled like other missing data. There is little guidance in the literature on the nature of the missingness of such laboratory information or whether missing lab test results can be adequately imputed.
Investigators have employed varying strategies to deal with this situation. One approach is to identify a subcohort of patients with complete information on the lab test results of interest. Seeger et al. studied the effectiveness of statin therapy to reduce myocardial infarction rates, by requiring all patients studied to have a recorded LDL > 130 mg/dl [2]. This approach reduces the proportion of subjects with missing data, but that advantage comes at the cost of fewer subjects in the study, and a final study population that may be dissimilar to the broader patient population for important characteristics. Furthermore, this approach is impractical for multiple unrelated lab test results, as complete cases may be few [3].
One unfavorable approach is to include an indicator term for missing lab test data. In the case of lab results, missingness can imply three things: (1) the physician did not see a need to have a test ordered; (2) the patient did not choose to have the test performed, or (3) the test was performed but not in a facility whose data fed back to the patient's EMR (Figure 1). Though the implication of each of these cases is entirely different, they are indistinguishable in the data; as such, coding them simply as "missing" would lead to bias. Even if all the missingness were due to the third case, in which data are most plausibly missing completely at random, the use of a missing indicator term could still cause bias [4,5].
In order to further meaningful comparative effectiveness research, we must understand the selectiveness of missing lab test results and how missingness may be related to study outcomes. In this paper, we seek to describe the analytic issues encountered when lab results may not be available for many patients. As an example, we describe the characteristics associated with the absence of laboratory test results and the degree to which missingness and actual lab test values can be predicted based on patient and health plan characteristics in a population of patients initiating lipid-lowering therapy.

Methods
Database studies that combine claims and lab test results or other data from EMRs typically employ the claims as a "data backbone," as claims data provide a longitudinal view of virtually all health care encounters and drug dispensings submitted for health insurance reimbursement. Increasingly, claims databases link data from large national lab test chains [6][7][8]. Though the chains service a large number of American patients, the resulting linked data may cover substantially less than 50% of outpatient lab tests, with coverage highly dependent on the region where the patient resides and the lab companies servicing that region. Figure 1 illustrates two levels of missingness that may arise in such situations. No claim will be recorded (Level 1) if a physician does not order a test, a patient receives a lab in a hospital, or a patient does not get a test that was ordered. The result of a test that was performed may not be transmitted to the patient's claims data (Level 2) if the insurer has not established a data exchange agreement with the laboratory provider. The likelihood of Level 2 missingness increases if there is no laboratory provider operating in the area that has a data exchange agreement with the insurer.
Figure 1 Reasons for missing lab test results in a longitudinal healthcare utilization database linked to a lab test provider database*. * In the setting of a new user cohort study with a defined covariate assessment period before the first exposure and before follow-up.
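The two levels of missingness described above can be expressed as a simple classification of each patient-test pair. The following is a minimal sketch, not the study's actual code; the names `LabStatus` and `classify_lab_status` and the two boolean inputs are assumptions introduced for illustration. A fourth category (a result without a claim) is included as an assumption because it can occur when tests are not billed separately.

```python
from enum import Enum


class LabStatus(Enum):
    """Hypothetical labels for the claim/result combinations of one lab test."""
    RESULT_AVAILABLE = "claim and result present"
    LEVEL_1_MISSING = "no claim and no result"
    LEVEL_2_MISSING = "claim present, result not transmitted"
    RESULT_WITHOUT_CLAIM = "result present, no claim"


def classify_lab_status(has_claim: bool, has_result: bool) -> LabStatus:
    """Classify one patient-test pair within the covariate assessment period.

    has_claim: a claim for the test was recorded (assumed input).
    has_result: a linked lab result was recorded (assumed input).
    """
    if has_claim and has_result:
        return LabStatus.RESULT_AVAILABLE
    if has_claim:
        # Test was billed but the value never reached the linked database.
        return LabStatus.LEVEL_2_MISSING
    if has_result:
        # Value was forwarded although no separate claim was recorded.
        return LabStatus.RESULT_WITHOUT_CLAIM
    # Neither claim nor result: test not ordered, not done, or done in hospital.
    return LabStatus.LEVEL_1_MISSING
```

Note that the classification cannot separate the different reasons behind Level 1 missingness, since they look identical in the data.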

Data sources
We employed longitudinal claims data from 14 Blue Cross and/or Blue Shield-licensed health plans of Wellpoint across 14 US states, as represented in the HealthCore Integrated Research Database SM (HIRD SM). HealthCore linked claims information to lab test results provided by two large national laboratory providers, for laboratory tests performed from January 1, 2005 through June 30, 2010 on patients represented in the HIRD system. The claims data contained information on drug dispensings, outpatient medical services, and hospitalizations including emergency room visits. All medical services were coded with up to 9 discharge diagnoses [1]. Individual laboratory test results were identified by LOINC codes and standardized across lab providers. This study was approved by the Brigham and Women's Hospital Institutional Review Board and signed data use agreements were in place.

Study cohort and exposure
From the data available, we established a cohort of all incident users of any statin (simvastatin, pravastatin, lovastatin, atorvastatin, rosuvastatin), Vytorin (simvastatin plus ezetimibe), or ezetimibe who were 18 years or older at the start of treatment. Incident use was established by requiring at least 12 months of insurance coverage before treatment and no use of any lipid-lowering therapy in those 12 months. All covariate information was assessed in the longitudinal healthcare claims over a covariate assessment period (CAP) starting 6 months before treatment initiation and up to the day of dispensing of the index drug. Follow-up for occurrence of MI started 1 month after initiation of lipid-lowering treatment, a conservative assumption to allow for the biologic action of the medication to occur (Figure 2) [9]. We categorized each medication on the index date into high and low intensity treatment based on its ability to lower LDL levels (Table 1) [10].
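The incident-user definition and the study windows above can be sketched as follows. This is an illustrative sketch only: the function names, the day counts standing in for "12 months", "6 months", and "1 month", and the simplification that coverage is continuous from a single enrollment start date are all assumptions, not details reported in this paper.

```python
from datetime import date, timedelta

BASELINE_DAYS = 365  # assumed stand-in for "12 months"
CAP_DAYS = 183       # assumed stand-in for "6 months"
LAG_DAYS = 30        # assumed stand-in for "1 month"


def is_incident_user(index_date, enrollment_start, prior_dispensings, age_at_index):
    """Return True if the patient qualifies as an adult incident user.

    index_date: date of first dispensing of the index drug.
    enrollment_start: start of (assumed continuous) insurance coverage.
    prior_dispensings: dates of any earlier lipid-lowering dispensings.
    """
    baseline_start = index_date - timedelta(days=BASELINE_DAYS)
    if age_at_index < 18:
        return False
    if enrollment_start > baseline_start:
        return False  # fewer than 12 months of coverage before treatment
    # No lipid-lowering dispensing allowed inside the baseline window.
    return not any(baseline_start <= d < index_date for d in prior_dispensings)


def study_windows(index_date):
    """Return the covariate assessment period and the start of follow-up."""
    return {
        "cap_start": index_date - timedelta(days=CAP_DAYS),
        "cap_end": index_date,
        "follow_up_start": index_date + timedelta(days=LAG_DAYS),
    }
```

For example, a patient first dispensed a statin on June 1, 2008 would, under these assumed day counts, begin follow-up on July 1, 2008.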
We defined two subgroups of patients with chronic conditions. Rheumatoid arthritis (RA) was defined as at least two outpatient diagnoses of RA in the CAP or one hospital discharge diagnosis of RA in the CAP or one diagnosis of RA plus dispensing of a disease modifying antirheumatic drug. Diabetes (DM) was defined as at least two outpatient diagnoses of DM in the CAP or one hospital discharge diagnosis of DM in the CAP or one diagnosis of DM plus an insulin or oral antidiabetic dispensing. Patients with rheumatoid arthritis and diabetes were identified as subgroups with chronic conditions because these patients were likely to receive more lab tests at regular intervals than the typical patient initiating a statin. Patients with rheumatoid arthritis are of further interest in that they may receive care primarily from a specialist rather than an internist and therefore may have different patterns of laboratory use.
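Both subgroup definitions share the same three-part rule, which can be sketched as a single function. The function name and its count-based inputs are hypothetical; the sketch assumes that diagnoses and dispensings have already been restricted to the CAP and to the relevant condition and drug class.

```python
def has_chronic_condition(n_outpatient_dx, n_inpatient_dx, has_drug_dispensing):
    """Apply the shared RA/DM subgroup rule to pre-counted CAP claims.

    n_outpatient_dx: number of outpatient diagnoses of the condition.
    n_inpatient_dx: number of hospital discharge diagnoses of the condition.
    has_drug_dispensing: True if a condition-specific drug was dispensed
        (a DMARD for RA; insulin or an oral antidiabetic for DM).
    """
    n_any_dx = n_outpatient_dx + n_inpatient_dx
    return (
        n_outpatient_dx >= 2                       # two outpatient diagnoses
        or n_inpatient_dx >= 1                     # one discharge diagnosis
        or (n_any_dx >= 1 and has_drug_dispensing)  # one diagnosis plus drug
    )
```

A patient with a single outpatient diagnosis and no relevant dispensing would not qualify under this rule.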

Patient characteristics and lab test results
Patient characteristics and potential confounders assessed during the 6-month CAP included age (18- [11] hospitalization in the 30 days prior to treatment initiation, hospitalization for more than 30 days before treatment initiation, number of days hospitalized, number of outpatient lab tests ordered, hypercholesterolemia, hypertension, heart failure, myocardial infarction, coronary revascularization, peripheral vascular disease, peripheral arterial revascularization, TIA/stroke, carotid revascularization, pre-diabetes, diabetes, arthritis, COPD, oxygen canister use, and obesity. Clinical covariates were assessed based on the presence of ICD-9 diagnosis codes (see Additional file 1: Appendix Table S1) in administrative claims during the CAP. In this exploratory analysis, we included a wide range of clinical covariates frequently measured in claims-based studies.
Figure 2 Incident user cohort study*. * The 6-month covariate assessment period (CAP) precedes the initiation of treatment. During the CAP we identified patient characteristics, including lab tests performed and lab test results available. After treatment start followed a 1-month lag period before events were attributed to the treatment. The arrows between prescriptions (Rx), diagnoses (Dx) and lab tests denote the fact that the temporality of events within the CAP was not considered in this study.
Within the 6-month covariate assessment period we identified all recorded outpatient lab test results for 23 commonly performed lab tests, including lipid tests, HbA1c, and others (see Additional file 1: Appendix Table S2). Additionally, we used CPT-4 codes to identify all labs for which charges were claimed during the CAP. We chose to include 23 lab tests to increase the probability that patients would have multiple lab tests performed and that we would be able to assess whether lab values were missing at the patient level or the test level.
In comparative effectiveness research, as in other areas of clinical epidemiology, missing data are both common and problematic. Imputation of missing values may increase precision and validity of effect estimates. The imputation literature recommends including not only preexposure patient characteristics and treatment information in the prediction of missing values but also information on the outcome status [12]. In our example, outcomes of interest were the incidence of myocardial infarction (assessed with a positive predictive value of 94%) [13]; hospitalization for acute coronary syndrome (ACS) that included a coronary revascularization procedure; stroke; and death attributed to any cause (see Additional file 1: Appendix Table S3). Follow-up time started 1 month after initiation of a cholesterol-lowering drug (Figure 2). Patients were censored at the time of discontinuation of the index drug, any of the outcomes, disenrollment, or study end (June 30, 2010), whichever came first.
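The "whichever came first" censoring rule can be sketched as taking the earliest of the candidate dates. The function name and argument names below are assumptions for illustration; events that never occurred are represented as `None`.

```python
from datetime import date

STUDY_END = date(2010, 6, 30)  # study end as stated in the text


def censoring_date(discontinuation=None, outcome=None, disenrollment=None,
                   study_end=STUDY_END):
    """Return the end of follow-up: the earliest date among the events that occurred.

    Each argument is a date, or None if that event did not occur.
    """
    candidates = [d for d in (discontinuation, outcome, disenrollment, study_end)
                  if d is not None]
    return min(candidates)
```

A patient with no discontinuation, outcome, or disenrollment is followed to the study end date.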

Analysis
In this analysis, ascertainment of performing a lab test refers only to tests performed in the outpatient setting.
We determined the proportion of patients who had at least one such lab test performed out of the 23 study lab tests and then focused on 3 specific cardiovascular risk markers: LDL, HDL, and HbA1c [14]. In sensitivity analyses, we extended the 6-month covariate assessment period to 9 and to 12 months in an effort to capture more lab test results.
In order to quantify differential lab test performance and result availability, we computed the number of lab tests performed (as measured by the presence of CPT-4 codes) and the proportion of those with test results available in the linked database. We then cross-tabulated these data with patient and system characteristics.
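The two proportions described above (tests performed, and results available among tests performed) can be computed as in the following sketch. The record layout, a pair of booleans per patient, and the function name are assumptions; the toy data in the usage note are invented and are not study data.

```python
def lab_availability(records):
    """Summarize test performance and result availability.

    records: iterable of (test_performed, result_available) booleans,
        one pair per patient (assumed layout).
    Returns the share of patients with a test performed and, among those,
    the share with a result available in the linked data.
    """
    n_total = 0
    n_performed = 0
    n_result_given_performed = 0
    for performed, result in records:
        n_total += 1
        if performed:
            n_performed += 1
            if result:
                n_result_given_performed += 1
    performed_share = n_performed / n_total if n_total else float("nan")
    result_share = (n_result_given_performed / n_performed
                    if n_performed else float("nan"))
    return {"performed": performed_share,
            "result_among_performed": result_share}
```

With ten toy patients, of whom seven had a test performed and three of those had a result, the function returns 0.7 and 3/7. Multiplying the two shares gives the proportion of all patients with a result following a claimed test.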
For each of the LDL, HDL, and HbA1c cardiovascular disease risk markers, any factors associated with a completed test were identified in a multivariate logistic regression that predicted whether the outpatient lab test was performed, as a function of the patient and system characteristics described above plus statin/Vytorin exposure and cardiovascular outcome status. We then determined overall sensitivity and specificity for the predicted probabilities of test performance and model c-statistics.
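Given predicted probabilities from such a model, the c-statistic and the sensitivity and specificity can be computed as sketched below. This is a generic illustration, not the study's code: the c-statistic is computed as the proportion of concordant case/non-case pairs (ties counted as one half), and the 0.5 classification threshold is an assumption, since the cutoff used in the study is not stated here.

```python
def c_statistic(y_true, scores):
    """Proportion of (performed, not performed) pairs ranked correctly.

    y_true: 1 if the test was performed, 0 otherwise.
    scores: predicted probabilities of test performance.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each class")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity at an assumed probability threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)
```

The pairwise loop is quadratic in the number of patients; for a cohort of this size a rank-based computation would be the practical choice, but the pairwise form makes the definition explicit.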
In order to explore the performance of imputation strategies, we fit linear regression models for the patients who had lab test results available, in order to predict the actual LDL, HDL, and HbA1c. In instances where patients received multiple tests, we used the value from the last test. We assumed normal distributions of test results as reasonable approximations, although data were slightly skewed. We express the proportion of explained variation as the observed R² from the linear regression models.
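The two steps above, selecting the last test value and computing the proportion of explained variation, can be sketched as follows. For brevity the sketch fits a least-squares line with a single predictor, whereas the models in this study used many patient and system characteristics; the function names and the toy inputs in the usage note are assumptions.

```python
from datetime import date


def last_value(tests):
    """Return the value of the most recent test.

    tests: non-empty list of (test_date, value) pairs for one patient.
    """
    return max(tests, key=lambda t: t[0])[1]


def r_squared(x, y):
    """R-squared of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

As a toy check, `r_squared([1, 2, 3, 4], [1, 3, 2, 4])` gives 0.64, meaning 64% of the variation in the toy outcome is explained by the single predictor.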
Lastly, we investigated the relationship between the completion of a lab test, the availability of test results in our database, and whether the test results themselves differed between study exposure groups stratified by RA and diabetes.

Results
Over the study period we identified 703,484 patients who met the study eligibility criteria and initiated lipid-lowering therapy with statins, ezetimibe, or a combination of both. Among those, 68% had a recorded charge for at least one of the 23 study lab tests in the 6 months before treatment (Table 2). This proportion increased to 72% if the covariate assessment period was extended to 9 months before treatment, and to 74% during a 12-month period. For patients with diabetes or RA the proportions were higher (80% during 6 months) but showed equally small increases if the covariate assessment period was extended (83% and 84%). For LDL and HbA1c tests, the proportion of patients with a recorded charge for at least one test during the 6 months before initiation of lipid-lowering therapy was about 60% and 17%, respectively. For patients with diagnosed diabetes, 68% had a charge for an HbA1c test (Table 2). Overall and regardless of having a test performed, the proportion of patients with any outpatient lab test results available in the linked database was about 30%, which was similar in patients with diabetes or RA. Lab test results for LDL or HDL were available for about 20% of patients during the 6 months before initiation of lipid-lowering therapy.
Table 3 shows whether any of 23 outpatient lab tests, including LDL, HDL, and HbA1c, were performed within the 6 months before initiating lipid-lowering therapy, cross-tabulated by a range of patient and health system characteristics. Overall, 481,133 (68%) of study patients had claims evidence of an outpatient lab test and 42% thereof had results available in the study data (29% of all patients). The proportion with at least one lab test performed varied substantially by patient characteristics, while test result availability varied little, and only for variables such as system characteristics and state of residence (Table 3).
Having been hospitalized in the 30 days before the initiation of lipid-lowering treatment was negatively associated with receiving an outpatient test, likely because the relevant lab tests were performed during the hospitalization and as such do not appear as outpatient lab tests. Some patients hospitalized for acute coronary syndrome or MI may have received lipid-lowering therapy for secondary prevention without the need for a lab test. This is supported by the fact that patients with recent MI and patients with ACS had lower-than-average proportions with at least one test performed (24% and 40%, respectively, compared with an average of 68%). A code for hypercholesterolemia is frequently accompanied by an LDL test performed (80%), likely because the test ordering is accompanied by such a billing code.
Patients with Medicare Supplemental coverage (43,645) had a much lower proportion of claims for LDL tests performed (18%), and of those only 2% had results available. The lab test provider may not have included the secondary payer on the claim.
The two lab test providers that provided data to the insurer do not operate in some states; for example, the availability of lab test results in one state was as low as 2% for LDL. Such low recording would not be dependent directly on patient characteristics as it affects an entire state and is driven by factors other than health status, though clinically relevant patient characteristics have varying prevalences across states.
Some patients resided in states not primarily covered by the health plan studied, and are covered only via accounts for nationally operating businesses (e.g., if the employer is based in another state, all employees may be members of a health plan in that other state, rather than the state of residence). For these patients, the availability of LDL test results is less than 10%. One state (#12) stands out as having a small proportion of patients with an outpatient LDL test performed (24%), but a much larger proportion of patients have a result available (51%). In this state, a larger proportion of providers are under HMO capitation agreements. Within these plans, underrecording of tests performed may be the result of bundled payment arrangements; however, results are still forwarded by the lab test providers, resulting in the paradox of having more lab test results available in our database than performed as recorded in claims data. Among patients with diabetes or RA, we found fundamentally similar results. Among elderly patients, lab test results were more likely to be available among Medicare Advantage enrollees than among patients covered through Medicare Supplemental insurance (Table 4).
Based on patient and system characteristics plus exposure and outcome status it was possible to predict with high sensitivity (97%) and specificity (94%) whether outpatient lab tests were performed in the 6 months before treatment initiation. The corresponding model c-statistics of the logistic regression models were between 0.89 and 0.93 (Figure 3), indicating a very high predictive capacity. Strong independent associates of having an outpatient LDL test performed were a diagnosis of hypercholesterolemia or obesity, and carotid revascularization. Associates of low probability of doing LDL lab tests were recent hospitalization and being diagnosed with RA. Being older than 65 also decreased the chance of an LDL lab test, likely because of test underreporting due to bundled payments. Initiating high-intensity lipid-lowering treatment and dying in the study follow-up period were correlates of not having an outpatient LDL, HDL, or HbA1c test performed. Not surprisingly, the strongest predictor of having an HbA1c test performed was a diagnosis of diabetes or pre-diabetes.
Among the patients for whom LDL, HDL, or HbA1c test levels were available, we then attempted to predict the actual lab levels based on their recorded patient and system characteristics. Using all observed factors described above, 17% of the variation could be explained (Figure 4). Young age was the strongest correlate of higher LDL (+20 mg/dl) and HbA1c (+0.5%) levels, suggesting that at younger ages initiation of lipid-lowering therapy was more driven by lab test results, i.e. primary prevention, while at older ages past coronary events and other risk factors were the triggers for statin initiation despite lower LDL levels (−17 mg/dl).
Higher intensity of lipid-lowering treatment generally was correlated with a lower proportion of outpatient LDL tests performed, a lower fraction of LDL test results available in the database, and lower LDL serum levels (Table 5). For example, among high-dose simvastatin initiators (>40 mg/day), 52% had an outpatient LDL test performed before treatment start (63% for lower-dose simvastatin). Of those patients, 37% had a test result available (42% for lower-dose simvastatin), and the mean LDL serum level was 135.6 mg/dl compared to 147.3 mg/dl for patients started on low-intensity simvastatin. Mean LDL levels were generally lower in patients with diabetes who initiated lipid-lowering therapy.

Discussion
We studied the characteristics of laboratory test information in a pharmacoepidemiologic research data source that enriches longitudinal claims data with outpatient lab test results data, which makes it possible to better adjust for biomarkers of cardiac risk in comparative effectiveness studies. In an example cohort study of 703,484 patients initiating various lipid-lowering therapies, 68% of patients had at least one of a set of 23 study lab tests performed in the 6 months before treatment, and 42% of those had test results available. LDL test results were available for 24% of statin initiators, a non-trivial level of missingness that needed to be addressed in order to preserve the validity and generalizability of findings. Missingness due to absence of lab tests being performed followed a complex pattern that is largely explained by hospitalization, clinical practice guidelines which differ for primary and secondary prevention of coronary heart disease, and by some health care system characteristics.
Several key points regarding these patterns arose and have implications for conducting comparative effectiveness research studies in such enriched data sources.

Operational aspects
A covariate assessment period of 6 months was sufficient to capture the majority of outpatient lab tests performed. Extending the period to 9 and 12 months, and thus extending the required pre-exposure enrollment period, provided few additional observed lab tests but may disproportionally reduce the cohort size if working with health plans that have high enrollee turnover rates.
[...] available to the patient's primary care physician, repeating testing may not have been required for some time after discharge.

Selectiveness of lab tests performed
Patients with risk factors for cardiovascular events were less likely to have lab tests performed. Many patients with these characteristics receive lipid-lowering treatment as secondary prevention, which is initiated independent of serum lipid-levels more frequently than is primary prevention. Indeed, treatment guidelines in place since the late 1990s recommend that patients with a major cardiac event should be treated with lipid lowering medications [15,16]. In a prior study, patients who initiated high-intensity lipid-lowering treatment were less likely to have had an outpatient lab test performed [17]. Because the presence of preexisting cardiac risk factors is both a strong predictor of future events and a predictor of missing data on lipid levels, disregarding the missing information can be expected to bias findings of non-randomized comparative effectiveness research in this setting.

Selectiveness of lipid lab test results available
Among patients who had lab test results available, those who were subsequently initiated on higher-intensity lipid-lowering treatment were more likely to have lower lipid serum levels. This finding is again compatible with clinical practice and trial findings that patients with acute coronary events (who are less likely to have outpatient lipid tests available) should be treated with high-intensity statins largely independent of their lipid levels [17,18]. System factors like state of residence and insurance plan type, particularly supplemental insurance, may substantially influence the availability of test results. However, since those factors are less likely to be systematically related to health outcomes, it is unlikely that these will act as major confounding factors in comparative effectiveness studies. In addition to these limitations regarding lab tests, baseline clinical conditions may be under-reported through claims data in some patients, particularly when using a short ascertainment period, such as the 6-month period we used. For LDL, HDL, and HbA1c tests it is unlikely that point-of-care testing would be performed, which the lab test provider chain would not record. However, other tests, like INR, urine analyses and creatinine levels, might be subject to this additional limitation.
If replicated in other patient populations and datasets, these findings have important implications for CER studies. The complexity of the nature of missingness that logically follows from clinical practice and the reality of our health care system requires the inclusion of a wide variety of patient and system characteristics in order to model the missing data structure. In our specific example the combination of primary and secondary prevention with lipid-lowering medications seems to complicate the prediction of missing values, but in the end is likely a reason why we could differentiate so well between patients who have an outpatient LDL test performed versus not (Figure 5). Once the outpatient lab test results were available, we had moderate ability to predict the exact lipid/HbA1c serum level. The resulting mismeasurement of imputed lab test results suggests that imputation of test results would provide only limited additional confounding control. However, estimation precision would be increased because the analyzable population would more than triple in our example study.
It is likely that the specific patterns of missingness of outpatient lab test results will vary depending on the clinical scenario, health care practice, and system constraints. It is encouraging that despite the nonrandom missingness we were able to predict quite well who would and would not receive a lab test result, which is a good starting point for addressing this issue. However, our difficulty in predicting actual lab values is a challenge to incorporating lab data through imputation or weighting approaches in comparative effectiveness research studies.

Conclusion
In a claims database linked with outpatient lab test results, we found that lab tests are performed selectively depending on patient risk factors and corresponding to current treatment guidelines. Poor ability to predict lab values and the high proportion of missingness reduce the added value of lab tests for effectiveness research in this setting.