BMC Medical Research Methodology

Background: Observational outcome studies of patients with obstructive sleep apnea (OSA) require adjustment for co-morbidity to produce valid results. The aim of this study was to evaluate whether the combination of administrative data and self-reported data provided a more complete estimate of co-morbidity among patients referred for sleep diagnostic testing.

Given the increased morbidity associated with OSA, observational studies of patients with OSA must also adjust for these co-morbid conditions to determine the independent effect of OSA on the outcomes of interest. The use of self-reported data through questionnaires or interviews is a common method of determining the presence of co-morbid conditions due to its efficiency and relatively low cost. However, the reliability and accuracy of these data are questionable [15][16][17][18][19][20][21]. In addition, the validity of self-reported conditions, using medical records as the gold standard, varies depending on the medical conditions in question and the target population under investigation [15][16][17][18][19][20][21].
Administrative data is another source from which to determine the presence of co-morbid conditions. While agreement between self-reported medical conditions and that obtained from administrative databases also varies [22][23][24][25][26][27], combining self-reported clinical data with that obtained from administrative data has been proposed as a method to increase the completeness and accuracy of comorbid conditions [28][29][30]. This enhanced measure of comorbidity has been undertaken and shown to provide a valid assessment for patients with coronary heart disease and those undergoing coronary artery bypass graft surgery [31][32][33]. Previous studies have assessed co-morbidity in OSA patients in the years prior to diagnosis [34,35]. However, many of these studies have relied on administrative records alone to determine co-morbidity. This source alone may result in an underestimate of co-morbidity within these populations. Given the importance of comorbidity in observational studies of OSA patients, and the limited information in the literature on studies combining data sources to measure co-morbidity, the purpose of this study was to evaluate whether the combination of administrative data and self-reported data provided a more complete estimate of co-morbidity among patients referred for sleep diagnostic testing.

Study Design
This project is part of a larger retrospective study investigating health care utilization among patients with OSA. We included all adult patients (> 18 years old) referred for sleep diagnostic testing at either a hospital location in Calgary, Alberta, or private home care facilities within the Calgary Health Region between July 2005 and August 2007. Virtually all sleep diagnostic testing for the city of Calgary and surrounding areas (population of approximately 1.3 million) is conducted in these facilities. All patients who underwent polysomnography (PSG) or ambulatory monitoring for the presence of OSA were invited to participate in the study. We excluded non-Alberta residents, patients previously diagnosed with OSA, and those who were referred but did not undergo diagnostic testing.

Obstructive Sleep Apnea
We used polysomnography (PSG) and ambulatory monitoring to identify OSA within participants. Although PSG is considered the 'gold standard' diagnostic test for OSA, an ambulatory monitoring device has proven to have excellent agreement, sensitivity and specificity with PSG [36]. In addition, the use of ambulatory monitoring has been validated as a clinical management tool [37,38].
We stratified patients by OSA severity, based on their sleep test results, using the respiratory disturbance index (RDI). The RDI was defined as the number of apneas and hypopneas per hour of sleep. Apnea was defined as a cessation of airflow for at least 10 seconds. Hypopnea was defined as an abnormal respiratory event lasting 10 seconds or more, with at least a 30% reduction in thoracoabdominal movement or airflow compared to baseline, and associated with at least a 4% oxygen desaturation. OSA severity categories included: no OSA (RDI < 5 events/hr), mild OSA (RDI 5-14.9 events/hr), moderate OSA (RDI 15-29.9 events/hr) and severe OSA (RDI ≥ 30 events/hr). This classification system is well accepted in both clinical practice and within the medical literature [39,40]. The date of the sleep study was used to define the index date.
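The severity categories above amount to a simple threshold rule on the RDI. The following is a minimal sketch of that rule, not the authors' code; the function name `classify_osa_severity` and the category labels are illustrative assumptions, and the cut-points are the ones stated in the text.

```python
def classify_osa_severity(rdi: float) -> str:
    """Map a respiratory disturbance index (events/hr) to an OSA severity category.

    Cut-points follow the text: <5 none, 5-14.9 mild, 15-29.9 moderate, >=30 severe.
    The function name and labels are hypothetical, chosen for illustration.
    """
    if rdi < 0:
        raise ValueError("RDI cannot be negative")
    if rdi < 5:
        return "no OSA"
    if rdi < 15:
        return "mild OSA"
    if rdi < 30:
        return "moderate OSA"
    return "severe OSA"
```

Using half-open intervals (for example, 5 ≤ RDI < 15 for mild OSA) avoids leaving values such as 14.95 unclassified, which a literal reading of "5-14.9" would do.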

Determination of Co-morbidities and Clinical Characteristics from Surveys
Baseline clinical and demographic information was collected for all participants prior to sleep diagnostic testing. This included: age, sex, height, weight, body mass index (BMI), neck circumference, and postal code. Each participant also completed the Epworth Sleepiness Scale (ESS) [41], a self-administered questionnaire that provides a measure of daytime sleepiness. Co-morbidity was determined through the use of a questionnaire administered by trained personnel within the clinics, and patients were asked to self-report the presence of nine specific co-morbidities including hypertension, asthma, depression, cardiac arrhythmia, myocardial infarction, chronic obstructive pulmonary disease (COPD), diabetes, heart failure, and stroke. Patients were also required to provide a list of their current medications at the time of the survey. This study was approved by the Ethics Review Board of the University of Calgary, and patients gave written informed consent to participate in the study.

Determination of Co-morbidities from Administrative Data Sources
Using the patient's unique Provincial Health Number (PHN), the cohort was linked to two Alberta Health and Wellness administrative databases, the hospitalization discharge database, and the physician claims database. For each patient, all hospitalization and physician claims information was obtained for a two-year period prior to sleep diagnostic testing.
The hospital inpatient data source contains details regarding hospitalizations including admission date, discharge date, length of stay, 25 diagnostic codes (ICD-10), and 10 procedure codes for each admission. The physician claims registry contains information on physician services including dates and location of the visits, diagnostic codes (ICD-9-CM), and provider specialty, and includes the majority of residents in the province except a small proportion of special population groups (i.e. members of the Armed Forces, Royal Canadian Mounted Police (RCMP), and federal inmates, accounting for approximately 1% of the total population) [42].
Co-morbid conditions were identified within the Alberta Health and Wellness administrative databases using the International Classification of Diseases (ICD-9-CM and ICD-10) definitions for the nine specific co-morbidities. When available, validated algorithms were used to define each co-morbid condition (Table 1) [43][44][45][46][47][48]. These algorithms were further supplemented by the ICD-10 coding scheme developed by Quan et al. [49]. For co-morbidities that did not have validated algorithms (specifically COPD, depression and cardiac arrhythmia), ICD-9-CM and ICD-10 diagnostic codes were identified within the ICD-9-CM and ICD-10 manuals [50,51]. Within the administrative datasets, the condition was considered present if the algorithm defining the condition was satisfied. For example, diabetes was considered present if there were two or more separate diagnostic codes identifying diabetes within the physician claims or one or more hospitalization diagnostic codes identifying diabetes within a two-year period [44]. Co-morbidities that did not have a validated algorithm (depression, COPD and cardiac arrhythmia) were considered present if at least one diagnostic code for the condition was recorded within either the physician claims data or the hospitalization data within the two-year period prior to the index date. All 3 diagnostic coding fields were used within the physician claims data and all 25 diagnostic codes within inpatient hospitalization data. We used diagnostic type indicators in the hospitalization data to restrict conditions to only those present prior to admission and therefore excluded any condition that developed while staying in hospital.
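The diabetes rule described above (two or more physician claims, or one or more hospitalizations, within the two years before the index date) can be sketched as follows. This is an illustration under stated assumptions rather than the validated algorithm itself: the record layout, the function names, the 730-day window, and the code prefixes (ICD-9-CM 250.x, ICD-10 E10 to E14) are assumptions, and the actual definitions are those in Table 1.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=730)  # assumption: "two-year period" taken as 730 days

def in_lookback(event_date: date, index_date: date) -> bool:
    """True if the event falls in the two years before (not on) the index date."""
    return index_date - LOOKBACK <= event_date < index_date

def is_diabetes_icd9(code: str) -> bool:
    # Assumed code set: ICD-9-CM 250.x (physician claims).
    return code.replace(".", "").startswith("250")

def is_diabetes_icd10(code: str) -> bool:
    # Assumed code set: ICD-10 E10-E14 (hospital discharge abstracts).
    return code.upper().replace(".", "")[:3] in {"E10", "E11", "E12", "E13", "E14"}

def has_diabetes(claims, hospitalizations, index_date: date) -> bool:
    """claims / hospitalizations: lists of (service_date, [diagnostic codes])."""
    claim_hits = sum(
        1 for d, codes in claims
        if in_lookback(d, index_date) and any(is_diabetes_icd9(c) for c in codes)
    )
    hosp_hits = sum(
        1 for d, codes in hospitalizations
        if in_lookback(d, index_date) and any(is_diabetes_icd10(c) for c in codes)
    )
    return claim_hits >= 2 or hosp_hits >= 1
```

For the conditions without a validated algorithm, the same structure would apply with the threshold relaxed to a single qualifying code in either source.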

Analysis
Patient characteristics were described using mean and standard deviation for normally distributed variables. In cases of highly skewed or non-normal distributions, the median and the inter-quartile range (IQR) were reported. Means and proportions were compared using analysis of variance and chi-square tests respectively. In addition, proportions of patients presenting with specific co-morbidities, identified in the questionnaire, were calculated.
To assess the agreement between self-reported co-morbidity and administrative databases, we calculated the proportion of subjects with each co-morbid condition based on: self-report only, administrative data sources only, both self-report and administrative data, and either self-report or administrative data. To evaluate consistency between self-report and administrative data, the Kappa (κ) statistic and 95% confidence intervals were calculated. The Kappa statistic is an index of the degree of agreement between two raters, and can be thought of as the chance-corrected proportional agreement; possible values range from +1 (perfect agreement) to 0 (no agreement above that expected by chance). Kappa values were defined as: < 0.40 as poor or fair agreement, 0.40-0.60 as moderate agreement, 0.61-0.80 as substantial agreement, and 0.81-1.00 as almost perfect agreement [52].
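As an illustration of the chance-corrected agreement described above, the sketch below computes κ from a 2×2 table of self-report versus administrative data and labels it with the thresholds given in the text. The function names are hypothetical, and the confidence interval uses a simple large-sample standard error, which is an assumption and may differ from the method implemented in the statistical software used for the study.

```python
import math

def cohen_kappa(both_yes, self_only, admin_only, both_no, z=1.96):
    """Return (kappa, lower, upper) for a 2x2 agreement table.

    The interval uses the approximation SE = sqrt(po(1-po) / (n(1-pe)^2)),
    an assumption made for this sketch.
    """
    n = both_yes + self_only + admin_only + both_no
    if n == 0:
        raise ValueError("empty table")
    p_observed = (both_yes + both_no) / n
    p_self = (both_yes + self_only) / n      # prevalence by self-report
    p_admin = (both_yes + admin_only) / n    # prevalence by administrative data
    p_expected = p_self * p_admin + (1 - p_self) * (1 - p_admin)
    if p_expected == 1:
        raise ValueError("kappa undefined when expected agreement is 1")
    kappa = (p_observed - p_expected) / (1 - p_expected)
    se = math.sqrt(p_observed * (1 - p_observed) / (n * (1 - p_expected) ** 2))
    return kappa, kappa - z * se, kappa + z * se

def agreement_label(kappa: float) -> str:
    """Label kappa using the cut-points stated in the text [52]."""
    if kappa < 0.40:
        return "poor or fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For example, a table with 20 concordant positives, 5 self-report-only, 10 administrative-only and 15 concordant negatives has observed agreement 0.70 and expected agreement 0.50, giving κ = 0.40.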
In addition, McNemar's test of paired proportions was performed. This procedure compares two dependent or correlated proportions; it is a test of marginal homogeneity based on the discordant pairs. A statistically significant McNemar's test indicates a difference between the proportions compared. Finally, to assess the validity of the enhanced measures of co-morbidity, an analysis was also performed in which patients were stratified by severity of OSA to determine trends in the prevalence of the co-morbid conditions. All statistical analyses were conducted using Stata 10.0 software (StataCorp, College Station, Texas).
Overall, patients with severe OSA were more likely to be male, older and have a higher Epworth Sleepiness Scale score compared to subjects with lesser degrees of OSA (Table 2). Table 3 presents the prevalence and agreement for co-morbidities determined by self-report and administrative data. The most prevalent conditions in both self-report and administrative data were hypertension and depression, with 35.1% and 27.0% of subjects referred for sleep testing self-reporting the presence of these conditions respectively. The proportions based on self-report and administrative algorithms differed significantly (McNemar's p value < 0.05) for all conditions except depression and COPD. There was substantial agreement between self-report and administrative algorithms for diabetes, with a κ = 0.79. There was moderate agreement for hypertension (κ = 0.60), depression (κ = 0.50) and asthma (κ = 0.49). However, COPD, heart failure, myocardial infarction, stroke and cardiac arrhythmia all demonstrated poor agreement. Of note, there was a large discrepancy between self-report and administrative data for the presence of cardiac arrhythmia (5.7% vs. 30.4%).
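The McNemar's test of paired proportions used for these comparisons depends only on the discordant pairs. A minimal sketch follows, assuming the chi-square form without continuity correction; the function name is hypothetical, and the exact variant used in the study (chi-square or exact binomial) is not stated in the text.

```python
import math

def mcnemar_test(self_only: int, admin_only: int):
    """McNemar chi-square test (1 df, no continuity correction; an assumption).

    self_only:  subjects positive by self-report but not administrative data
    admin_only: subjects positive by administrative data but not self-report
    Returns (statistic, p_value).
    """
    discordant = self_only + admin_only
    if discordant == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    statistic = (self_only - admin_only) ** 2 / discordant
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

A condition reported far more often in one source than the other, as with cardiac arrhythmia here, produces strongly unbalanced discordant counts and therefore a small p value, even though the concordant pairs do not enter the calculation.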

Comparison of Co-Morbidity Determined by Self-Report and Administrative Data Algorithms
When "both" self-reported and administrative measures of co-morbidity were required to define each condition, proportions for all nine conditions were much lower when compared to a definition that required "either" self-report or administrative measure. For example, the proportion of patients with hypertension was 25.1% when "both" were used and 43.2% when "either" was used.
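The "both" and "either" definitions correspond to the logical AND and OR of the two indicators for each patient. The sketch below shows one way to compute the four proportions from paired indicators; the function name and input layout are assumptions made for illustration.

```python
def combined_prevalence(self_report, admin):
    """Prevalence of a condition under four definitions.

    self_report, admin: equal-length sequences of booleans, one pair per patient.
    Returns proportions for self-report, administrative data, both, and either.
    """
    if len(self_report) != len(admin) or not self_report:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(self_report)
    pairs = list(zip(self_report, admin))
    return {
        "self_report": sum(s for s, _ in pairs) / n,
        "admin": sum(a for _, a in pairs) / n,
        "both": sum(s and a for s, a in pairs) / n,    # stricter, more specific
        "either": sum(s or a for s, a in pairs) / n,   # broader, more sensitive
    }
```

By construction, "either" equals self-report plus administrative minus "both", so "both" can never exceed either single-source prevalence and "either" can never fall below it.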

Co-Morbidity Measurement by OSA Severity
The prevalence of each of the nine conditions determined by self-report and administrative algorithms, stratified by OSA severity, is presented in Table 4. Based on self-report alone, the prevalence of hypertension, diabetes, and myocardial infarction increased as OSA severity increased. When using the administrative algorithms, a similar trend was observed for hypertension, diabetes and stroke. Table 5 depicts the "enhanced" co-morbidities based on a combination of either self-report or administrative data. The prevalence of hypertension, diabetes and myocardial infarction all increased with increasing OSA severity (p < 0.001).

Discussion
In this large cohort of patients referred for sleep testing we determined that patient self-report of nine co-morbid conditions had varying levels of agreement with that derived from administrative data. Specifically, agreement was highest for diabetes and hypertension, and lowest for cardiac arrhythmia and stroke. An enhanced measure of co-morbidity using either self-report or administrative data demonstrated face validity and clinically meaningful trends of increasing prevalence by OSA severity. These results suggest that when agreement between data sources is poor, a combination of sources should be used when defining co-morbidity in OSA patients, as use of either source alone may result in an underestimate of the prevalence of these conditions. Specifically, using "either" self-report or administrative measure will increase the sensitivity of the estimate of co-morbidity.
We found that among patients referred for sleep testing, self-report of diabetes and hypertension had the highest agreement with administrative data derived definitions for these conditions. These findings are similar to those reported based on administrative data and survey data from an adult sample extracted from the Canadian Community Health Survey (CCHS) in Manitoba, Canada. In that study, agreement between the two sources was highest for diabetes (κ > 0.70) and hypertension (κ > 0.50), and lowest for non-specific heart disease (κ = 0.38) [30]. Cricelli et al. also found good agreement between self-reported diabetes and hypertension and administrative data sources [25]. The consistency of self-reported and administrative data for these two conditions likely occurs because these conditions have clear objective criteria for diagnosis and require ongoing medical treatment. Agreement between self-reported measures of chronic disease and administrative data depends on the specific condition [30].
We found very poor agreement between self-report and administrative data for the presence of cardiac arrhythmia and stroke. Underreporting of cardiac arrhythmia likely occurred because respondents were not aware of the diagnosis or were unfamiliar with this medical term as it appeared on the self-report questionnaire [30]. Though cardiac arrhythmia is common in patients with OSA, with prevalence values ranging from 35-48% [13,14], accurate self-reporting is more likely to occur for conditions that require frequent contacts with a health professional; cardiac arrhythmia is not one of these conditions. The prevalence of cardiac arrhythmia under the enhanced definition in our study (32.2%) is similar to the known prevalence in this population, and is thus likely to be an accurate reflection of the prevalence of this co-morbidity within the cohort. The poor agreement between the two sources for stroke was also an interesting finding. We speculate that the discrepancies between administrative data and self-report for identifying stroke are due to the lower sensitivity of the administrative algorithm (67%), which would underestimate the true prevalence within this source. Again, the combination of either source likely provides a more accurate representation of stroke prevalence in this clinical population.
The measure of co-morbidity using the enhanced combination of data sources found that as OSA severity increased, the prevalence of hypertension, diabetes, and myocardial infarction also increased. This dose-response relationship for these specific conditions by OSA severity has been documented in previous studies [5,10][53][54][55] and provides support for the face validity of our enhanced measures of co-morbidity.
The results of our study should be interpreted in context of the study limitations. First, for three of the conditions of interest (depression, cardiac arrhythmia, and COPD), validated administrative algorithms were unavailable. Using an algorithm of at least one physician claim or hospitalization in a two-year period may have resulted in some misclassification and an over-reporting of these conditions. Secondly, we did not have a gold standard to determine whether the enhanced measures are more valid than a single data source alone. However, the increasing prevalence of conditions by OSA severity, consistent with that in the literature, does provide evidence of face validity. Finally, our study was limited to a single geographic region (Calgary Health Region) and only included patients referred for sleep diagnostic testing. These patients likely represent those with more severe morbidity and will limit the generalizability of these results to other clinic-based sleep cohorts in North America.

Conclusion
We found that administrative data in combination with survey data has the potential to create a more complete measure of the co-morbidity among patients referred for sleep diagnostic testing, particularly when agreement between survey and administrative data is poor. Given the resources required to obtain clinical data, use of data enhancement with administrative data may be valuable to other researchers. Although future studies are required to validate co-morbidities based on data enhancement, these results suggest that this methodology can aid in the adjustment of these coexisting conditions in observational studies in this area.