Data sources
Our analyses used population-based insurance billing data from Quebec’s provincial public insurer, the Régie de l’assurance maladie du Québec (RAMQ). The RAMQ insures all physician and hospital services for about 96% of the Quebec population [21] and outpatient prescription drugs for approximately 36% (largely elderly and low-income residents) [22]. Our database includes 2,013,430 Montreal residents aged 20 years or older who used health services between April 1, 2000 and March 31, 2010 (fiscal years 2000/01–2009/10).
The following data files were linked using an anonymized individual patient identifier: physician fee-for-service billings, hospital admissions, individual death records from the Quebec Statistical Institute (Institut de la Statistique du Québec), and the Quebec tumor registry (Fichier des tumeurs du Québec, FiTQ). Patients who are admitted to hospital appear in the hospital admissions data. Physician billings include services provided in both inpatient and outpatient settings. Day surgeries can appear in either the hospital admission or the physician billing data, depending on the location of the surgery and whether the patient was admitted to the hospital.
Variables
Like other medical claims databases, the RAMQ data detail the health care services received by patients: outpatient visits, hospital admissions, emergency department visits, day surgeries, and billable services (e.g., colonoscopies). The relevant diagnostic (ICD-9 and ICD-10), treatment [23], and procedure codes [24] are included in these data. The data also contain information on individual-level demographic characteristics (age, sex, mortality) and a small-area measure of socioeconomic status (SES), the Pampalon index of material deprivation [25].
Algorithms
We created three algorithms to identify cases of CRC, each based on different source data. Algorithm 1 classified patients with at least one CRC diagnostic code in the hospitalization data as incident cases of CRC. Algorithm 2 classified patients with two CRC diagnostic codes in the physician billing data, separated by at least 30 days within a 2-year period, as incident cases of CRC. Algorithm 3 classified patients who met the criteria of Algorithm 1, Algorithm 2, or both as incident cases. A case identified by Algorithm 2 but not by Algorithm 1 would be an individual diagnosed and treated in outpatient settings only. The date of diagnosis was defined as the date of admission (Algorithm 1), the date of the first of the two diagnoses (Algorithm 2), or whichever came first (Algorithm 3) (Fig. 1). Relevant diagnostic codes are listed in Additional file 1. We investigated the receipt of surgical, medical, or other CRC-related treatment at any point during our study period among all possible cases (see Additional file 1). Several validation studies of cancer incidence algorithms based on administrative data have demonstrated that the PPV of algorithms using only hospitalization and physician billing data is relatively low [10,11,12]. The integration of treatment codes into such algorithms has therefore become common as a way to improve PPV, and we judged cases to be “true positives” only if the patient met both the diagnostic and the treatment criteria.
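The case definitions above can be illustrated with a minimal Python sketch. It is not the study's actual code. The record layout (tuples of a patient identifier and a date, already restricted to records carrying a CRC diagnostic code), the function names, and the reading of "a 2-year period" as 730 days are all assumptions made for illustration.

```python
from datetime import date, timedelta

MIN_GAP = timedelta(days=30)      # codes must be at least 30 days apart
MAX_WINDOW = timedelta(days=730)  # assumption: "2-year period" taken as 730 days


def algorithm1(hospital_records):
    """Earliest admission date with a CRC code, per patient."""
    dx = {}
    for pid, admit_date in hospital_records:
        if pid not in dx or admit_date < dx[pid]:
            dx[pid] = admit_date
    return dx


def algorithm2(billing_records):
    """Date of the first of two billing codes 30+ days apart within the window."""
    by_patient = {}
    for pid, service_date in billing_records:
        by_patient.setdefault(pid, set()).add(service_date)
    dx = {}
    for pid, dates in by_patient.items():
        ordered = sorted(dates)
        for i, first in enumerate(ordered):
            later_dates = ordered[i + 1:]
            if any(MIN_GAP <= later - first <= MAX_WINDOW for later in later_dates):
                dx[pid] = first  # diagnosis date is the first code of the pair
                break
    return dx


def algorithm3(hospital_records, billing_records):
    """Union of Algorithms 1 and 2; diagnosis date is whichever came first."""
    dx = algorithm1(hospital_records)
    for pid, d in algorithm2(billing_records).items():
        dx[pid] = min(d, dx[pid]) if pid in dx else d
    return dx


def true_positives(dx, treated_ids):
    """Keep only cases that also meet the treatment criteria."""
    return {pid: d for pid, d in dx.items() if pid in treated_ids}
```

For example, a hypothetical patient with billing codes on 10 January and 20 February 2005 and a hospital admission on 1 March 2005 would be a case under all three algorithms, with a diagnosis date of 10 January 2005 under Algorithms 2 and 3.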
Statistical analyses
We considered the cases identified in the FiTQ as our reference point, and classified cases as concordant (individuals identified in both the FiTQ and by each of our algorithms) or newly captured cases (individuals identified by our algorithms but not in the FiTQ). We conducted descriptive analyses to compare results from the three algorithms and to select the best performing among them. We selected the algorithm that performed best based on maximizing concordance with the FiTQ and maximizing the number of cases ascertained.
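The classification into concordant and newly captured cases amounts to a set comparison on patient identifiers, as in the following sketch. The function name and the use of plain sets of identifiers are assumptions made for illustration; the linked data themselves are not shown.

```python
def classify_cases(algorithm_ids, registry_ids):
    """Split algorithm-identified cases by whether they also appear in the registry."""
    algorithm_ids = set(algorithm_ids)
    registry_ids = set(registry_ids)
    return {
        # identified by the algorithm and present in the FiTQ
        "concordant": algorithm_ids & registry_ids,
        # identified by the algorithm but absent from the FiTQ
        "newly_captured": algorithm_ids - registry_ids,
    }
```

Applying this to the output of each of the three algorithms gives the counts on which the descriptive comparison is based.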
We used two approaches to assess the performance of our algorithm. First, we compared the overall proportion of colon and rectal cancers detected by our algorithm to that documented elsewhere. Second, we compared the trends in age-adjusted incidence rates over time between the FiTQ and our algorithm. We expected that the algorithm would detect a consistently greater number of cases than the FiTQ, but that similar trends over time would indicate the algorithm was detecting true positives. Because we do not have another data source that we consider a valid “gold standard”, we did not assess the performance of our algorithm with measures such as sensitivity and specificity.
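One common way to compute an age-adjusted incidence rate is direct standardization, sketched below. The text does not state which standardization method or standard population was used, so the method, the age groups, and all names here are assumptions made for illustration.

```python
def age_adjusted_rate(cases, population, standard_population, per=100_000):
    """Directly standardized rate: weighted sum of age-specific rates.

    All three arguments are dicts keyed by age group. The weights are each
    group's share of the (assumed) standard population.
    """
    total_standard = sum(standard_population.values())
    rate = 0.0
    for age_group, standard_count in standard_population.items():
        weight = standard_count / total_standard
        age_specific_rate = cases[age_group] / population[age_group]
        rate += weight * age_specific_rate
    return rate * per
```

Computing this rate for each fiscal year, once from registry cases and once from algorithm cases, would yield two series whose trends can be compared.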
To characterize individuals with incident CRC who were not identified in the FiTQ, we compared the distributions of age, sex, socioeconomic status, disease site, and treatment received between the concordant and newly captured cases. We calculated 95% confidence intervals (CIs) to make comparisons across groups. All statistical tests were two-sided and assessed at the p < 0.05 level.
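As an illustration of how a 95% CI for a proportion can be obtained, the sketch below uses the normal-approximation (Wald) interval. The text does not specify which interval method was used, so the choice of the Wald interval and the function name are assumptions.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Proportion with a normal-approximation (Wald) CI; z=1.96 gives about 95%."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1], since the Wald interval can overshoot near the bounds.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

The interval for a given characteristic (for example, the proportion of women) can then be computed separately for the concordant and newly captured groups and the two intervals compared.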
Use of the data was authorized by the Commission d’accès à l’information du Québec. The study was approved by the Université de Montréal ethics committee (Project 17–033-CERES-D).