- Research article
- Open Access
- Open Peer Review
A comparison of estimators from self-controlled case series, case-crossover design, and sequence symmetry analysis for pharmacoepidemiological studies
BMC Medical Research Methodology volume 18, Article number: 4 (2018)
Despite the frequent use of self-controlled methods in pharmacoepidemiological studies, the factors that may bias the estimates from these methods have not been adequately compared in real-world settings. Here, we comparatively examined the impact of a time-varying confounder and its interactions with time-invariant confounders, time trends in exposures and events, restrictions, and misspecification of risk period durations on the estimators from three self-controlled methods. This study analyzed self-controlled case series (SCCS), case-crossover (CCO) design, and sequence symmetry analysis (SSA) using simulated and actual electronic medical records datasets.
We evaluated the performance of the three self-controlled methods in simulated cohorts for the following scenarios: 1) time-invariant confounding with interactions between the confounders, 2) time-invariant and time-varying confounding without interactions, 3) time-invariant and time-varying confounding with interactions among the confounders, 4) time trends in exposures and events, 5) restricted follow-up time based on event occurrence, and 6) patient restriction based on event history. The sensitivity of the estimators to misspecified risk period durations was also evaluated. As a case study, we applied these methods to evaluate the risk of macrolides on liver injury using electronic medical records.
In the simulation analysis, time-varying confounding produced bias in the SCCS and CCO design estimates, which aggravated in the presence of interactions between the time-invariant and time-varying confounders. The SCCS estimates were biased by time trends in both exposures and events. Erroneously short risk periods introduced bias to the CCO design estimate, whereas erroneously long risk periods introduced bias to the estimates of all three methods. Restricting the follow-up time led to severe bias in the SSA estimates. The SCCS estimates were sensitive to patient restriction. The case study showed that although macrolide use was significantly associated with increased liver injury occurrence in all methods, the value of the estimates varied.
The estimations of the three self-controlled methods depended on various underlying assumptions, and the violation of these assumptions may cause non-negligible bias in the resulting estimates. Pharmacoepidemiologists should select the appropriate self-controlled method based on how well the relevant key assumptions are satisfied with respect to the available data.
Medical databases, including medical claims data and electronic medical records, are increasingly used in pharmacoepidemiological research to assess the associations between drug exposure and adverse effects. The clear advantage of these database studies is the provision of large sample sizes enabling the evaluations of rare adverse events. On the other hand, the associations between specific drugs and adverse events may be confounded by time-invariant factors, including genetic factors, chronic medical conditions, and patient lifestyle. Because medical databases rarely contain such information, it is difficult to adjust for these time-invariant confounders.
Self-controlled methods, which are also referred to as case-only methods, are observational study designs where the case patients who experience the event of interest act as their own controls based on data from non-case periods [1,2,3]. The use of cases as their own controls minimizes the confounding effects of time-invariant risk factors. Self-controlled methods may therefore represent a powerful option for analyzing routinely collected medical databases that lack information on potentially influential confounders .
Among the existing self-controlled methods, self-controlled case series (SCCS) [4, 5], case-crossover (CCO) design [6, 7], and sequence symmetry analysis (SSA) [8, 9] are frequently employed in pharmacoepidemiological studies. Although all three methods estimate the relative risk or odds ratio of an event during exposure periods compared with non-exposure periods, they differ in their designated analytical periods. In SCCS, the relative risk during the exposure (risk) and non-exposure (control) periods are estimated. In CCO design, the odds of exposure in the time period immediately preceding the event of interest (case period) are compared with those in an earlier time period that did not result in an event (control period). In SSA, a crude sequence ratio is calculated as an effect measure of relative risk by dividing the number of subjects who had experienced the first event after the first exposure by the number of subjects who had experienced the first event before the first exposure. The typical designs of these methods are illustrated in Fig. 1.
In pharmacoepidemiological studies, the estimates of the three self-controlled methods may be biased under the following situations: the presence of short-term time-varying factors, such as the transient use of concomitant drugs or acute disease occurrence; the influence of long-term factors, such as seasonality, age effects, and time trends in drug prescriptions; restricted follow-up of patients based on event occurrence; patient restriction based on event history; and the misspecification of risk period durations (i.e., erroneously short or long). Although previous review articles have described several assumptions for appropriate estimations using these methods [1, 2, 10], few studies have statistically compared the performance of the methods using a single data source for the aforementioned situations.
In this study, we simulated several databases containing electronic medical records to compare the estimators produced by SCCS, CCO design, and SSA in the presence of a short-term time-varying confounder and its interactions with time-invariant confounders, as well as in the presence of long-term time trends in exposures and events. Additionally, we examined the effects of sequentially censoring patients at the time of each event and restricting analyses to patients who did not experience the event before the first exposure. Moreover, the sensitivity of the estimators to misspecified risk period durations of exposure was evaluated. Finally, these three methods were also compared using real-world data extracted from a university hospital’s electronic medical records database for estimating the association between macrolide use and liver injury.
Table 1 provides an overview of the three self-controlled methods used in this study. The SCCS method identifies patients who experienced an event at least once during the designated study duration, and models a Poisson distribution during the period k of patient i with expected events. This is described in Eq. (1) as follows:
where t ik represents the length of period k (as an offset term), exp(ϕ i ) the baseline (i.e., unexposed) incidence rate, and X i (k) the exposure status of period k (0 for the control period and 1 for the exposure period) of patient i . In the calculation of the likelihood function conditional on becoming a case , the individual patient effect ϕ i is cancelled out because each patient acts as their own control; therefore, the incident rate ratio (IRR) β for exposure X i (k) can be estimated without estimation of ϕ i . A previous simulation study showed that the SCCS method was less sensitive to time trends in probability for exposures and/or events, even for small sample sizes . However, the effect estimates can become biased if a prior event affects subsequent exposure probability, event rate, or both [13, 14].
In CCO design, a patient’s exposure experience during the case period is compared with his/her exposure experience during the control period, and the exposure probability π ij in the period j i (1 if case period, 0 if control period) of patient i is modelled as follows in Eq. (2):
where expit(φ i ) represents the patient i-specific exposure probability during control periods . Analogous to matched case-control studies, the conditional exposure odds ratio (OR) of case and control periods γ coincides with the conditional event OR between exposed and unexposed periods. As with the SCCS, the individual patient effect φ i is cancelled out in the conditional likelihood function. Note that asymptotically unbiased estimations require assumptions that the exposure has no carryover effects and that the confounders are independent to exposure trends [15, 16]. Moreover, bias can be introduced to the CCO design estimates if the exposure prevalence is different between the case and control periods, i.e., if an exposure time trend is present [16, 17].
The SSA method was developed to examine the symmetry in the distribution of an event before and after an exposure of interest . That initial report argued that the ratio of exposure-event sequence orders approximates the IRR in exposed and non-exposed person-time without algebraic proof . The use of source population trends in exposures or events is recommended to adjust for time trends in the occurrence of these episodes , which may alter SSA estimates [18, 20]. Although SSA has been frequently used in pharmacoepidemiological studies [1, 8, 9], the performance of this method as an IRR estimator in the pharmacoepidemiological context remains unclear. The estimation methods and underlying models are described in the “Analysis of simulated datasets” subsection.
Simulation study design
We used a simulation study setting designed to evaluate the risk of an acute event Y(t) (e.g., liver injury, or LI) associated with a transient exposure X(t) (e.g., antibiotic use) at day t = 1, …, 1800. Two time-invariant binary confounders C1 and C2 (e.g., genetic factors, chronic conditions, or sex) and one binary time-varying confounder C3(t) (e.g., brief administration of drugs or occurrence of an acute disease) were also generated.
For each simulation setting, we simulated 2000 datasets, each of which was composed of 100,000 patients who had an 1800-day observation period without drop-outs or censoring (Fig. 2). First, C1 and C2 were generated at Day 0 using a Bernoulli distribution with success probabilities of 0.05 and 0.5, respectively; their values were kept constant throughout the observation period. Time to onset of C3(t) followed an exponential distribution with a rate of 0.001, and was re-generated immediately after C3(t) = 1 for subjects who had already experienced an onset of C3(t) (this allows for more than one onset of C3(t) for each subject). We assumed that C3(t) occurred over a short period of time. Therefore, we set a fixed condition of “C3(t) = 1” during a defined duration (covariate-effect time; Et C ) that began from the onset of C3(t). If a new C3(t) was initiated before the fixed period of the previous C3(t) had ended, that fixed period was extended by the defined duration (Et C ).
The incidence of exposure X(t) was generated using a piecewise exponential distribution with rate parameters, as described in Eq. (3).
The exposure can occur multiple times; once an exposure began, the next time-to-exposure was generated using the above exponential distribution. For this study, we assumed that the exposure X(t) is the usage of antibiotics that would have a transient effect on the incidence of an LI event. A fixed condition of “X(t) = 1” was set during the defined duration (exposure-effect time; EtX = 15 days) that began from the onset of X(t). If a new exposure occurred before the fixed period of the previous X(t) had ended, that fixed period was extended by the defined duration (Et X ).
Finally, event Y(t) followed a piecewise exponential distribution with rate parameters as described in Eq. (4).
where the parameters varied according to the scenarios described below. Events can occur multiple times; once an event occurred, the next time-to-event was generated using the above exponential distribution. Note that for the data generation step in this study, the assumption holds that the occurrence of an event did not affect the probability of subsequent exposure, which is considered a key assumption for SCCS and SSA.
Simulation scenarios for time-varying confounding
The following scenarios were utilized to evaluate the effects of a time-varying confounder on the estimates from the three methods.
There is time-invariant confounding (βC1 = log(2.0), βC2 = log(3.0)) and interactions between the time-invariant confounders (βC1C2 = log(5.0)); the effect of exposure β X is varied (log(1.0), log(3.0) or log(10.0)).
There is time-invariant (βC1 = log(2.0), βC2 = log(3.0)) and time-varying confounding but no interactions; the effect of exposure β X (log(1.0), log(3.0) or log(10.0)), the effect of time-varying confounder βC3 (log(0.2), log(0.5), log(2.0) or log(5.0)), and the effect of covariate-effect time EtC (5, 10, 15, 20 or 30 days) are varied. When changing the parameters β X , βC3, or EtC, the other parameters are fixed.
In addition to time-invariant and time-varying confounding (βC1 = log(2.0), βC2 = log(3.0), βC3 = log(5.0)), there are interactions among the time-invariant and time-varying confounders; the effect of β X (log(1.0), log(3.0) or log(10.0)), the interaction effects among the time-invariant and time-varying confounders βC2C3 (log(0.2), log(0.5), log(2.0) or log(5.0)), and the effect of covariate-effect time EtC (5, 10, 15, 20 or 30 days) are varied. When changing the parameters β X , βC2C3, or EtC, the other parameters are fixed.
Simulation scenario for long-term time trends in exposures and events
For the evaluation of time trends in exposures and events, we slightly altered the data generation step. Specifically, the incidence of exposure X(t) was generated using Eq. (5).
where αTR is the effect of time trend on the onset of exposure. Event Y(t) was subsequently generated using Eq. (6).
where βTR is the effect of time trend on event occurrence. Based on this data generation process, we established the following scenario:
There are time-invariant confounding (βC1 = log(2.0), βC2 = log(3.0)) and time trends in exposures or events; the time trend of exposure αTR or events βTR (0 or log(1.001)) are varied. Table 2 summarizes the detailed parameters of Scenarios 1 to 4 in the simulation study.
Simulation scenario for restricting follow-up time based on event occurrence
In Scenarios 1 to 4, all patients had the same observation period (1800 days). However, this setting is not applicable if the observations can be censored due to patient death associated with the event of interest (e.g., myocardial infarction). The restriction of follow-up time based on event occurrence may affect the estimators of SCCS and SSA. To evaluate the effect of this condition, we probabilistically censored the observation of patients when an event occurred. Based on this data generation process, we established the following scenario:
There is time-invariant confounding (βC1 = log(2.0), βC2 = log(3.0)) but no time trends (αTR = βTR = 0); the observation of patients was sequentially censored upon event occurrence. Censoring probability (P C ) was based on the Bernoulli distribution with success probabilities of 0.3, 0.6 or 1.0.
Note that a P C value of 1.0 would mean that all subjects were censored at their first event (e.g., the event of interest is death). Therefore, SSA could not be performed when P C was set to 1.0 because no patients would experience this event before their first exposure under this setting.
Simulation scenario for patient restriction based on event history
In pharmacoepidemiological studies, analyses may be restricted to subjects who did not experience the event before their first exposure if the analyst suspects that the event alters the probabilities of subsequent exposures or events. To compare the estimates of self-controlled methods under this restriction, the following scenario was utilized:
There is time-invariant confounding (βC1 = log(2.0), βC2 = log(3.0)) but no time trends (αTR = βTR = 0); patients who experienced the first event before their first exposure were excluded from the study population.
This scenario could not be applied to SSA because the crude sequence ratio (CSR) cannot be calculated if the number of patients who had experienced the first event before their first exposure is zero.
Analysis of simulated datasets
For the SCCS, risk periods were defined as a 15-day period that began on the first day of an exposure, and the control period encompassed all other periods outside of the risk periods. The IRR for exposure β X and 95% confidence intervals (CI) were estimated using a univariable conditional Poisson regression model stratified by each case, and the risk periods and control periods were compared . In this analysis, we used SAS macro programs (element.sas, sccs.sas, and poisreg.sas) available from the Open University website, UK .
For CCO design, we defined the case period as the 15-day period beginning 14 days before the first event for each case. The control period was defined as the 15-day period beginning 29 days before the first event until 15 days before the first event. Cases who experienced the first event within 29 days from the start of follow-up were excluded from analysis. The effect of exposure β X was approximated by estimating the conditional OR for exposure onset (i.e., the first day of X(t) = 1) in a case period versus a control period and the 95% CIs using conditional logistic regression .
The SSA requires cases who had experienced an exposure and event within a period of prespecified length (e.g., 15 days), irrespective of the order of exposure and event. First, a CSR was calculated by dividing the number of patients who had experienced their first event in a 15-day period beginning on the first day of exposure (i.e., exposure ➔ event) by the number of patients who had experienced their first event in a 15-day period beginning 15 days before their first exposure (i.e., event ➔ exposure). Following a previous report , a null-effect sequence ratio (NSR) (described in Appendix A in Additional file 1) was calculated to adjust for trends in the occurrence of either exposures or events. However, this adjustment method depends on the underlying assumption that the trends in exposures and events for the background population (i.e., all subjects included in the database) are similar to those for the study population that was used for the calculation of the CSR (e.g., the way a drug is prescribed is similar between cases and non-cases). The adjusted sequence ratio (ASR) is the ratio of CSR to NSR, and the 95% CIs were calculated based on the binomial distributions conditional on the total number of post-exposure and pre-exposure events. In Appendix B in Additional file 1, we derived CSR as a partial maximum likelihood estimator of β X in a Cox model stratified by individual cases.
We calculated the bias (difference between the mean estimates and β X ), empirical standard error (SE) of point estimates, mean squared error (MSE; sum of squared bias and empirical variance), mean estimated SE, and coverage probability of CIs (proportion of replications in which the 95% CIs included the true β X value).
Simulation scenario for misspecification of the risk period duration
To evaluate the sensitivity of each estimator to misspecified risk period durations of exposure (EtX = 15 days), Scenario 4 datasets with no time trends (αTR = βTR = 0) were analyzed by specifying erroneously short or long risk period durations (5 or 25 days, respectively).
To compare the estimates obtained from the three self-controlled methods using real-world data, we also performed a case study that examined the risk of LI associated with the administration of macrolides using electronic medical records from the University of Tokyo Hospital (Tokyo, Japan). This electronic medical records database included information of patients who visited as outpatients or had been admitted to the hospital between January 2011 and December 2015 (approximately 244,000 patients). Data included patient characteristics (anonymized personal identifiers, age, and sex); the dates of outpatient visits, admission and discharge; prescribed medications; diagnoses; and laboratory test results.
The study population of this case study was composed of patients who had at least a 90-day continuous observation period. Each observation period was defined as the combined durations of hospitalization and consecutive outpatient visits (the intervals of hospitalization or outpatient visits were within 100 days). If the start date of hospitalization of a patient was earlier than 2011, we defined the observation start date as January 1, 2011. The observation end date was set as December 31, 2015 if a patient was still hospitalized after the end of 2015. Therefore, each patient could contribute multiple discontinuous observation periods. The exposure of interest was defined as the prescription of any macrolide approved for use in Japan, including erythromycin, roxithromycin, clarithromycin, azithromycin, spiramycin, and josamycin. The onset of LI was defined as 1) an increase in alanine aminotransferase or conjugated bilirubin that exceeded twice the upper limit of the normal range, or 2) a same-day combinatorial increase in aspartate aminotransferase, alkaline phosphatase, and total bilirubin, provided that one of these exceeded twice the upper limit of the normal range . Patients who experienced the exposure or event (LI onset) within 90 days after the start date of observation were excluded.
In this case study, we designated three patterns of risk periods (15-day, 30-day, and 45-day periods) that began on the first day of macrolide prescription. These durations were based on previous findings that the onset of LI generally occurs within 1 to 6 weeks of macrolide exposure [24, 25]. In SCCS, each observation period was divided into risk periods (0–14, 0–29, or 0–44 days after exposure) and control periods (all other periods). Contiguous prescriptions of any macrolide (without an interval of one day or more of non-prescription) were considered a single risk period. Therefore, even if the patients had multiple observation periods, all these periods would contribute to the SCCS analysis. Because SCCS can simultaneously address multiple exposures, the continuous administration of macrolides within 30 days were considered a single exposure. Similarly, consecutive elevations of liver enzyme that met the definition of LI onset within 7 days were considered a single event. In CCO design, we defined a case period as 14–0, 29–0, or 44–0 days before the first LI onset; the corresponding control periods were 29–15, 59–30, or 89–45 days, respectively, before the first LI onset. In SSA, CSRs were calculated by dividing the number of patients who had experienced their first LI during 0–14, 0–29, or 0–44 days after the first day of macrolide prescription by the number of patients who had experienced their first LI during 1–15, 1–30 or 1–45 days before the first day of macrolide prescription, respectively. For the calculation of NSRs, we used the dates of the first exposure and first LI onset for each patient in the electronic medical records database (Appendix A).
Data generation of the simulation study and all statistical analyses were conducted using SAS 9.4 (SAS Institute, Cary, NC).
Simulation study results
Impact of time-varying confounding
The results of simulations for Scenarios 1, 2, and 3 are presented in Tables 3, 4, and 5, respectively. Time-invariant confounders and their interactions did not introduce any bias to the point and interval estimates from all three self-controlled methods, regardless of the true IRR value (Table 3). After including transient time-varying confounding, however, we observed bias in the estimates of SCCS and CCO design. For βC3 = log(5.0) and EtC = 15 days, this bias were equivalent to approximately 10% overestimation of the IRR (Table 4). The bias increased and the coverage of 95% CI fell below the nominal level (0.95) when the EtC values increased. On the other hand, little bias was observed in SSA estimates, and the coverage of 95% CI was close to 95% for all settings of Scenario 2.
The interaction between time-varying and time-invariant confounders caused larger bias in the estimates from SCCS and CCO design (Table 5), which were equivalent to approximately 40% overestimation when interaction effect βC2C3 = log(5.0). The bias increased with increasing values of EtC and βC2C3. These findings indicated that both SCCS and CCO design could not eliminate the interaction effect partly induced by time-invariant factors, despite being able to eliminate the main effects of these factors. However, the level of bias and CI coverage probabilities of SSA were virtually unchanged with the addition of the non-null interaction βC2C3 (Table 5).
Long-term time trends in exposures and events
The results of the simulations for Scenario 4 are shown in Table 6. The parameter of log(1.001) produced moderate increases in time trend, and reached an approximate 6-fold increase of the incidence rate at Day 1800 relative to the first day. In cases of exposure or event time trends, there was no substantial bias observed for all methods. However, if there were both exposure and event time trends, the SCCS estimates demonstrated bias and undercoverage of 95% CI, whereas the CCO design and SSA estimates had negligible bias.
Restricting follow-up time based on event occurrence
Although probabilistic censoring in Scenario 5 did not lead to substantial bias in the SCCS estimates, moderate undercoverage of 95% CI was observed when P C was set to 1.0 (Table 7). Conceptually, the CCO design estimates would not be influenced by probabilistic censoring because the method only uses information from before the first event in each subject. On the other hand, the simulation for Scenario 5 showed severe bias and undercoverage of 95% CI in the SSA estimates.
Patient restriction based on event history
When patients who experienced the event of interest before their first exposure were excluded from the study population, we observed substantial bias and severe undercoverage of 95% CI in the SCCS estimates (Table 7) for Scenario 6. As in Scenario 5, CCO design was unaffected by patient restriction in Scenario 6 because only patients who had experienced an exposure either during the control or case periods (e.g. before the first event) contributed to this analysis.
Misspecification of the risk period duration
As shown in Table 8, an inadequately short risk period (5 days) led to severe bias in the CCO design estimates, but did not generate substantial bias in the SCCS and SSA estimates. While the coverage of 95% CI of SCCS and SSA maintained nominal levels (0.95), that of CCO design was close to 0%. In contrast, an excessively long risk period (25 days) produced similar bias in the estimates of all three methods (Table 8); the coverage of 95% CI for all methods were below nominal levels.
Precision of the self-controlled methods
In all the simulation scenarios, the empirical and estimated SEs of the estimates were consistently smaller in SCCS than in CCO design and SSA (Tables 3–8). As the bias in all three methods was negligible in Scenario 1 (Table 3), SSA had the highest MSE, followed by CCO design and SCCS. In Scenario 2, where SCCS produced non-negligible bias, the MSE was still smaller in SCCS than in CCO design and SSA in many settings (Table 4). However, the estimates from SCCS suffered from severe bias in Scenario 3 (Table 5). These settings resulted in a lower MSE in the SSA estimates than the other two methods, except when EtC was set at 5 days. Similarly, if there was no considerable bias in the estimates in Scenario 4 and the simulation scenario involving the misspecification of the risk period duration, the MSE was highest for SSA, followed by CCO design and SCCS (Tables 6 and 8). In Scenarios 5 and 6, the MSE was lowest for SCCS when there was no considerable bias in estimates (Table 7).
Case study results
Among the 114,783 patients in our electronic medical records database who had an observation period of at least 90 days, 7367 were prescribed macrolides and 14,915 patients experienced LI. The results of the case study are shown in Table 9. In the analyses of 15-day risk period, 942, 129 and 130 patients contributed the estimation of SCCS, CCO and SSA. For all different risk period durations (15, 30, and 45 days), the highest effect measure was obtained from CCO design (OR: 4.38, 3.88 and 3.23, respectively), followed by SCCS (IRR: 3.54, 2.65 and 2.23, respectively) and SSA (ASR: 2.61, 1.98 and 1.75, respectively). In our database, long-term time trends in both macrolide administration and LI were thought to be unlikely. Therefore, given a sufficient number of observed events, the SSA results would be the most reliable estimates if other drugs act as short-term time-varying confounders, interact with unmeasured genetic variants, or both. Notably, the longer the specified risk period duration, the more attenuated the relative risks that were estimated from all three methods. As shown in the simulation study with erroneously long risk periods, the longer risk periods may also bias these estimates toward lower (i.e., conservative) values. The objective of the case study was to illustrate the applications of the three self-controlled methods. Due to the fundamentally different underlying assumptions among these methods (Table 1), some of these assumptions (e.g. an event has no effect on the probability of subsequent exposure) may not be satisfied in the electronic medical records data.
In this study, we used large-scale simulation data to compare the effects of time-invariant and time-varying confounders, time trends in exposures and events, restricted follow-up time based on event occurrence, patient restriction based on event history, and misspecification of risk period durations on estimates among three self-controlled methods. In cases of low levels of bias in the estimates, SCCS had the lowest MSE. The estimators of CCO design were robust against censoring based on event occurrence or the exclusion of patients who experienced a pre-exposure event. The SSA estimators were less sensitive to short-term time-varying confounding, long-term time trends, and erroneously short risk periods. In contrast, the censoring of patients based on event occurrence introduced severe bias to the SSA estimates. Even in the presence of variations in the point estimates (the width of ratio-scaled differences ranged from 1.48 to 1.90) in the case study, a statistically significant increase in LI incidence was observed in the 15-, 30-, and 45-day periods after the initiation of macrolides in all three methods. The observed differences in estimates among the three methods were insufficiently large to affect decisions regarding the risk of LI due to macrolide use.
The findings that the bias in SCCS and CCO design estimates increased with the addition of the C2-C3(t) interaction in Scenario 2 emphasized the importance of time-invariant confounders in self-controlled methods (Table 4). The presence of such an interaction may violate the conditional Poisson modelling assumption of SCCS  and the conditional logistic modelling assumption of CCO design. The interaction term between C2 and C3(t) may not be cancelled out in the conditional likelihood equations of SCCS and CCO design because the distributions of C3(t) are unbalanced between the risk/case periods and the control periods. When time-varying confounders modify the effect of time-invariant factors on an outcome (e.g., when a transient infectious disease modifies the effect of genetics as a trigger of liver disease), or vice versa, the influence of these factors would be non-negligible in SCCS and CCO design.
As shown in Appendix B, the estimator from the SSA method can essentially be considered a hazard ratio estimator from a Cox model stratified at the patient level. An important condition of the proportionality of hazards between the exposed and non-exposed periods is that any covariate that affects event occurrence must be invariant during the exposed and non-exposed periods (i.e., a time-constant baseline hazard). If this condition is not met, the estimator would suffer from bias due to model misspecification (although proportionality of non-constant hazards is possible in theory, it is unlikely in practice). In our simulation study, however, these possible misspecification effects are likely to be minimal because 1) the time-varying confounder C3(t) rarely changed status (0 to 1 or 1 to 0) during the 30-day period, and 2) even if C3(t) changed status during the 30-day period, the days of C3(t) = 1 are likely to be balanced between the pre- and post-exposure periods. Since this simulation data are considered to be realistic, the SSA method is expected to be robust to the effects of similar types of unmeasured time-varying confounding. C3(t) may affect exposure during EtC days after its onset, and its effect may be nullified after EtC days had elapsed. This means that the onset days of C3(t) were uniformly distributed over EtC days before exposure, and that the days in which the effect ended were also uniformly distributed over the same period. Therefore, even if C3(t) affected exposure, the distribution of the days of C3(t) = 1 would be the same between EtC days before and after exposure onset, which in turn would nullify the bias arising from C3(t) (Tables 4 and 5). Hence, even if a time-varying covariate (e.g., an acute condition) leads to both higher exposure and event probabilities, this would simply imply that there are more periods with exposure and events. Consequently, this would not be expected to affect SSA estimates, which are essentially based on a “what comes first” comparison. Although the SSA method is robust to this type of unmeasured time-varying confounding, it may be affected when the magnitude of time-varying confounding changes gradually over time.
As shown in Table 5, the estimates of SCCS were considerably biased in the presence of time trends in both exposure and event onset. This finding was not consistent with the results of a previous simulation study that the SCCS method was relatively robust to time trends in both exposures and events . This discrepancy may be partially explained by the previous study’s use of shorter observations (500 days compared to 1800 days in our study), lower baseline incidence (2*10−7, which is 100 times smaller than our baseline rate), and lack of consideration to multiple exposures. In contrast, the estimates of CCO design and SSA in our study were less sensitive to the time trends in exposure and event onset (Table 6) because these two methods analyze short-term periods (30 days for each patient). Although previous studies showed that exposure time trends introduced bias to CCO design estimates [16, 17], the magnitude of bias would not be considerable for moderate time-trend effects (as set in this study). Moreover, since the time trends in exposures or events were constant for all patients in Scenario 3, NSR could be used to adjust for the effects of these trends on SSA estimates.
The severe overestimation of the SSA estimates in Scenario 5 may be the manifestation of selection bias due to the probabilistic censoring of some patients who would experience their first exposure after the first event (Table 7), which led to a reduction in the denominator for CSR calculation. This indicates that the SSA method may be unsuitable for situations where patients could be censored due to the event of interest (e.g. myocardial infraction or intracerebral hemorrhage). In contrast, SCCS estimators were less sensitive to patient restriction even when all patients were censored at the occurrence of their first event. However, SCCS estimators could become severely biased in Scenario 6 (Table 7) when patients who had experienced the event before their first exposure were excluded from the study population. It should be noted that these events have no effect on the probabilities of future exposures or events occurrence in this scenario. These findings suggest that among the three methods, CCO design may be suitable for analyses of events that have a possible effect on the occurrence of future exposures or events.
The specification of risk periods of exposure is crucial to all three methods, but different indications can be applied. For example, erroneously short periods introduced bias only to CCO design estimates, whereas erroneously long risk periods introduced bias to the estimates from all methods (Table 8). If erroneously short risk periods are specified in CCO design, a section of the case periods that have a higher likelihood of exposure would be incorrectly classified as control periods. Although similar misclassifications (true risk periods incorrectly defined as control periods) can also occur for SCCS, these would not cause considerable bias because the control periods are substantially longer than the risk periods. There was a lack of such misclassifications in SSA because this method compares the incidence rate between pre- and post-exposure periods. In contrast to erroneously long risk periods, there were misclassifications where control periods were incorrectly defined as risk/case periods for all methods, which led to an over-specification of the risk period (Table 8). Since the overall duration of risk periods was substantially shorter than that of control periods in SCCS, the merging of control periods with risk periods biased the estimates toward lower values. Similarly, such misclassification of risk periods also caused underestimations in CCO design and SSA because these two methods compared the incidence between two periods of the same length.
This study has several limitations. First, the scenarios used were restricted to transient exposure effects, had only a few confounders, no censoring, and no effect modification. Nevertheless, the relatively simple scenarios examined here were suitable for facilitating an understanding of the lesser-known effects of these factors on each estimation method. Second, we did not evaluate multiple definitions of the various periods used in the three methods. For example, the risk periods of SCCS can be divided into more than one interval , and multiple control periods can be set in CCO designs . However, because the observed bias for SCCS and CCO design could be explained by the asymmetrical distribution of C3(t), this bias would not be eliminated by changing the definition of the periods without further information on the covariates. Finally, because the electronic medical records used in this case study were obtained from one university hospital, the exposures and events that occurred outside of the hospital (such as at a patient’s home or other clinics) could not be recorded.
Our simulation study showed that the estimations of SCCS, CCO design and SSA depended on various underlying assumptions, and that the violation of these assumptions could lead to non-negligible bias in the resulting estimates. All three methods indicated that macrolide use was significantly associated with an increased risk of LI in the case study, despite small variations in estimates across the methods. Pharmacoepidemiologists should select the appropriate self-controlled method based on how well the relevant key assumptions are satisfied with respect to the available data. Moreover, if several self-controlled methods produce different estimates for the same data, analysts should carefully consider the presence of factors that may have introduced bias to the estimates.
Adjusted sequence ratio
Crude sequence ratio
Incidence rate ratio
Mean squared error
Null-effect sequence ratio
Self-controlled case series
Sequence symmetry analysis
Hallas J, Pottegard A. Use of self-controlled designs in pharmacoepidemiology. J Intern Med. 2014;275(6):581–9.
Maclure M, Fireman B, Nelson JC, Hua W, Shoaibi A, Paredes A, Madigan D. When should case-only designs be used for safety monitoring of medical products? Pharmacoepidemiol Drug Saf. 2012;21(Suppl 1):50–61.
Uddin MJ, Groenwold RH, Ali MS, de Boer A, Roes KC, Chowdhury MA, Klungel OH. Methods to control for unmeasured confounding in pharmacoepidemiology: an overview. Int J Clin Pharm. 2016;38(3):714–23.
Lai CL, Kuo RN, Chen HM, Chen MF, Chan KA, Lai MS. Risk of hip/femur fractures during the initiation period of alpha-adrenoceptor blocker therapy among elderly males: a self-controlled case series study. Br J Clin Pharmacol. 2015;80(5):1208–18.
Pratt NL, Ramsay EN, Kemp A, Kalisch-Ellett LM, Shakib S, Caughey GE, Ryan P, Graves S, Roughead EE. Ranibizumab and risk of hospitalisation for ischaemic stroke and myocardial infarction in patients with age-related macular degeneration: a self-controlled case-series analysis. Drug Saf. 2014;37(12):1021–7.
Campbell UB, Walker AM, Gaffney M, Petronis KR, Creanga D, Quinn S, Klein BE, Laties AM, Lewis M, Sharlip ID, et al. Acute nonarteritic anterior ischemic optic neuropathy and exposure to phosphodiesterase type 5 inhibitors. J Sex Med. 2015;12(1):139–51.
Lee CH, Wang JD, Chen PC. Health data analysis in Taiwan research G. Case-crossover design: an alternative strategy for detecting drug-induced liver injury. J Clin Epidemiol. 2012;65(5):560–7.
Takeuchi Y, Kajiyama K, Ishiguro C, Uyama Y. Atypical antipsychotics and the risk of Hyperlipidemia: a sequence symmetry analysis. Drug Saf. 2015;38(7):641–50.
Rasmussen L, Hallas J, Madsen KG, Pottegard A. Cardiovascular drugs and erectile dysfunction - a symmetry analysis. Br J Clin Pharmacol. 2015;80(5):1219–23.
Gault N, Castaneda-Sanabria J, De Rycke Y, Guillo S, Foulon S, Tubach F. Self-controlled designs in pharmacoepidemiology involving electronic healthcare databases: a systematic review. BMC Med Res Methodol. 2017;17(1):25.
Whitaker HJ, Farrington CP, Spiessens B, Musonda P. Tutorial in biostatistics: the self-controlled case series method. Stat Med. 2006;25(10):1768–97.
Musonda P, Hocine MN, Whitaker HJ, Farrington CP. Self-controlled case series analyses: small-sample performance. Comput Stat Data Anal. 2008;52(4):1942–57.
Hua W, Sun G, Dodd CN, Romio SA, Whitaker HJ, Izurieta HS, Black S, Sturkenboom MC, Davis RL, Deceuninck G, et al. A simulation study to compare three self-controlled case series approaches: correction for violation of assumption and evaluation of bias. Pharmacoepidemiol Drug Saf. 2013;22(8):819–25.
Farrington CP, Whitaker HJ. Semiparametric analysis of case series data. J R Stat Soc Ser C Appl Stat. 2006;55(5):553–94.
Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol. 1991;133(2):144–53.
Greenland S. Confounding and exposure trends in case-crossover and case-time-control designs. Epidemiology. 1996;7(3):231–9.
Vines SK, Farrington CP. Within-subject exposure dependency in case-crossover studies. Stat Med. 2001;20(20):3039–49.
Hallas J. Evidence of depression provoked by cardiovascular medication: a prescription sequence symmetry analysis. Epidemiology. 1996;7(5):478–84.
Pratt NL, Ilomaki J, Raymond C, Roughead EE. The performance of sequence symmetry analysis as a tool for post-market surveillance of newly marketed medicines: a simulation study. BMC Med Res Methodol. 2014;14:66.
Tsiropoulos I, Andersen M, Hallas J. Adverse events with use of antiepileptic drugs: a prescription and event symmetry analysis. Pharmacoepidemiol Drug Saf. 2009;18(6):483–91.
Whitaker HJ: The self-controlled case series method. http://statistics.open.ac.uk/sccs/index.htm. Accesseed 26 Semtember 2017.
Wang SV, Coull BA, Schwartz J, Mittleman MA, Wellenius GA. Potential for bias in case-crossover studies with shared exposures analyzed using SAS. Am J Epidemiol. 2011;174(1):118–24.
Benichou C. Criteria of drug-induced liver disorders. Report of an international consensus meeting. J Hepatol. 1990;11(2):272–6.
Sarges P, Steinberg JM, Lewis JH. Drug-induced liver injury: highlights from a review of the 2015 literature. Drug Saf. 2016;39(9):801–21.
Stine JG, Lewis JH. Hepatotoxicity of antibiotics: a review and update for the clinician. Clin Liver Dis. 2013;17(4):609–42. ix
The authors would like to thank Dr. Koji Oba and Dr. Kosuke Kashiwabara for their kindness in providing technical assistance and advice. In addition, we thank Dr. Tatsuo Hiramatsu, Department of Healthcare Information Management, The University of Tokyo Hospital, for data management of the electronic medical records database. We are also grateful to the two reviewers for their constructive comments, and to the editors who read our manuscript with great care and provided helpful suggestions. Their advice has helped to improve the accuracy and readability of our paper.
Ethics approval and consent to participate
As the simulation study was based on simulated populations that used no human subjects, we determined that this study did not require ethics committee approval.
The case study using electronic medical records data was approved by the Research Ethics Committee, Graduate School of Medicine and Faculty of Medicine, the University of Tokyo (Approval No. 10675-). Because this case study involved the secondary use of electronic medical records data that had been routinely collected as part of regular medical care in the hospital, the opt-out declaration that the records of the patients would be not used for research was also provided in accordance with current ethical standards for medical and health research involving human subjects in Japan.
This study was supported by Grants-in-Aid for ICT infrastructure establishment for clinical and medical research from The Japan Agency for Medical Research and Development (no grant number).
Availability of data and materials
The electronic medical records database used in this study is maintained by the Department of Healthcare Information Management, The University of Tokyo Hospital (Tokyo, Japan). However, restrictions apply to the availability of data, which were used under permission for this study. Accordingly, these data are not publicly available. The SAS program codes of the simulation study are available in Appendix C (Additional file 1).
Consent for publication
YT was employed by the Pharmaceuticals and Medical Devices Agency until March 2016. TS has received lecture fees from Astellas Pharma and Pfizer. YM has received consulting fees from Ono Pharmaceutical and Solasia Pharma, as well as lecture fees from Novo Nordisk Pharma. The authors declare that there are no other potential competing interests relevant to this study.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A. Calculation of the null-effect sequence ratio. Appendix B. Equivalence between the crude sequence ratio and the incidence rate ratio estimator calculated using a stratified Cox model. Appendix C. SAS program codes of the simulation study. (DOCX 67 kb)
About this article
- Medical database
- Misspecified risk period
- Self-controlled methods
- Time-invariant confounding
- Time trends
- Time-varying confounding