- Research
- Open access
- Published:

# A novel weighting method to remove bias from within-subject exposure dependency in case-crossover studies

*BMC Medical Research Methodology*
**volume 21**, Article number: 214 (2021)

## Abstract

### Background

Case-crossover studies have been widely used in various fields including pharmacoepidemiology. Vines and Farrington indicated in 2001 that when within-subject exposure dependency exists, conditional logistic regression can be biased. However, this bias has not been well studied.

### Methods

We have extended findings by Vines and Farrington to develop a weighting method for the case-crossover study which removes bias from within-subject exposure dependency. Our method calculates the exposure probability at the case period in the case-crossover study which is used to weight the likelihood formulae presented by Greenland in 1999. We simulated data for the population with a disease where most patients receive a cyclic treatment pattern with within-subject exposure dependency but no time trends while some patients stop and start treatment. Finally, the method was applied to real-world data from Japan to study the association between celecoxib and peripheral edema and to study the association between selective serotonin reuptake inhibitor (SSRI) and hip fracture in Australia.

### Results

When the simulated rate ratio of the outcome was 4.0 in a case-crossover study with no time-varying confounder, the proposed weighting method and the Mantel-Haenszel odds ratio reproduced the true rate ratio. When a time-varying confounder existed, the Mantel-Haenszel method was biased but the weighting method was not. When more than one control period was used, standard conditional logistic regression was biased either with or without time-varying confounding and the bias increased (up to 8.7) when the study period was extended. In real-world analysis with a binary exposure variable in Japan and Australia, the point estimate of the odds ratio (around 2.5 for the association between celecoxib and peripheral edema and around 1.6 between SSRI and hip fracture) by our weighting method was equal to the Mantel-Haenszel odds ratio and stable compared with standard conditional logistic regression.

### Conclusion

Case-crossover studies may be biased from within-subject exposure dependency, even without exposure time trends. This bias can be identified by comparing the odds ratio by the Mantel-Haenszel method and that by standard conditional logistic regression. We recommend using our proposed method which removes bias from within-subject exposure dependency and can account for time-varying confounders.

## Background

The case-crossover design has been widely used since it was proposed in 1991 [1]. The design has been used in various fields including pharmacoepidemiology [2], occupational epidemiology [3], studies on traffic safety [4] and air pollution health effects [5, 6]. In case-crossover studies, individuals who have experienced the outcome (cases) act as their own controls by including one or more periods before the onset of the outcome. The period including the outcome is the case period, while period(s) prior to the case period act as the controls. The number of control periods can be large: for example, in the original article [1], one analysis involved 8766 control periods. The effect of the exposure should be brief; exposure in any period should affect the outcome in that period only, without any ‘carryover effect’ [1, 6, 7]. In addition, in case-crossover studies, the rate of outcome occurrence is usually assumed to be unchanged during exposed or unexposed periods, respectively. However, like other case-only studies, the case-crossover study has an advantage that the effect of time-invariant confounders is automatically controlled because the case period is compared with control period(s) of the same individual [7,8,9]. The case-crossover study also has unique characteristics. For example, the original unidirectional case-crossover study does not include periods after the outcome occurs. Thus, there is no bias due to the outcome influencing future exposures or future observation periods, unlike other case-only studies such as self-controlled case series [10,11,12].

However, case-crossover studies are susceptible to at least two types of major biases. The first is bias due to time trends in the exposure which can be removed using a variant of case-crossover studies, the case-time-control design [9, 13, 14]. The second is bias due to within-subject exposure dependency or autocorrelation in an individual’s exposure history [6, 15, 16]. This bias is particularly important in pharmacoepidemiology because drug use on 1 day is rarely independent from use in the preceding days. However, the potential for this bias has had little attention in the pharmacoepidemiology literature, and standard conditional logistic regression has been used without assessing whether bias due to within-subject exposure dependency exists in many case-crossover studies [17,18,19,20,21]. This may be in part because this bias has been considered rather minor when exposure is stationary [15].

In this paper, we show that within-subject exposure dependency may produce unignorably large bias even if exposure is stationary. We also describe how to remove bias from within-subject exposure dependency, when time trends in the exposure do not exist.

### Motivating examples

Our motivation for investigating bias in case-crossover studies due to within-subject exposure dependency was a hypothetical drug treatment with a specific cyclic exposure pattern, where standard conditional logistic regression may produce bias. We also outline real-world pharmacoepidemiological studies in Japan and Australia where our proposed method is used.

### Drug treatment with specific exposure pattern

We illustrate our motivating example using a drug treatment with a cyclic pattern consisting of two periods of treatment followed by one period with no treatment. The exposure pattern clearly shows autocorrelation, with the probability of exposure in one treatment period dependent on previous periods. We show that estimates of the odds ratio for the exposure-outcome association may be biased, depending on the number of control periods chosen.

In a hypothetical unidirectional case-crossover study with 4 periods (one case period and 3 control periods), only 3 exposure patterns are possible. If exposure and non-exposure are indicated by 1 and 0, respectively, the 3 exposure patterns are (1101), (1011), and (0110). Therefore, subjects with an exposed case period can have only 1 unexposed and 3 exposed periods, while those with unexposed case period can have only 2 unexposed and 2 exposed periods. In such a case-crossover study, when the true odds ratio (OR) is 4.0, the OR is underestimated as 3.2 using standard conditional logistic regression for matched case-control studies.

However, if there are two control periods, the possible exposure patterns are (110), (101), and (011). In this case, irrespective of whether the case period is exposed or unexposed, all subjects have 1 unexposed and 2 exposed periods and there is no bias in the estimate of the OR. Similarly, there is no bias when the total number of periods (including the case period) is an integral number of 3 (the cyclic pattern).

### Overview of real-world examples

We describe two real world pharmacoepidemiological studies which will be analyzed by our proposed method. The first is an investigation into a previously reported association between celecoxib and peripheral edema [22] using Japanese data from 25 corporate-type health insurance plans [23]. From 99,821 new users of celecoxib aged between 20 and 74 years from May 2013 to April 2018, who used celecoxib after at least 180 days of non-use, we selected 311 cases who experienced peripheral edema and had both exposed and unexposed days during an 84-day study period.

The second example is an Australian study of the association between hip fracture and psychoactive medicines [24, 25]. The data were obtained from the Australian Department of Veterans’ Affairs administrative claims database. Psychoactive medicines included benzodiazepines, selective serotonin re-uptake inhibitors (SSRIs), opioids, antipsychotics and tricyclic antidepressants. The hip fracture cases were 8828 patients aged over 65 years who were hospitalized between 2009 and 2012. A previous case-crossover study of this cohort found an increased risk of hip fracture for opioids, SSRIs and antipsychotics [24]. A related case-control study found an association between hip fracture and SSRIs when used concurrently with other psychoactive medicines [25].

## Methods

To avoid bias due to within-subject exposure dependency, the likelihood for estimating the odds ratio for the exposure-outcome association may be modified. At least two approaches are effective to modify the likelihood and both involve two-step weighting procedures. In the first approach, the probability of each exposure permutation is estimated in Step 1 and used as weights in the likelihood in Step 2. In the special case where this probability is the same for every permutation (“global exchangeability”), the modified likelihood is equivalent to that for the standard conditional logistic regression.

In this paper, we propose another approach that can be used if “pairwise exchangeability” is assumed. Pairwise exchangeability holds where there are no time trends in the exposure i.e., the proportion exposed in the population is stationary over the study period [15]. In this scenario, the probabilities that the case period is exposed and unexposed are estimated in Step 1 and used as weights in Step 2. When there is pairwise exchangeability, an unbiased OR can be obtained by the Mantel-Haenszel method as well.

### BIAS due to conditional logistic regression for case-crossover studies

In 2001, Vines and Farrington proposed the likelihood for case-crossover studies as [15]:

where *X*_{i0} is the exposure level at the case period (m = 0) and *X*_{im} is the exposure level at the m-th control period at t (*t* = − *m*: m = 1, 2, −-, M), *x*_{i0} denotes the observed exposure level at the case period and *x*_{im} denotes the observed exposure level at the m-th control period, the sum in the denominator ranges over all permutations of κ of the integers {0, 1, −--, M} and the sum in the numerator ranges over the subset of the permutations for which *x*_{iκ(0)} = *x*_{i0} of the individual i. In Eq. (1), *X*_{i0} may be a binary exposure variable but can be a multi-level exposure variable or a vector of an exposure and time-varying confounders. When *X*_{im} denotes a binary exposure, the denominator in Eq. (1) becomes \(\exp \left(\beta \right){\sum}_{\kappa_1}P\left({X}_{i0}=1,{N}_{exposed}=k\right)+{\sum}_{\kappa_0}P\left({X}_{i0}=0,\kern0.5em {N}_{exposed}=k\right)\) where *N*_{exposed} is the total number of exposed (case and control) periods, given by \({N}_{exposed}={\sum}_{m=0}^M{X}_{im}\) and \({\sum}_{\kappa_1}P\left({X}_{i0}=1,{N}_{exposed}=k\right)\) is the sum of probabilities for all the permutations of k exposed and (M + 1-k) unexposed periods where *X*_{i0} = 1, and \({\sum}_{\kappa_0}P\left({X}_{i0}=0,\kern0.5em {N}_{exposed}=k\right)\) is that where *X*_{i0} = 0. Data on *N*_{exposed} = *k* is informative only when the positivity (non-zero probability) condition is satisfied or ∑*P*(*X*_{i0} = 1, *N*_{exposed} = *k*) > 0 and ∑*P*(*X*_{i0} = 0 *N*_{exposed} = *k*) > 0. Otherwise, they do not contribute to the estimation of exp(*β*).

Let *OR*_{VF} be the estimate of exp(*β*) obtained from Eq. (1). The likelihood in Eq. (1) and *OR*_{VF} are in general different from the following likelihood for the standard conditional logistic (SCL) regression for individually matched case-control studies and its estimate, *OR*_{SCL}.

Vines and Farrington showed that the likelihoods in Eqs. (1) and (2) are equivalent if P{*X*_{i0} = *x*_{i0}, − − −, *X*_{iM} = *x*_{iM}} = *P*{*X*_{i0} = *x*_{iκ(0)}, − − −, *X*_{iM} = *x*_{iK(M)}} for all permutations κ of {0, 1, −---, M}, that is, global exchangeability holds. For example, Eqs. (1) and (2) are equivalent when the exposure status in one period is independent from the status in any other periods and the exposure probability is the same in all of case and control periods (Appendix 1, Additional File 1). If global exchangeability does not hold, *OR*_{SCL} can be biased.

Vines and Farrington did not show explicitly how to estimate *P*{*X*_{i0} = *x*_{iκ(0)}, − − −, *X*_{iM} = *x*_{iκ(M)}} in Eq. (1). These probabilities may be estimated by the proportion of each permutation in the population which contain cases, or in samples representing the population such as time-controls in the case-time-control design proposed by Suissa [14]. In the next section, however, we introduce a different approach to remove the bias from within-subject exposure dependency by assuming that pairwise exchangeability is satisfied but global exchangeability may not necessarily hold.

### Weighting method to remove BIAS due to within-exposure dependency

#### Case-crossover studies with a binary exposure

When pairwise exchangeability, P{*X*_{i0} = 1, *X*_{im} = 0} = *P*(*X*_{i0} = 0, *X*_{im} = 1} is satisfied for a binary exposure in all control periods m (m = 1, 2, −-, M), the estimate of exp(*β*) using the Mantel-Haenszel method (*OR*_{MH}) is unbiased whether within-subject exposure dependency exists or not [15]. In line with this finding, we will show that when pairwise exchangeability holds, the probabilities that the individual is unexposed (π_{0}) and exposed (π_{1}) at the case period, can be estimated from the cases in a case-crossover study (without requiring data from the population or time-controls). Once π_{0}, π_{1}, and the relative exposure probability π_{10} = π_{1} / π_{0} are estimated, the following likelihood for case-crossover studies proposed by Greenland [26] can be used to obtain an unbiased estimate of exp(*β*), defined as *OR*_{G}:

In the left-hand side of Eq. (3), *π*_{ik} is the probability that individual i has the k-th exposure level at the case period (k = 0, 1 for binary exposure), *π*_{ic} is *π*_{ik} observed, and *x*_{ic} is the exposure level observed when individual i has the outcome. In the middle of Eq. (3), subscript i is omitted in the exposure probabilities as *π*_{ik} is replaced by the expected value in the population in the current study. In the right-hand side of Eq. (3) *π*_{k0} = *π*_{k}/*π*_{0}. Eq. (3) stands for the model where *x*_{ik} is a binary variable, multi-level exposure variable, or a vector of the observed exposure and time-varying confounders.

For a binary exposure, the right-hand side of Eq. (3) can be rewritten as

where *x*_{ic} = 1 and *π*_{c0} = *π*_{10} = *π*_{1}/*π*_{0} when the individual i was exposed at the case period and *x*_{ic} = 0 and *π*_{c0} = *π*_{00} = *π*_{0}/*π*_{0} = 1 when unexposed. As Vines and Farrington showed [15], Greenland did not estimate *π*_{k} for case-crossover studies in Eq. (3) when within-subject exposure dependency exists.

We outline a novel weighting method using a modified version of Greenland’s likelihood. We propose that *π*_{k} can be estimated, with or without within-subject exposure dependency, by assuming pairwise exchangeability. Let P_{kl[m]} denote the joint probability that the subject has the k-th exposure level at the case period and has the *l*-th exposure level at the m-th control period:

In Eq. (5), *X*_{0} is the exposure status at the case period and *X*_{m} is the exposure status at the m-th control period. When the exposure variable is binary, both *X*_{0} and *X*_{m} have two levels (k = 0, 1 and *l* = 0,1) and pairwise exchangeability is equivalent to stationary exposure (no time trends): when the exposure process is stationary, P_{10[m]} + P_{11[m]} = P_{01[m]} + P_{11[m]} and this relationship leads to the pairwise exchangeability condition P_{10[m]} = P_{01[m]} (m = 1, 2, −-, M).

Using conditional probabilities, pairwise exchangeability, P{*X*_{i0} = 1, *X*_{im} = 0} = *P*(*X*_{i0} = 0, *X*_{im} = 1} can be rewritten as:

where π_{0} = P(*X*_{0} = 0) and π_{1} = P(*X*_{0} = 1). When both sides of Eq. (6) are summed up over M control periods (m = 1, 2, −-, M), we obtain:

The quantity \({\sum}_{m=1}^M\mathrm{P}\left(\ {X}_m=0|{X}_0=1\right)\) can be estimated by the average number of unexposed control periods (where *X*_{m} = 0) in those exposed at the case period (*X*_{0} = 1). Similarly, the quantity \({\sum}_{m=1}^M\mathrm{P}\left(\ {X}_m=1|{X}_0=0\right)\) can be estimated as the average number of exposed control periods (*X*_{m} = 1) in those unexposed at the case period (*X*_{0} = 0). This average, defined as \(\overline{PT_{10}}\) and \(\overline{PT_{01}}\), respectively, can be written as:

where *PT*_{10i} is the number of unexposed control periods (person-time) of case i who is exposed at the case period, *PT*_{01i} is the number of exposed control periods (person-time) of case i who is unexposed at the case period, and *a*_{1} is the number of discordant exposed cases with at least one unexposed control period and *a*_{0} is the number of discordant unexposed cases with at least one exposed control period. When \({\sum}_{m=1}^M\mathrm{P}\left(\ {X}_m=0|{X}_0=1\right)\) and \({\sum}_{m=1}^M\mathrm{P}\left(\ {X}_m=1|{X}_0=0\right)\) in Eq. (7) are substituted by \(\overline{PT_{10}}\) and \(\overline{PT_{01}}\), respectively, we obtain:

When standard statistical software is used for conditional logistic regression analysis of case-crossover studies, we introduce weighting to ensure the denominator in Eq. (2) equals that in Eq. (4). Every exposed and unexposed period in case i should be weighted by w_{i1} and by w_{i0}, respectively, defined as follows:

where \({m}_i^1\) is the number of exposed periods and \({m}_i^0\) is the number of the unexposed periods (including both case and control periods) in case i and *π*_{10} is estimated from Eq. (9). In most statistical software, w_{ik} (k = 0,1) may be specified as an offset variable in conditional logistic regression which uses the following likelihood:

where w_{ij} = w_{i1} and w_{ij} = w_{i0} when j-th period is exposed and unexposed, respectively, in case i.

Using *a*_{1} and *a*_{0}, Eq. (4) can be rewritten as

Equation (12) gives (see Additional File 1, Appendix 2) the following maximum likelihood estimate for *OR*_{G}:

When *π*_{10} estimated by Eq. (9) is considered as a constant, the variance is given by:

In order to allow for the variance of the weights estimated by Eq. (9), we recommend the use of bootstrapping to estimate the 95% confidence interval using the 2.5 to 97.5 percentiles of *OR*_{G}.

From Eqs. (8), (9) and (13) we obtain:

Equation (15) shows that when the model involves only one binary exposure variable, the point estimate of *OR*_{G} is the same as *OR*_{MH}.

#### Case-crossover studies with a binary exposure and a binary time-varying confounder

The Mantel-Haenszel estimator is unbiased for binary exposures when there is pairwise exchangeability. When there is a binary time varying confounder (z), Maclure in his original proposal of the case-crossover design recommended further stratification when using the Mantel-Haenszel method [9]. In a simulation, we followed this recommendation and stratified each subject by z. As a result, control periods where z = 0 were excluded from subjects with z = 1 at the case period. Similarly, control periods where z = 1 were excluded from those with z = 0 at the case period. We have shown that excluding these control periods can lead to bias (see Tables 3 and 4 in Results section).

On the other hand, our method can be extended to studies with time-varying confounder to analyze data of all case and control periods and all periods can be included in the analysis. For example, when there is a binary exposure (x) and a binary time-varying confounder variable (z), *βx*_{ij} in the likelihood in Eq. (11) is specified as (*β*, *γ*)(*x*_{ij}, *z*_{ij})^{T}, which is equal to 0 when (*x*_{ij}, *z*_{ij}) =(0,0), *β* when (*x*_{ij}, *z*_{ij}) =(1,0), *γ* when (*x*_{ij}, *z*_{ij}) =(0,1), and *β* + *γ* when (*x*_{ij}, *z*_{ij}) = (1,1) where exp(*γ*) is an estimate of the odds ratio of z. Similarly to the finding that *OR*_{SCL} in Eq. (2) is unbiased when within-subject exposure dependency does not exist (Appendix 1, Additional File 1), we may estimate an unbiased *OR*_{G} in the model involving the exposure and time-varying confounder by calculating the weight from the exposure variable x (Eq. 10), if within-subject dependency does not exist for z during exposed periods (where the probability that the confounder is positive is *f*_{1}) and during unexposed periods (*f*_{0}) and the confounder is adjusted for as in the standard conditional logistic regression. Similarly as for a case-crossover study with a binary exposure, we recommend bootstrapped 2.5 to 97.5 percentiles of *OR*_{G} to estimating the 95% CI to allow for the variance of the weights (see Appendix 3, Additional File 1 for the detail; as to the relevant SAS codes, see 4-1 h and 4-2f in Appendix 4, Additional File 1).

### Simulation studies

We simulated data relevant to drug therapy with no time trends and with autocorrelated exposure patterns (within-subject dependency) (Appendix 4, Additional File 1). The simulated data is created based on the following observations (i) drug treatment sometimes has a cyclic pattern which often produces within-subject exposure dependency, (ii) some outcomes (e.g., acute adverse events) tend to occur soon after the treatment is initiated but they may also occur later during the drug therapy, and (iii) some patients stop treatment for various reasons while some patients start treatment. When the rate of stopping treatment is the same as that of starting treatment, the stationarity of the exposure may be maintained in the population as follows. We simulated scenarios with and without time varying confounding.

Assume that drug treatment involves a cyclic pattern where one cycle consists of 7 days and a patient has a drug on days 1 and 4 but no drug on days 2, 3, 5, 6, and 7. Figure 1 depicts three cycles of drug treatment with 8 subgroups consisting of 1 case period and 21 control periods, where 1 period is 1 day. Subgroup A represents stoppers who stop treatment at the case period while Subgroup H represents new users who start treatment at the case period. Subgroups B to G represent patients being treated with a different timing relative to the start of the treatment cycle. In Fig. 1, the proportion of those exposed to a drug in the population is always ¼, indicating stationarity (no time trends).

Figure 2a shows 140 cases who had the outcome at the case period when the size of each of Subgroups A to H, *N* = 10,000, the event rate in an unexposed period (*r*_{0}) is 0.001 per period and the rate ratio is 4. We assume that the expected number of cases is determined by exposure at the case period only, or *Nr*_{0} when unexposed and *N RR r*_{0} when exposed at the case period.

Figure 2b shows 184 cases who had the outcome at the case period when N = 10,000, the event rate in the unexposed period without time-varying confounding is 0.001 per period and the rate ratio is 4 and 2 for the exposure and time-varying confounder, respectively. The status of the time-varying confounder in the unexposed or exposed period is related to the exposure status of that period only, and *f*_{0} = 0.2 and *f*_{1} = 0.4 where *f*_{0} and *f*_{1} are the expected values of the proportion of exposed periods and unexposed periods in the population, respectively, when the confounder is positive (See Appendix 3, Additional File 1). We assume that the number of cases is as expected, or *Nr*_{0}(1 − *f*_{0}), *N RR*_{z} *r*_{0}*f*_{0}, *N RR r*_{0}(1 − *f*_{1}), and *N RR RR*_{z} *r*_{0}*f*_{1}, when the combination of the exposure and time-varying confounder variables (x, z) at the case period is (0, 0), (0, 1), (1, 0), and (1, 1), respectively. The status of the time-varying confounder in the control periods was randomly generated and is therefore independent of the exposure or time-varying confounder at different periods.

To determine the effect of the length of each time period on the estimated odds ratio, we also analyzed the data by dividing 22 days in Fig. 1 into 11, 7, 5, 3, and 2 periods (M = 10, 6, 4, 2, and 1) where 1 period included 2, 3, 4, 7 and 11 days, respectively. The status of the exposure and time varying confounder was defined by the last day of each period (Definition I in Table 1). We analyzed the data by standard conditional regression, the Vines and Farrington method, Mantel-Haenszel methods, and our modified Greenland’s method for the fixed study period of 22 days. With time-varying confounder (z), case and control periods of each subject were further stratified by z in the Mantel-Haenszel method. We used bootstrapping to estimate 2.5 to 97.5 percentiles for *OR*_{G}. Data simulation and analyses were performed using SAS 9.4 and SAS codes including those for bootstrap method are shown in Appendix 4 in Additional File 1.

### Case-crossover studies of real-world data

The method was also applied to data from Japanese and Australian databases. The Japanese study on the association between celecoxib and peripheral edema from a previous study [22] was approved by the ethics committee of Tokyo University of Science (approval number: 18023) where obtaining the informed consent from study subjects was waived for the current study. The Japanese data came from 25 corporate-type health insurance plans extracted from Cross-Fact database [23]. Claims data covering 60 months between May 2013 to April 2018 included 1,163,968 males (age (SD) = 42.5 (13.2) years old) and 1,349,901 females (42.2 (13.1) years old) who were 20 years old or older (but younger than 75 years old). As detailed in Additional File 1 (Appendix 5), we examined 99,821 new users of celecoxib, who used celecoxib after at least 180 days of non-use. The occurrence of peripheral edema was defined by new use of furosemide after at least 180 days of non-use and the index date was the day when the outcome occurred. Daily exposure during an 84-day study period was determined using a 7-day grace period. We selected 311 cases who had both exposed and unexposed days during the study period. The 84-day study period was divided into (M + 1) periods where M = 1, 2, 5, 11, 27 and 83, resulting in periods of 42, 28, 14, 7, 3 and 1 days, respectively. Four different definitions were used to determine exposure in the case and control periods as shown in Table 1. Cases who started celecoxib on the index date were excluded from the analysis since furosemide could have been prescribed for prevention rather than treatment of edema.

The data was analyzed by standard conditional logistic regression (Eq. (2)) and the Mantel-Haenszel method with their 95% CIs, and the weighting method for a binary exposure (Eq. (11)) with 2.5 to 97.5 percentiles estimated by the bootstrap method. We also extended the study period to 168 and 336 days. All the analyses were performed using SAS 9.4. Data and SAS codes to analyze the data are available upon request.

The data for the Australian study on the association between hip fracture and psychoactive medicines [24, 25] were obtained from the Australian Department of Veterans’ Affairs administrative claims database. The study was approved by Department of Defense and Veterans’ Affairs Human Research Ethics (E016–007) and University of South Australia Human Research Ethics (P203/04) where obtaining the informed consent from study subjects was waived for the current study. The cases were 8828 patients aged over 65 years who were hospitalized for hip fracture between 2009 and 2012. The index date for each case was the date of hospitalization. We have re-analyzed the data from the previous case-control study [25] as a case-crossover study with SSRIs as the exposure using the same methods as the Japanese study.

Both of the studies in Japan and Australia were carried out in accordance with the Declaration of Helsinki, and all methods were carried out in accordance with relevant guidelines and regulations in Japan and Australia.

## Results

### Simulation studies: comparison of methods with or without time-varing confounding

Table 2 and Fig. 3a show *OR*_{SCL}, *OR*_{VF}, and *OR*_{MH} with their 95% CIs, and *OR*_{G} with its 2.5 to 97.5 percentiles estimated for the scenario in Fig. 2a for M control periods (1 period = 1 day). The difference between *OR*_{SCL} and true RR (4.0) was more than 10% of the true value when M = 6 or > 7 and increased when M increased. When control periods were extended to 10 and 20 cycles (69 and 139 control periods), the estimate (95% CI) of *OR*_{SCL} was 7.92 (5.22–12.03) and 8.71 (5.59–13.57), respectively (not shown in Table 2). On the other hand, odds ratios from the remaining 3 methods (*OR*_{VF}, *OR*_{MH}, and *OR*_{G}) were unbiased irrespective of the value of M. The estimate of *OR*_{VF} cannot be estimated when M = 7, 10, 14, 17 and 21 because the positivity condition was not satisfied for any data. For example, when M = 7, ∑*P*(*X*_{i0} = 0, *N*_{exposed} = 1) = 0 because *X*_{i0} = 1 in Subgroup H which is only one subgroup where *N*_{exposed} = 1 and similarly for *N*_{exposed} = 2 or 3, *X*_{i0} was the same for all subgroups.

Table 3 and Appendix Fig. 3b show *OR*_{SCL}, *OR*_{VF}, and *OR*_{MH} with their 95% CIs, and *OR*_{G} with its 2.5 to 97.5 percentiles estimated for the time varying confounding scenario in Fig. 2b for M control periods (1 period = 1 day). The difference between *OR*_{SCL} for exp(*β*) and the true RR (4.0) was more than 10% of the true value when M > 5 and increased when M increased. The *OR*_{SCL} estimate for exp(*β*) was larger than the corresponding value in Table 2. When control periods were extended to 10 and 20 cycles, the point estimate of *OR*_{SCL} for exp(*β*) was 9.18 (6.23–13.52) and 10.84 (7.09–16.56), respectively. On the other hand, *OR*_{VF} and *OR*_{G} for both exp(*β*) and exp(*γ*) were in general unbiased, particularly when M > 7 where the estimates of *OR*_{VF} and *OR*_{G} for exp(*β*) were within 3% of the true value. The estimate of *OR*_{MH} for exp(*β*) varied between 3.82 (M = 2) and 4.65 (M = 13). Though not shown in Table 3 and Fig. 3bb, *OR*_{MH} for exp(*β*) estimated by ignoring the time-varying confounding was stable but overestimated as 4.67 (17% above the true value). When M = 1, *OR*_{VF} was 4.27 for exp(*β*) (about 7% overestimated) and 1.86 for exp(*γ*) (7% underestimated). Similarly, when M = 1, *OR*_{G} was 4.26 for exp(*β*) (7% overestimated) and 1.68 for exp(*γ*) (16% underestimated). When N was increased to 1,000,000, exp(*β*) and *e*xp(*γ*) were 3.96 and 2.09 for *OR*_{VF} and 3.97 and 2.04 for *OR*_{G} when M = 1 (not shown in Table 3).

Table 4 and Fig. 4a and b show *OR*_{SCL}, *OR*_{VF}, and *OR*_{MH} with their 95% CIs, and *OR*_{G} with its 2.5 to 97.5 percentiles for the scenarios in Fig. 2a and b, where the length of the study period was fixed as 22 days but the length of each time period varied between 1 and 11 days. For the scenario in Fig. 2a with one binary exposure variable only, *OR*_{SCL } for exp(*β*) varied between 4.00 and 6.61 when the length of the time period varied, but *OR*_{VF}, *OR*_{MH}, and *OR*_{G} were stable and unbiased (see Table 4 and Fig. 4a). For the scenario in Fig. 2b with one binary exposure variable and one time varying confounder randomly generated in the control periods, the *OR*_{SCL} estimate for exp(*β*) varied between 4.34 (1 period = 11 days) and 6.79 (1 period = 7 days) when M and the length of the time period varied, while *OR*_{VF} and *OR*_{G} for exp(*β*) were stable and close to the true value, though the odds ratio was a little overestimated as 4.34 when the time period was 11 days (see Table 4 and Fig. 4b). On the other hand, *OR*_{MH} for exp(*β*) varied 3.54 (1 period = 4 days) and 5.56 (1 period = 11 days). Though not shown in Table 4 and Fig. 4b, *OR*_{MH} for exp(*β*) estimated by ignoring the time-varying confounding was stable but overestimated as 4.67. When M = 1 (1 period = 11 days), *OR*_{VF} was 4.31 (8% overestimated) for exp(*β*) and 1.94 (3% underestimated) for exp(*γ*). Similarly, when M = 1, *OR*_{G} was 4.34 (9% overestimated) for exp(*β*) and 1.35 (32% underestimated) for exp(*γ*). When N was increased to 1,000,000, exp(*β*) and *e*xp(*γ*) were 3.97 and 1.95 for both *OR*_{VF} and *OR*_{G} when M = 1 (not shown in Table 4).

### Analyses of case-crossover studies of real-world data

Tables 5 and 6 show the estimates using Exposure Definition I in Table 1 in the Japanese study. In general, *OR*_{SCL} was different from *OR*_{G} and *OR*_{MH} except when M = 1, and *OR*_{SCL} estimated from Eq. (2) increased when study period increased: *OR*_{SCL} was between 1.91 and 2.82 when study period = 84 days (Table 5), between 2.13 and 4.19 when study period = 168 days (Table 6), and between 3.73 and 7.16 when study period = 336 days (Table 6). The point estimate of *OR*_{G} from Eq. (4) was always the same as that of *OR*_{MH}, as expected.

In Appendix Tables 5a and 5b in Additional File 1, the results corresponding to Table 5 and Table 6 for Exposure Definitions II, III and IV (Table 1) are presented. Those results indicated, as in Tables 5 and 6, that *OR*_{SCL} varied when M varied as well as when the study period varied, while *OR*_{MH} and *OR*_{G} were relatively stable. Detailed description for Definition II, III and IV is given in Appendix 5 in Additional File 1.

In the Australian study, after excluding the concordant cases, there were 1316 discordant cases with daily exposed and unexposed periods to SSRIs in the 180 days before the index date. Exposure on the index date was excluded since hip fracture may have occurred the day before admission to hospital.

We used periods of 1, 5, 20, 30, 60 and 90 days with M = 179, 35, 8, 5, 2 and 1, respectively. Exposure within each period was defined using Definition I (Table 1). Estimates of the *OR*_{SCL} were biased upwards (Table 7). When M = 1 (exposure period = 90 days), estimates for *OR*_{SCL} was identical to *OR*_{MH} and *OR*_{G} as expected. As in the Japanese study, *OR*_{SCL} from Eq. (2) increased with M for M > 1.

## Discussion

### Methods to remove BIAS from within-subject exposure dependency with or without time-varying confonders

Using simulated data, we showed that *OR*_{SCL} can be biased when there is within-subject exposure dependency but no exposure time trend, except when only one control period is used. When only one control period is used (M = 1), pairwise exchangeability is equivalent to global exchangeability and bias due to within-subject exposure dependency in standard conditional logistic regression does not occur (although bias due to time trends may occur). In Tables 2 and 3, *OR*_{SCL} increased with the increase of study period. This observation is similar to that in the previous case-crossover studies, where the odds ratio increased when a longer study period was employed [27,28,29]. In Table 2, *OR*_{VF}, *OR*_{MH}, and *OR*_{G} were unbiased. Of those 3 unbiased estimates, *OR*_{VF} may be difficult to calculate when analyzing real-world data because it requires population data (or samples from the population) to estimate the exposure probabilities, unlike *OR*_{MH} and *OR*_{G}. In addition, probabilities for many exposure permutations must be reliably estimated which may be intractable. For example, in Fig. 1 with 21 control periods, only 8 subgroups A to H were assumed to exist, but in real-world data, many more exposure patterns would occur. It is possible that the positivity condition is not satisfied for certain permutations, and these do not contribute to the estimation of *OR*_{VF}. These limitations make *OR*_{VF} difficult to use in practical applications.

As Vines and Farrington showed, when no exposure time trend exists and there is one binary exposure variable which is pairwise exchangeable, *OR*_{MH} and *OR*_{G} are unbiased even when *OR*_{SCL} is biased. However, when the model involves a time-varying confounder in addition to the exposure variable, *OR*_{MH} may be biased when periods of each subject are further stratified by the time-varying confounder as in Tables 3 and 4. In contrast, the time-varying confounder can be handled by *OR*_{G}. Results shown in Tables 3 and 4 indicate that our proposed method is unbiased when both within-subject exposure dependency and time-varying confounding occur at the same time, provided that the time-varying confounder is independent between periods.

In Tables 3 and 4, when M = 1 (i.e., only 1 control period is used), *OR*_{G} and *OR*_{VF} for exp(*β*) were overestimated and those for exp(*γ*) were underestimated, but the estimates approached to the true values when N was increased, suggesting that the deviation from the true values observed when M = 1 was due to random error. Conversely, it is likely that employing a larger number of control periods can produce more precise point estimates of exp(*β*) and exp(*γ*).

### Real-world data analysis

In the Japanese and Australian studies, we found a discrepancy between *OR*_{SCL} and *OR*_{G} when more than one control period was used. In the Japanese study, *OR*_{SCL} was more than 2 times larger than *OR*_{G} when the study period increased. We believe that this discrepancy occurred mainly due to within-subject exposure dependency, which increases as the study period increases. The estimates of *OR*_{G} were the same as *OR*_{MH} as expected.

In the Australian study, the discrepancy between *OR*_{SCL} and *OR*_{G} was modest compared with the Japanese study. This was compatible with the finding by Vines and Farrington that the bias due to within-subject exposure dependency is minimal when exp(β) ≈ 1 [15].

### Limitations of the current study and future direction

In the current study, our focus was on bias from within-subject exposure dependency. However, biases can occur from other sources as well. First, we have not applied the weighting method when a time trend in the exposure exists [14]. Though it was shown in the methods section that if there is no time trend, probabilities that the individual is exposed (*π*_{1}) and unexposed (*π*_{0}) at the case period can be estimated without data from population or time-controls, to check whether assumptions of pairwise exchangeability (no exposure time trend) are satisfied, we need data of the population or time-controls. When the proportion of periods exposed in the population or time-controls is roughly constant over study periods, this will support pairwise exchangeability assumptions. When time trend exists, case-time-control design [14] may be used, but more study is needed when both exposure time trend and within-subject exposure dependency exist. Second, a ‘washout period’ has been used in some case-crossover studies [16, 17, 19] to allow for the uncertainty in the optimal length of one period and to reduce within subject exposure dependency. In the current study, when the length of the time period varied, *OR*_{G} (and *OR*_{MH}) was stable in both the simulation study and real-world data. Since exposure was defined by the last day of the period, any period with 2 or more days was equivalent to using a ‘washout period’ at the beginning of the period. However, much more analyses are needed to examine the need for a ‘washout period’ and to determine the optimal length of one period particularly when within-subject exposure dependency exists. Another bias may occur when the event rate is not constant. For example, the event rate may be particularly high soon after the exposure is started, compared to later after treatment was started. One solution for this problem is to divide the exposed periods into high-risk and low-risk periods and considering this as different levels of exposure. When within-subject exposure dependency exists, this will require weighting for at least 3 exposure levels and the weighting method for binary exposures described in this study should be expanded for multi-level exposures.

Limitations of our study include that we assumed a time-varying confounder with no within-subject dependency. If within-subject dependency exists for a time-varying confounder, the weighting method may need to allow for both the exposure and time-varying confounders. Furthermore, when unmeasured time-varying confounders exist, the results still can be biased even when using our weighting method.

Finally, as mentioned earlier, the variance of *OR*_{G} in Eq. (14) is estimated assuming the weight in Eq. (10) is a constant. To allow for uncertainty in estimating the weights in Eq. (10), bootstrapping was used to estimate to 95% CI from 2.5 to 97.5 percentiles of *OR*_{G}. However, an analytical method to estimate the variance of *OR*_{G} which accounts for the variance of the weights may be useful and be developed in a future study.

## Conclusion

Despite autocorrelated exposures being common in pharmacoepidemiology, bias due to within-subject exposure dependency in the case-crossover study has had little attention in the pharmacoepidemiology literature and standard conditional logistic regression has been widely used without assessing whether this bias exists. Although using only one control period can avoid bias due to within-subject exposure dependency, this will reduce the accuracy of the estimate due to random error, as seen in our simulation study (the odds ratio when M = 1 in Tables 3 and 4). To assess for the possibility of bias, we recommend comparing the Mantel-Haenszel odds ratio with the standard conditional logistic regression odds ratio before starting analysis of case-crossover data. If time-varying confounders exist in data set, they may be ignored when comparing two odds ratios to assess the possibility of bias due to within-subject exposure dependency. If a substantial discrepancy is found (e.g., more than a pre-specified threshold such as 5 to 10% of the Mantel-Haenszel odds ratio), standard conditional regression should not be used. Either the Mantel-Haenszel method or our weighting method should be used instead. The weighting method has less bias than the Mantel-Haenszel method when a time-varying confounder exists and in such a case we recommend using our proposed method to estimate *OR*_{G} in the analysis of case-crossover data because it removes bias from within-subject exposure dependency and can account for time-varying confounders.

Future research will extend our weighting method to allow for time trends in the exposure, ‘washout periods’, multi-exposure levels which are potentially important when the event rate changes during exposed periods, and analytical method to have the variance of *OR*_{G} by making allowance for the uncertainty in estimating the weight.

## Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

## References

Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol. 1991;133(2):144–53.

Consiglio GP, Burden AM, Maclure M, et al. Case-crossover study design in pharmacoepidemiology: systematic review and recommendations. Pharmacoepidemiol Drug Saf. 2013;22(11):1146–53.

Checkoway H, Pearce N, Kriebel D. Selecting appropriate study designs to address specific research questions in occupational epidemiology. Occup Environ Med. 2007;64(9):633–8.

Kim JH, Mooney SJ. The epidemiologic principles underlying traffic safety study designs. Int J Epidemiol. 2016;45(5):1668–75.

Carracedo-Martínez E, Taracido M, Tobias A, et al. Case-crossover analysis of air pollution health effects: a systematic review of methodology and application. Environ Health Perspect. 2010;118(8):1173–82.

Lumley T, Levy D. Bias in the case-crossover design: implications for studies of air pollution. Environmetrics. 2000;11(6):689–704.

Rothman KJ, Greenland S, Lash T. Case-control studies. In: Rothman KJ, Greenland S, Lash T, editors. Modern Epidemiology. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008. p. 111–28.

Mittleman MA, Mostofsky E. Exchangeability in the case-crossover design. Int J Epidemiol. 2014;43(5):1645–55.

Maclure M, Fireman B, Nelson JC, et al. When should case-only designs be used for safety monitoring of medical products? Pharmacoepidemiol Drug Saf. 2012;21(Suppl 1):50–61.

Farrington CP, Whitaker HJ, Hocine MN. Case series analysis for censored, perturbed, or curtailed post-event exposures. Biostatistics. 2009;10(1):3–16.

Farrington CP, Anaya-Izquierdo K, Whitaker HJ, et al. Self-controlled case series analysis with event-dependent observation periods. J Am Stat Assoc. 2011;106(494):417–26.

Petersen I, Douglas I, Whitaker H. Self controlled case series methods: an alternative to standard epidemiological study designs. BMJ. 2016;354:i4515.

Hallas J, Pottegård A. Use of self-controlled designs in pharmacoepidemiology. J Intern Med. 2014;275(6):581–9.

Suissa S. The case-time-control design. Epidemiology. 1995;6(3):248–53.

Vines SK, Farrington CP. Within-subject exposure dependency in case-crossover studies. Stat Med. 2001 30;20(20):3039–49.

Jensen AK, Gerds TA, Weeke P, Torp-Pedersen C, Andersen PK. On the validity of the case-time-control design for autocorrelated exposure histories. Epidemiology. 2014;25(1):110–3.

Tubiana S, Blotière PO, Hoen B, Lesclous P, Millot S, Rudant J, et al. Dental procedures, antibiotic prophylaxis, and endocarditis among people with prosthetic heart valves: nationwide population based cohort and a case crossover study. BMJ. 2017;358:j3776.

Grimnes G, Isaksen T, Tichelaar YIGV, Brox J, Brækkan SK, Hansen JB. C-reactive protein and risk of venous thromboembolism: results from a population-based case-crossover study. Haematologica. 2018;103(7):1245–50.

Sutcliffe S, Jemielita T, Lai HH, Andriole GL, Bradley CS, Clemens JQ, et al. A case-crossover study of urological chronic pelvic pain syndrome flare triggers in the MAPP research network. J Urol. 2018;199(5):1245–51.

Ryu SY, Kim J, Kim DW, Chung EJ. Association between ranibizumab injections and risk of acute myocardial infarction in age-related macular degeneration: a case-crossover study. Korean J Ophthalmol. 2020;34(2):150–7.

Doubleday A, Schulte J, Sheppard L, Kadlec M, Dhammapala R, Fox J, et al. Mortality associated with wildfire smoke exposure in Washington state, 2006-2017: a case-crossover study. Environ Health. 2020;19(1):4.

Wahab IA, Pratt NL, Wiese MD, et al. The validity of sequence symmetry analysis (SSA) for adverse drug reaction signal detection. Pharmacoepidemiol Drug Saf. 2013;22(5):496–502.

Cross-Fact claims database. https://www.intage-realworld.co.jp/service/cross_fact/ (in Japanese). Accessed July 30, 2021.

Leach MJ, Pratt NL, Roughead EE. Psychoactive medicine use and the risk of hip fracture in older people: a case-crossover study. Pharmacoepidemiol Drug Saf. 2015;24(6):576–82.

Leach MJ, Pratt NL, Roughead EE. Risk of hip fracture in older people using selective serotonin reuptake inhibitors and other psychoactive medicines concurrently: a matched case-control study in Australia. Drugs Real World Outcomes. 2017;4(2):87–96.

Greenland S. A unified approach to the analysis of case-distribution (case-only) studies. Stat Med. 1999;18(1):1–15.

Hallas J, Pottegård A, Wang S, et al. Persistent user Bias in case-crossover studies in Pharmacoepidemiology. Am J Epidemiol. 2016 Nov 15;184(10):761–9.

Wang P, Schneeweiss S, Glynn RB, et al. Use of the case-crossover design to study prolonged drug exposures and insidious outcomes. Ann Epidemiol. 2004;14(4):296–303.

Orriols L, Wilchesky M, Lagarde E, et al. Prescription of antidepressants and the risk of road traffic crash in the elderly: a case-crossover study. Br J Clin Pharmacol. 2013;76(5):810–5.

## Acknowledgements

Not applicable.

## Funding

This work was supported by an internal funding and but not by any external funding source.

## Author information

### Authors and Affiliations

### Contributions

The study was conceived by KK and the method was developed mainly by KK and TLK with comments occasionally given from other authors. TS, TY and KK analyzed Japanese data and NP, ER and TLK analyzed Australian data. The manuscript was drafted by KK and TLK and reviewed by all authors critically. All authors approved the final version of the manuscript.

### Corresponding author

## Ethics declarations

### Ethics approval

The study using real-world data in Japan was approved by the ethics committee of Tokyo University of Science (reference no. 18023) where obtaining the informed consent from study subjects was waived for the current study. The study using real-world data in Australia was approved by Department of Defense and Veterans’ Affairs Human Research Ethics (E016–007) and University of South Australia Human Research Ethics (P203/04) where obtaining the informed consent from study subjects was waived for the current study.

### Competing interests

We declare that the authors have no competing interests as defined by BMC, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

## Additional information

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Supplementary Information

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

## About this article

### Cite this article

Kubota, K., Kelly, TL., Sato, T. *et al.* A novel weighting method to remove bias from within-subject exposure dependency in case-crossover studies.
*BMC Med Res Methodol* **21**, 214 (2021). https://doi.org/10.1186/s12874-021-01408-5

Received:

Accepted:

Published:

DOI: https://doi.org/10.1186/s12874-021-01408-5