Research article
Estimating time-varying exposure-outcome associations using case-control data: logistic and case-cohort analyses
BMC Medical Research Methodology volume 16, Article number: 2 (2016)
Abstract
Background
Traditional analyses of standard case-control studies using logistic regression do not allow estimation of time-varying associations between exposures and the outcome. We present two approaches which allow this. The motivation is a study of vaccine efficacy as a function of time since vaccination.
Methods
Our first approach is to estimate time-varying exposure-outcome associations by fitting a series of logistic regressions within successive time periods, reusing controls across periods. Our second approach treats the case-control sample as a case-cohort study, with the controls forming the subcohort. In the case-cohort analysis, controls contribute information at all times they are at risk. Extensions allow left truncation, frequency matching and, using the case-cohort analysis, time-varying exposures. Simulations are used to investigate the methods.
Results
The simulation results show that both methods give correct estimates of time-varying effects of exposures using standard case-control data. Using the logistic approach there are efficiency gains from reusing controls over time, and care should be taken over the definition of controls within time periods. Using the case-cohort analysis, by contrast, there is no ambiguity over the definition of controls.
The performance of the two analyses is very similar when controls are used most efficiently under the logistic approach.
Conclusions
Using our methods, case-control studies can be used to estimate time-varying exposure-outcome associations where they may not previously have been considered. The case-cohort analysis has several advantages, including that it allows estimation of time-varying associations as a continuous function of time, while the logistic regression approach is restricted to assuming a step function form for the time-varying association.
Background
Case-control studies are widely used to study associations between exposures and disease (or other) outcomes, especially when the outcome is rare. For overviews see Breslow and Day (1980) [1], Breslow (1996) [2] and Keogh and Cox (2014) [3]. In a ‘standard’ case-control study cases are individuals who experienced the outcome of interest within a specified time period and controls are chosen to represent the non-cases in the same population.
In this paper we describe methods for estimating time-varying associations between exposures and outcomes using standard case-control study data, focusing on unmatched and frequency matched studies. Conventional analyses of case-control data using logistic regression do not accommodate time-varying associations. We outline two approaches. One is to estimate associations (odds ratios (OR)) separately within a series of time periods using logistic regression. The second treats the case-control sample as a case-cohort study, with the controls forming the ‘subcohort’. The case-cohort design [4] is a method for selecting a case-control-type sample from a prospective cohort, enabling estimation of hazard ratios (HR) without obtaining complete information for the full cohort. See Onland-Moret et al. (2007) [5] for an overview.
We describe a motivating study before outlining the two proposed approaches and presenting results from a simulation study.
The motivation for this work was a case-control study of the long-term efficacy of infant BCG (Bacillus Calmette-Guérin) vaccination against tuberculosis (TB), in particular of whether the vaccine efficacy becomes weaker over time since vaccination. Incident cases aged 0 to 19 at the first disease episode were identified retrospectively from those occurring over a 10-year period and recruited to the study. Controls were selected at the same time at which cases were retrospectively identified and chosen to represent the underlying population by sampling households, and so as to obtain approximately equal numbers of cases and controls within a series of birth cohorts.
The vaccination policy in the underlying population recommends administration of BCG before age 1. Participants’ vaccination status was ascertained using a combination of vaccination records, reported history, and inspection for BCG vaccination scar. It was of interest to estimate vaccine efficacy within a series of time periods post-vaccination, and to model the vaccine efficacy smoothly with time since vaccination.
Rodrigues and Smith (1999) [6] give an overview of the use of case-control studies to study vaccine efficacy.
Methods
We outline two approaches to estimating time-varying exposure-outcome associations using unmatched case-control data:

(i) Performing separate logistic regressions within a series of time periods.

(ii) Treating the study as a case-cohort study and applying a case-cohort analysis.
Both approaches assume that the cases are rare in the underlying population.
We consider a case-control sample containing n individuals. The main exposure, whose association with the outcome may vary over time, is denoted x. A vector of covariates, assumed to have non-time-varying associations with the outcome, is denoted z.
Logistic regression analysis
We focus on estimating the association between the exposure and the outcome within L consecutive non-overlapping time periods, that is, assuming a step function form for the time-varying association. A logistic model for the probability of being a case in time period l is

$$ \Pr(D_l = 1 \mid x, z) = \frac{\exp(\delta_{0l} + \delta_{Xl}\, x + \delta_{Zl}^{T} z)}{1 + \exp(\delta_{0l} + \delta_{Xl}\, x + \delta_{Zl}^{T} z)} \tag{1} $$

where $D_l$ denotes case ($D_l = 1$) or control ($D_l = 0$) status in time period l, $\delta_{Xl}$ is the log OR for the exposure x in time period l, and $\delta_{Zl}$ is a vector of log ORs for the covariates z in time period l. The probabilities $\Pr(D_l = 1 \mid x, z)$ are conditional on the case-control sampling scheme and the intercepts $\delta_{0l}$ do not have a useful interpretation [7]. We now discuss the definition of a case and a control in time period l, before outlining the analysis based on model (1).
We define an ‘index time’ for each individual. For cases the index time is the time they became a case, on the relevant time scale, e.g. the age at disease diagnosis. For controls the index time is the time up to which it is known they have not had the event; in the motivating example this was the time of being interviewed for the study. For cases, $D_l = 1$ if the index time was in time period l. The question arises as to how to define controls in period l. We propose that a control individual can serve as a control in any time period up to and including that in which their index time falls. Therefore controls can contribute to the analysis in more than one time period. For example, in the motivating example the time scale is age and we assume for now that vaccination occurs at birth. We may wish to estimate the vaccine efficacy in age groups (or, equivalently, periods of years since vaccination) 0–4, 5–9, 10–14, 15–19. Individuals interviewed as controls up to and including age 4 can only appear as controls for cases occurring in the 0–4 age group, while an individual interviewed as a control at age 14, say, may serve as a control in three age groups: 0–4, 5–9, 10–14. Another possibility would be to use control individuals in only one time period. However, this would be inefficient in comparison with our proposed scheme for the reuse of controls across multiple time periods. In the simulation study we investigate alternative control definitions. These issues are connected to the work of Lubin and Gail (1984) [8] and Robins et al. (1986) [9], who discuss control selection in nested case-control studies. We do not allow cases occurring in a given time period to contribute to the analysis as a ‘control’ at any time prior to that at which they become a case.
We let $x_i$ and $z_i$ denote the exposure and covariates respectively for individual i (i = 1, …, n). The full likelihood under the analysis approach proposed above, in which controls are reused across multiple time periods, is

$$ \prod_{l=1}^{L} \prod_{i=1}^{n} \left[ \frac{\exp\{ D_{li} (\delta_{0l} + \delta_{Xl}\, x_i + \delta_{Zl}^{T} z_i) \}}{1 + \exp(\delta_{0l} + \delta_{Xl}\, x_i + \delta_{Zl}^{T} z_i)} \right]^{I_{li}} \tag{2} $$

where $D_{li}$ takes value 1 for a case occurring in time period l and 0 for individuals eligible to be used as a control in time period l according to our proposed criteria. $I_{li}$ is an indicator of whether individual i contributes to the analysis in time period l: it takes value 0 for control individuals with index time less than the lower limit of period l and 1 for controls with index time at or after the lower limit of period l. For cases, $I_{li}$ is 1 if the case occurs in period l and 0 otherwise. In practical terms, the data can be arranged so that each case has exactly one row of data and each control has one or more rows of data: one row for each time period up to and including that in which their index time falls. The analysis can be performed in standard software for logistic regression by using interactions between time period and the exposure and covariates, allowing a separate intercept for each time period.
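The data arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expand_person_periods`, the dictionary keys, and the representation of periods by their lower limits are hypothetical choices, not part of the original analysis. Left truncation and covariates are ignored here.

```python
from bisect import bisect_right


def expand_person_periods(subjects, lower_limits):
    """Arrange case-control data as one row per (individual, period).

    subjects: list of dicts with hypothetical keys
        "id", "is_case", "index_time", "x".
    lower_limits: sorted lower limits of the L periods, e.g. [0, 5, 10, 15].
    Cases get exactly one row (the period of their index time); controls
    get one row for every period up to and including that period.
    """
    rows = []
    for s in subjects:
        # 0-based position of the period containing the index time.
        last = bisect_right(lower_limits, s["index_time"]) - 1
        if last < 0:
            continue  # index time precedes the first period
        periods = [last] if s["is_case"] else range(last + 1)
        for period in periods:
            rows.append({"id": s["id"], "period": period,
                         "D": int(s["is_case"]), "x": s["x"]})
    return rows


example = [
    {"id": 1, "is_case": False, "index_time": 14, "x": 1},  # control, age 14
    {"id": 2, "is_case": True, "index_time": 7, "x": 0},    # case at age 7
    {"id": 3, "is_case": False, "index_time": 4, "x": 1},   # control, age 4
]
rows = expand_person_periods(example, [0, 5, 10, 15])
```

In this toy input the control interviewed at age 14 contributes three rows (periods 0–4, 5–9, 10–14), matching the example in the text. The expanded rows could then be passed to any logistic regression routine with period-by-exposure interactions.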
It may be reasonable to assume that the associations between the covariates z and the outcome are the same across time periods ($\delta_{Zl} = \delta_Z$ for all l = 1, …, L), or that the intercept is the same over time ($\delta_{0l} = \delta_0$ for all l = 1, …, L). If common parameters are used across time periods, then the use of some individuals as controls within multiple time periods induces dependence between contributions to the likelihood, and robust variance estimates should be used.
In the analysis proposed above, we do not allow cases to serve as controls in time periods before the one in which they became a case, as this would result in over-representation of future cases in the control set in a given time period. The controls in a given time period are in fact individuals who remained free of becoming a case up to their index time. Therefore there is technically an under-representation of future cases in the control group in each period. However, when cases are rare in the underlying population we expect this to result in negligible bias.
Case-cohort analysis
The logistic analysis estimates the exposure-outcome association (an OR) separately within time periods, i.e. assuming a step function, but does not extend to allow estimation of a smooth association over time. The way in which controls are used across time periods is also not ideal in that events happen in continuous time, but controls must be assigned within discrete time periods. The logistic analysis could in theory be performed using a large number of short time periods, to build up a detailed picture of how the exposure-outcome association changes over time. However, in practice the number of time periods that can reasonably be used is restricted by sample size.
We instead consider a case-cohort analysis and start by describing the standard setting in which a case-cohort study arises as a sub-study within a prospective cohort. To obtain a case-cohort sample the first step is to obtain a random sample of individuals from an underlying cohort at the start of follow-up (or, often, retrospectively, but as though it had been done at the start of follow-up), referred to as the subcohort. The subcohort may contain some individuals who become cases during the course of follow-up. The case-cohort sample comprises the subcohort plus all individuals in the rest of the cohort who become cases during the course of follow-up. In the analysis of a case-cohort study each case is compared at its event time with the individuals in the subcohort who are still at risk at that time using a pseudo-partial likelihood (Fig. 1) [4].
In a standard case-cohort analysis, we assume the Cox proportional hazards model [10] for the hazard for the event of interest

$$ h(t \mid x, z) = h_0(t) \exp(\beta x + \gamma^{T} z) \tag{3} $$

where t denotes the event time, $h_0(t)$ is the baseline hazard at time t, $\beta$ is the log HR for the exposure x, and $\gamma$ is a vector of log HRs for the covariates z. This can be extended to accommodate a time-varying association between x and the hazard, by replacing $\beta$ in (3) by $\beta(t)$, which models the log HR for the exposure x as a function of time. There are various possibilities for the choice of $\beta(t)$. A simple approach is to assume a step function form so that the HR is assumed constant within a series of time intervals: $\beta(t) = v_1 \beta_1 + v_2 \beta_2 + \cdots + v_L \beta_L$, where $v_l$ is an indicator taking value 1 when t is in time period l and value 0 otherwise (l = 1, …, L). Alternatively we can model the exposure-outcome association smoothly as a function of time, for example using a linear model, $\beta(t) = \beta_0 + \beta_1 t$. Another possibility is to use a spline [11]. Quantin et al. (1999) [12] discuss methods for modelling time-varying associations in Cox regression.
We denote the ordered event times $t_1 < t_2 < \ldots < t_N$ and the case at time $t_j$ is denoted $i_j$. The parameters of the extended Cox proportional hazards model including $\beta(t)$ are estimated using the pseudo-partial likelihood:

$$ \prod_{j=1}^{N} \frac{\exp\{\beta(t_j)\, x_{i_j} + \gamma^{T} z_{i_j}\}}{\sum_{k \in R_j} \exp\{\beta(t_j)\, x_k + \gamma^{T} z_k\}} \tag{4} $$

where $R_j$ denotes the set of individuals in the subcohort who were at risk at time $t_j$ (including the case itself at time $t_j$ if the case is in the subcohort), plus the case itself at $t_j$ (if the case is not in the subcohort) [4]. This differs from the partial likelihood analysis of a full cohort study [13] only by the definition of $R_j$; in a full cohort study $R_j$ would be replaced by the full risk set at time $t_j$. Tied survival times can be handled using Breslow’s method (1972) [14]. The expression in (4) is a pseudo-partial likelihood due to the ‘shared’ control group, and sandwich estimators, or an appropriate equivalent, are required to obtain correct standard errors [4]. The case-cohort analysis can be performed using standard software for Cox regression after making a small modification to the data and using robust standard errors. The modification is to set the entry time (start of follow-up) for cases not in the subcohort to be just an instant before they become a case, ensuring that these cases only appear in the denominator of the pseudo-partial likelihood at the time at which they are the case.
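To make the structure of the pseudo-partial likelihood concrete, the following is a minimal sketch that evaluates its logarithm for a given β(t). It makes several simplifying assumptions that are not from the paper: covariates z are omitted, event times are assumed distinct (no tie handling), every case is outside the subcohort (as in the case-control setting considered here), and the function and key names are hypothetical. It evaluates the likelihood only; it does not maximise it or compute robust standard errors.

```python
import math
from bisect import bisect_right


def step_beta(cutpoints, betas):
    """Step-function log HR: betas[l] applies between consecutive cutpoints.

    cutpoints: interior period boundaries, e.g. [5, 10, 15];
    betas: one value per period (len(cutpoints) + 1 values).
    """
    return lambda t: betas[bisect_right(cutpoints, t)]


def log_pseudo_partial_likelihood(subjects, beta_of_t):
    """Log pseudo-partial likelihood for exposure x only.

    subjects: list of dicts with hypothetical keys "entry", "exit",
    "is_case", "x". For cases, "exit" is the event time; for controls it
    is the index time up to which they are at risk.
    """
    total = 0.0
    for case in subjects:
        if not case["is_case"]:
            continue
        t = case["exit"]
        b = beta_of_t(t)
        # Sampled risk set: controls at risk at t, plus the case itself.
        risk_set = [s for s in subjects
                    if s is case
                    or (not s["is_case"] and s["entry"] < t <= s["exit"])]
        denom = sum(math.exp(b * s["x"]) for s in risk_set)
        total += b * case["x"] - math.log(denom)
    return total


toy = [
    {"entry": 0, "exit": 3, "is_case": True, "x": 1},   # case at t = 3
    {"entry": 0, "exit": 5, "is_case": False, "x": 0},  # at risk at t = 3
    {"entry": 0, "exit": 2, "is_case": False, "x": 1},  # no longer at risk
]
ll = log_pseudo_partial_likelihood(toy, lambda t: math.log(0.5))
```

In the toy data the risk set at t = 3 contains the case and one control, so the contribution is log(0.5 / (0.5 + 1)). A step function such as `step_beta([5, 10, 15], [b1, b2, b3, b4])` or a linear form can be passed as `beta_of_t` in the same way.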
We propose that a standard case-control study may be viewed as a case-cohort study under the assumption that the cases are rare in the underlying population, and assuming that the case event times are known. In a usual case-cohort study the subcohort may contain some cases by chance. However, in our situation of a standard case-control study the controls are selected from those who did not become cases during the follow-up period. If the cases are rare in the population then the controls will be approximately representative of the population in which the cases arose. Therefore, the case-control study can be viewed as a case-cohort sample with the control group forming the subcohort. The analysis is as outlined above, with controls considered to be ‘at risk’ up until their index time (the date of interview in our motivating example).
The case-cohort approach makes full use of the data by allowing controls to contribute information to all sampled risk sets $R_j$ up to their index time. A particular advantage of this approach is that it allows modelling of time-varying exposure-outcome associations as a continuous function of time; that is, we are not restricted to estimating the association within time periods. However, estimating a separate HR within a series of time periods will often be a sensible analysis, particularly as a starting point.
The logistic analysis described in the preceding section may be thought of as a discrete-time survival analysis. As the time periods become small and only contain a small number of cases, the appropriate analysis would be a conditional logistic regression with cases and controls in each period forming a matched set. In this case our proposed logistic analysis reusing controls across multiple time periods becomes equivalent to the case-cohort analysis.
Extensions
Frequency matching of controls
Frequency matching in a standard case-control study is analogous to stratified sampling of the subcohort in a case-cohort study [15], in which the subcohort is formed of random samples from a series of strata s (s = 1, …, S) defined by the frequency matching criteria. In this situation the baseline hazard $h_0(t)$ in (3) is replaced by stratum-specific baseline hazards $h_{0s}(t)$. The pseudo-partial likelihood in (4) is modified by replacing $R_j$ by $R_{sj}$, the set of individuals in the subcohort who are at risk at $t_j$ and in the same stratum as the case which occurred at time $t_j$, plus the case itself at $t_j$ (if the case is not in the subcohort). Frequency matching of controls can be accommodated in the logistic analysis by replacing $\delta_{0l}$ by $\delta_{0ls}$ in (1).
Time-varying exposures
Many studies involve time-varying exposures. This occurs in our motivating study, in which the time scale is age and vaccination occurs at different ages, though typically before the first birthday. Focusing on a binary exposure, we let x(t) denote the exposure at time t, on the relevant time scale. We now separate the time scale for occurrence of the event (t) and the time since exposure (u). The case-cohort analysis accommodates both time-varying exposure-outcome associations and time-varying exposures. Under this extension the pseudo-partial likelihood is

$$ \prod_{j=1}^{N} \frac{\exp\{\beta(u_{i_j j})\, x_{i_j}(t_j) + \gamma^{T} z_{i_j}\}}{\sum_{k \in R_j} \exp\{\beta(u_{kj})\, x_k(t_j) + \gamma^{T} z_k\}} \tag{5} $$

where $u_{kj}$ denotes the time since exposure for individual k at event time $t_j$, and $\beta(u)$ models the log HR as a function of time since exposure. To perform the analysis the data would be arranged with multiple rows per control individual to accommodate both changing exposure over time and different times since exposure; cases would still have only one row of data.
The logistic regression approach does not extend easily to accommodate a time-varying exposure with a time-varying association with the outcome. A non-time-varying association between a time-varying exposure and an outcome can be estimated using logistic regression by fitting separate models within a series of time periods, using current values of the time-varying exposure in each period, and pooling the estimates across periods. This gives similar results to Cox regression using time-varying exposures [16]. This approach could be extended to the setting of a time-varying exposure with a time-varying exposure-outcome association, by fitting logistic regressions within sub-intervals of each time period of interest for the time-varying association and obtaining a pooled estimate across sub-intervals within each time period, using current values of the time-varying exposure in each regression. However, this is cumbersome and requires sufficient numbers of cases and controls within sub-intervals. Therefore we consider the logistic regression approach to be impractical for time-varying exposures with a time-varying association with the outcome.
Left truncation
In the motivating study the cases were cases of TB occurring between 2003 and 2012, resulting in left truncation prior to 2003. Left truncation is accommodated in the case-cohort analysis by having control individuals enter the risk set starting only at the time from which they would have been eligible to become a case. Left truncation can be accommodated in the logistic approach by extending the definition of a control within a given time period. We propose that an individual can appear as a control in any time period in which they are observed for any length of time, including when they do not enter the risk set until partway through the time period due to left truncation.
Results
We use a simulation study based on the motivating example to investigate the performance of our proposed methods.
Simulating the data
We first generated full cohort data within which cases occur in time, and then obtained a frequency matched case-control sample within that.
Full cohort data were generated for $n_b$ individuals in five birth cohorts (b = 1, 2, 3, 4, 5) covering the period 1984–2012. Dates of birth were generated uniformly within each birth cohort. The sizes of the birth cohorts ($n_1, \ldots, n_5$) were chosen to give particular numbers of cases in different age groups (mimicking the numbers expected in the motivating example), resulting in approximately 582 cases in each full cohort.
The exposure (vaccination status) was generated randomly from a binomial distribution within each birth cohort, using the following exposure percentages which mimic changes in vaccination uptake in the target population over time: birth cohorts 1 and 2: 60 %; birth cohort 3: 80 %; birth cohorts 4 and 5: 90 %.
We assumed the vaccine efficacy declined over time, with HR 0.25 in the time period up to 5 years after exposure, and a subsequent increase in the HR by 35 % every 5 years, giving the following HRs across periods of years since exposure (which here are the same as age groups): age 0–4: 0.25, age 5–9: 0.34, age 10–14: 0.46, age 15–19: 0.62, age ≥ 20: 0.83.
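The quoted HRs follow from compounding a 35 % increase over successive 5-year periods, starting from 0.25. The short check below reproduces them; the variable names are illustrative only.

```python
base_hr = 0.25   # HR in the first 5 years after exposure
growth = 1.35    # 35 % increase in the HR every 5 years

# HR in the k-th 5-year period since exposure (k = 0 is ages 0-4).
hrs = [round(base_hr * growth ** k, 2) for k in range(5)]
print(hrs)  # [0.25, 0.34, 0.46, 0.62, 0.83]
```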
Event times were generated using a piecewise exponential model with event rates differing by 5-year age group and using the above HRs. Estimates of age-specific TB rates for the target population were obtained from 5-year average TB rates in England [17]. The numbers of cases per 100,000 across age groups were: age 0–4: 13, age 5–9: 14, age 10–14: 16, age 15–19: 36, age 20–24: 40, age 25–29: 45.
All individuals were followed up to the end of 2013. The index time for cases was their age at TB diagnosis and that for non-cases was their age at the end of 2013. Left truncation was introduced so that events were only observed from 2003. Cases were all individuals having the event before the censoring time at the end of 2013 and aged 19 or under at the time of becoming a case. Individuals eligible as controls in the case-control sample were those who had not had TB by the end of 2013. Controls were sampled randomly within the 5 birth cohorts such that the number of controls in each birth cohort group was the same as the number of cases, as in frequency matching. The case-control study comprises all cases plus the sampled controls.
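One common way to draw event times from a piecewise exponential model is to invert the cumulative hazard; the sketch below illustrates that idea and is not necessarily how the original simulation was coded. It assumes the quoted rates are annual rates per 100,000, applies the HR of 0.83 to both age groups from 20 upwards, and uses hypothetical names throughout.

```python
import math
import random

AGE_CUTS = [0, 5, 10, 15, 20, 25, 30]                    # period boundaries (years)
RATES = [r / 100_000 for r in (13, 14, 16, 36, 40, 45)]  # assumed annual rates
HRS_VACCINATED = [0.25, 0.34, 0.46, 0.62, 0.83, 0.83]
HRS_UNVACCINATED = [1.0] * 6


def piecewise_exp_time(cutpoints, rates, hrs, rng):
    """Draw an event time by inverting the cumulative hazard.

    The hazard on [cutpoints[k], cutpoints[k+1]) is rates[k] * hrs[k].
    Returns math.inf if no event occurs before the last cutpoint.
    """
    target = rng.expovariate(1.0)  # cumulative hazard at the event time
    cumulative = 0.0
    for k, (rate, hr) in enumerate(zip(rates, hrs)):
        hazard = rate * hr
        width = cutpoints[k + 1] - cutpoints[k]
        if hazard > 0 and cumulative + hazard * width >= target:
            return cutpoints[k] + (target - cumulative) / hazard
        cumulative += hazard * width
    return math.inf


rng = random.Random(2016)
age_at_tb = piecewise_exp_time(AGE_CUTS, RATES, HRS_VACCINATED, rng)
```

Because TB is rare, most draws return `math.inf`, meaning no event before age 30. Censoring at the end of follow-up and left truncation would then be applied to the simulated ages.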
We generated 1000 simulated case-control data sets.
Methods
In each simulated case-control data set we estimated the exposure-outcome association within 5-year age groups 0–4, 5–9, 10–14, and 15–19, using the methods outlined below.

1. Logistic regression analysis using controls across multiple time periods.
In the logistic analyses we allow a separate intercept parameter in each birth cohort. We consider four ways of defining a control in time period l, which has lower limit $\tau_{lA}$ and upper limit $\tau_{lB}$, where $T_E$ denotes the entry time for a given control (i.e. start of follow-up, accounting for left truncation) and $T_I$ denotes the index time:
Control definition (i): $T_E < \tau_{lB}$, $T_I \geq \tau_{lA}$. This is our proposed approach. A control individual can serve as a control in any time period in which their start of follow-up (entry time) is before the upper limit of the time period and in which their index time is at or after the lower limit of the time period.
Control definition (ii): $T_E < \tau_{lB}$, $T_I > \tau_{lB}$. A control individual can serve as a control in any time period in which their start of follow-up (entry time) is before the upper limit of the time period and in which their index time is after the upper limit of the time period.
Control definition (iii): $T_E \leq \tau_{lA}$, $T_I \geq \tau_{lA}$. A control individual can serve as a control in any time period in which their start of follow-up (entry time) is at or before the lower limit of the time period and in which their index time is at or after the lower limit of the time period.
Control definition (iv): $T_E \leq \tau_{lA}$, $T_I > \tau_{lB}$. A control individual can serve as a control in any time period in which their start of follow-up (entry time) is at or before the lower limit of the time period and in which their index time is after the upper limit of the time period. This is the most stringent control definition.

2. Logistic regression analysis using each control in only one time period.
We consider an analysis in which control individuals are only used in one time period. Controls were allocated to a time period from those in which they were eligible to be a control (according to definition (i)) so as to achieve as far as possible an equal number of controls in each time period.

3. Case-cohort analysis.
The case-cohort analysis was applied allowing a separate baseline hazard within each birth cohort.
The analyses were applied in the 1000 simulated data sets. The case-cohort analysis gives estimates of HRs, while the logistic regression analysis gives ORs. Given that cases are rare in the population we expect HRs and ORs to be very similar. Results are shown in Table 1.
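The four control definitions listed under analysis 1 differ only in which inequalities are imposed on the entry time and index time. A minimal sketch of the eligibility rules follows. The function name and argument names are hypothetical, and period limits are treated as continuous boundaries (e.g. 5 and 10 for the 5–9 age group), which is an assumption about how the limits are coded.

```python
def eligible_control(t_entry, t_index, lower, upper, definition="i"):
    """Return True if a control may be used in the period [lower, upper).

    t_entry: start of follow-up (T_E), allowing for left truncation.
    t_index: index time (T_I), up to which the control is event-free.
    """
    if definition == "i":    # proposed: observed for any part of the period
        return t_entry < upper and t_index >= lower
    if definition == "ii":   # must remain event-free past the period's end
        return t_entry < upper and t_index > upper
    if definition == "iii":  # must be under follow-up from the period's start
        return t_entry <= lower and t_index >= lower
    if definition == "iv":   # most stringent: observed over the whole period
        return t_entry <= lower and t_index > upper
    raise ValueError(f"unknown control definition: {definition!r}")


# A control entering at age 7 with index time at age 13, period 5-9:
checks = {d: eligible_control(7, 13, 5, 10, d) for d in ("i", "ii", "iii", "iv")}
```

For this control, definitions (i) and (ii) admit the 5–9 period while (iii) and (iv) do not, because follow-up began partway through the period.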
Simulation results
All analyses give estimates of the exposure-outcome association within time periods which are very close to the true HRs. The case-cohort analysis gives the estimates closest to the true HRs. All methods also give correctly estimated standard errors (comparing the empirical standard deviation with the model standard error) and good coverage.
The case-cohort approach and the logistic approach using our proposed control definition (i) gave similar precision (looking at the empirical standard deviations). The precision of the logistic regression estimates varied according to the control definition and whether controls were reused across time periods. Our proposed logistic regression approach which reuses controls according to definition (i) was the most efficient. Using control definition (ii) results in around a 20 % loss in efficiency compared to definition (i). Control definition (iv) is the most stringent and gives the largest standard errors. The logistic regression approach not reusing controls across time periods also gives a substantial loss of efficiency relative to our proposed method.
Discussion
We have outlined two approaches for estimation of time-varying exposure-outcome associations using case-control data: a logistic regression approach and a case-cohort analysis. Our simulations showed that both methods give correct estimates of the time-varying association. The methods can be used to estimate time-varying associations from case-control data in settings where this may not previously have been considered a viable study design, notably in studies of vaccine efficacy over time. The approaches outlined assume that cases are rare in the underlying population.
The case-cohort approach has a number of advantages and this is our recommended method of analysis. A major drawback of the logistic regression approach is that it is restricted to assuming a step function form for the time-varying association, i.e. estimation of the association within a series of time periods, while the case-cohort analysis accommodates a flexible model for the time-varying association.
We showed how controls can be reused across time periods in the logistic regression approach. However, a further drawback of the logistic approach is that there is ambiguity over what the definition of a control should be in a given time period. In the simulation study we considered four definitions for controls, which determine whether a control individual is eligible to contribute to the logistic regression analysis in a given time period. Our results showed that there are considerable gains in efficiency from reusing controls across time periods, and that our proposed control definition (i) is most efficient. By contrast, the case-cohort analysis automatically makes efficient use of controls and there is no ambiguity over the definition of a control at any time point, as a control individual contributes to the sampled risk set at all event times at which they were at risk. We found similar results using the case-cohort analysis and the logistic regression analysis which makes most efficient use of controls.
In summary, the case-cohort approach has several advantages over the logistic regression approach. It allows a flexible model for the time-varying exposure-outcome association and, because it handles time continuously, involves no ambiguity over the definition of a control at a given time point. Additionally, the case-cohort approach easily accommodates time-varying exposures, whereas it is impractical to do this using logistic regressions.
We have focused on unmatched and frequency matched studies. Individual matching of cases to controls is also common, including the use of matching on ‘time’ using ‘concurrent sampling’; for example matched controls are selected from those who have reached the same age as the case at his/her event time. When the matching is in continuous time, this is equivalent to a nested case-control study in which controls are sampled from the risk set for each case. In this situation, the modified partial likelihood analysis used for nested case-control data is identical to a conditional logistic regression analysis. Niccolai et al. (2007) [18] discussed the use of a nested case-control design to study vaccine efficacy over time, and Vazquez et al. (2004) [19] used a study of this type to investigate the efficacy over time of the varicella vaccine.
Use of time-varying exposures in case-control studies has been considered previously in work which is closely connected to ours. Suissa et al. (2010) [20] described a ‘multi-time case-control design’ for estimating the associations between time-varying exposures and an outcome using an unmatched case-control study, motivated by transient exposures. They noted that controls could provide exposure information for multiple time periods and outlined simple approaches to estimation of ORs, though did not extend to regression modelling. The methods described in this paper are an extension of their methods to a more general setting. Leffondre et al. (2003) [21] considered use of time-varying exposures in matched case-control studies. They investigated analyses based on both logistic and Cox regression. Their ‘augmented Cox approach’ is similar to our case-cohort approach, as is the approach which was taken by Freedman et al. (2009) [22] to study the association between time-dependent information on smoking and risk of Warthin’s tumour using data from a matched case-control study. Leffondre et al. (2010) [23] extended to situations in which cases are not rare in the underlying population, by considering weighted Cox models using information on event occurrence in the underlying population. Our methods could be extended in a similar way and this is an area for future work.
Conclusions
By using the case-cohort analysis outlined in this paper, case-control studies can be used to estimate time-varying associations in settings where they may not previously have been considered a viable study design. A logistic regression approach can also be used to estimate time-varying associations, but it is restricted to modelling the time-varying association using a step function, and controls should be defined using our definition (i) to avoid loss of efficiency.
Abbreviations
BCG: Bacillus Calmette-Guérin
HR: Hazard ratio
OR: Odds ratio
TB: Tuberculosis
References
1. Breslow NE, Day NE. Statistical methods in cancer research. Volume 1: The analysis of case-control studies. Lyon: International Agency for Research on Cancer Scientific Publications No 32; 1980.
2. Breslow NE. Statistics in epidemiology: The case-control study. J Am Stat Assoc. 1996;91:14–28.
3. Keogh RH, Cox DR. Case-control studies. Cambridge: Cambridge University Press; 2014.
4. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika. 1986;73:1–11.
5. Onland-Moret NC, van der A DL, van der Schouw YT, Buschers W, Elias SG, van Gils CH, et al. Analysis of case-cohort data: a comparison of different methods. J Clin Epidemiol. 2007;60:350–5.
6. Rodrigues LC, Smith PG. Use of the case-control approach in vaccine evaluation: efficacy and adverse effects. Epidemiol Rev. 1999;21:56–72.
7. Prentice RL, Pyke R. Logistic disease incidence models and case-control studies. Biometrika. 1979;66:403–11.
8. Lubin JH, Gail MH. Biased selection of controls for case-control analyses of cohort studies. Biometrics. 1984;40:63–75.
9. Robins JM, Gail MH, Lubin JH. More on "Biased selection of controls for case-control analyses of cohort studies". Biometrics. 1986;42:293–9.
10. Cox DR. Regression models and life-tables. J Roy Stat Soc B. 1972;34:187–220.
11. Hess K. Assessing time-by-covariate interactions in proportional hazards regression models using cubic spline functions. Stat Med. 1994;13:1045–62.
12. Quantin C, Abrahamowicz M, Moreau T, Bartlett G, MacKenzie T, Tazi M, et al. Variation over time of the effects of prognostic factors in a population-based study of colon cancer: comparison of statistical methods. Am J Epidemiol. 1999;150:1188–200.
13. Cox DR. Partial likelihood. Biometrika. 1975;62:269–76.
14. Breslow NE. Contribution to the discussion of the paper by D.R. Cox. J Roy Stat Soc B. 1972;34:216–7.
15. Borgan Ø, Langholz B, Samuelsen SO, Goldstein L, Pogoda J. Exposure-stratified case-cohort designs. Lifetime Data Anal. 2000;6:39–58.
16. D'Agostino RB, Lee ML, Belanger AJ, Cupples LA, Anderson K, Kannel WB. Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham Heart Study. Stat Med. 1990;9:1501–15.
17. Health Protection Agency. Tuberculosis in the UK: Annual report on tuberculosis surveillance in the UK 2008. Health Protection Agency 2008; ISBN 9780901144966.
18. Niccolai LM, Ogden LG, Muehlenbein CE, Dziura JD, Vázquez M, Shapiro ED. Methodological issues in design and analysis of a matched case-control study of a vaccine’s effectiveness. J Clin Epidemiol. 2007;60:1127–31.
19. Vazquez M, LaRussa PS, Gershon AA, Niccolai LM, Muehlenbein CE, Steinberg SP, et al. Effectiveness over time of varicella vaccine. JAMA. 2004;291:851–5.
20. Suissa S, Dell'Aniello S, Martinez C. The multitime case-control design for time-varying exposures. Epidemiology. 2010;21:876–83.
21. Leffondre K, Abrahamowicz M, Siemiatycki J. Evaluation of Cox's model and logistic regression for matched case-control data with time-dependent covariates: a simulation study. Stat Med. 2003;22:3781–94.
22. Freedman LS, Oberman B, Sadetzki S. Using time-dependent covariate analysis to elucidate the relation of smoking history to Warthin’s tumor risk. Am J Epidemiol. 2009;170:1178–85.
23. Leffondre K, Wynant W, Cao Z, Abrahamowicz M, Heinze G, Siemiatycki J. A weighted Cox model for modelling time-dependent exposures in the analysis of case-control studies. Stat Med. 2010;29:839–50.
Acknowledgements
The authors are grateful to Professor Ørnulf Borgan (University of Oslo) and Professor Sir David Cox (Nuffield College, Oxford) for their comments on this work. This work was conducted as part of the NIHR (HTA) funded project 08/17/01 “Observational study to estimate the changes in the efficacy of BCG with time since vaccination”. PM and PND thank NIHR (HTA) for funding.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
RK developed the statistical methods, carried out the simulation study and drafted the manuscript. PND contributed to the development of the statistical methods. PND, PM and LR planned and carried out the study which motivated these developments. All authors contributed to the refinement of the methods, the design of the simulation study and the writing of the manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Keywords
 Case-control study
 Case-cohort study
 Cox proportional hazards model
 Logistic regression
 Time-varying association
 Vaccine efficacy