 Research article
 Open Access
 Published:
Matching methods to create paired survival data based on an exposure occurring over time: a simulation study with application to breast cancer
BMC Medical Research Methodology volume 14, Article number: 83 (2014)
Abstract
Background
Paired survival data are often used in clinical research to assess the prognostic effect of an exposure. Matching generates correlated censored data expecting that the paired subjects just differ from the exposure. Creating pairs when the exposure is an event occurring over time could be tricky. We applied a commonly used method, Method 1, which creates pairs a posteriori and propose an alternative method, Method 2, which creates pairs in “realtime”. We used two semiparametric models devoted to correlated censored data to estimate the average effect of the exposure $\overline{\mathit{\text{HR}}}\left(t\right)$: the Holt and Prentice (H P), and the Lee Wei and Amato (LWA) models. Contrary to the HP, the L W A allowed adjustment for the matching covariates (L W A _{ a }) and for an interaction (L W A _{ i }) between exposure and covariates (assimilated to prognostic profiles). The aim of our study was to compare the performances of each model according to the two matching methods.
Methods
Extensive simulations were conducted. We simulated cohort data sets on which we applied the two matching methods, the HP and the L W A. We used our conclusions to assess the prognostic effect of subsequent pregnancy after treatment for breast cancer in a female cohort treated and followed up in eight french hospitals.
Results
In terms of bias and RMSE, Method 2 performed better than Method 1 in designing the pairs, and L W A _{ a } was the best model for all the situations except when there was an interaction between exposure and covariates, for which L W A _{ i } was more appropriate. On our real data set, we found opposite effects of pregnancy according to the six prognostic profiles, but none were statistically significant. We probably lacked statistical power or reached the limits of our approach. The pairs’ censoring options chosen for combination Method 2  LWA had to be compared with others.
Conclusions
Correlated censored data designing by Method 2 seemed to be the most pertinent method to create pairs, when the criterion, which characterized the pair, was an exposure occurring over time. In such a setting, the LWA was the most appropriate model.
Background
Five percents of breast cancers occur among women before the age of 40 and in fewer than 2% of women under 35 [1]. Physicians are more and more faced with issues of posttreatment pregnancy, such as the optimal time before considering conception. In physiopathological terms, a pregnancy after breast cancer is not advisable, especially in patients with positive hormonal receptors [2, 3]. However, many observational studies report that pregnancy does not have any impact on the evolution of breast cancer [4][6], and could even have a longterm protective effect [2, 6][10]. This phenomenon can be due to the socalled “healthy mother bias” [7], i.e. only women who are in good health will undertake a pregnancy after treatment for breast cancer.
Pregnancy is therefore related to the prognostic condition of the patient, and its occurrence might change the prognostic effect of some factors, for instance, the hormonal receptors. The prognosis of positive hormonal receptors could be qualitatively (interaction) different before and after pregnancy (i.e. interaction between hormonal receptor and pregnancy).
In an attempt to control for this “healthy mother effect”, various statistical approaches have been used. Some authors used the standard Cox model [11] with pregnancy considered as a covariate whose value depends on time, and adjusting for the known prognostic factors at the diagnosis of cancer, reflecting the gravity of the disease [2, 6, 8, 9]. In spite of this adjustment for disease grade at diagnosis, pregnancy still remained a significant good prognostic factor, indicating a longterm protective effect of pregnancy, which is difficult to explain and to interpret. Others tackled the problem from the angle of an “illnessdeath” model [12][14] which makes it possible to describe the natural history of the disease, taking into account the prognostic profiles of the patients who had or did not have a pregnancy [15]. This model provided better understanding and interpretation of the effect of pregnancy by comparing transition probabilities to relapse between women who had or did not have a pregnancy. Moreover, taking into account a possible interaction between prognostic profiles and the pregnancy improved the estimation of the pregnancy prognostic effect. Overall prognosis was not adversely affected by subsequent pregnancy. However, to allow adjustment, such a complex model requires enough pregnancies and events of interest.
In both previous approaches, the adjustment relied only on known prognostic factors. However, measured prognostic factors might not be enough to characterize the prognostic status in such a disease. Matching subjects designs might be helpful to accomplish that.
Other researchers have conducted paired studies: pregnant and nonpregnant women were matched on the main known prognostic factors (hormonal receptor, proliferation level, nodal involvement, use of chemotherapy, year of diagnosis), and the nonpregnant had to be diseasefree for as long as the time from diagnosis to pregnancy of the pregnant women [3][5, 7, 10]. By this matching carried out on known and measured factors, one can suppose that the subjects of the same pair also share nonobservable, not observed or not measured factors, in addition to the factors of pairing. Thus, this design may improve the control of the “healthy mother effect” compared to the two approaches presented above.
However, to our knowledge, in such a case and contrary to the first methods cited previously, researchers [3][5, 7, 10] did not take into account the fact that pregnancy was an event occurring over time. They matched the pregnant woman to a nonpregnant one a posteriori, i.e. at the end of the follow up study, knowing which women were pregnant and which were not over the study period. They analyzed the data as if these pairs were a priori known and created at diagnosis, i.e. at time t=0. Moreover, they always used the stratified Holt and Prentice semiparametric model (HP) [16] to estimate the pregnancy prognostic effect, whereas other semiparametric models devoted to censored correlated data are available such as frailty models [17, 18] and marginal models [19][22]. Frailty models model the time distribution conditionally to a random effect (frailty covariate), specific to each pair, and which is not observed. The structure of correlation has to be defined. The latter leave the nature of dependence among paired failure times completely unspecified.
Nonparametric [23, 24] and parametric [25] approaches have been developed, but we focus on the semiparametric approach, more specifically on the commonly used marginal semiparametric model. This marginal approach was developed by Wei, Lin and Weissfeld [19] to analyze subjects with multiple events, and then Lee, Wei and Amato [20] adapted it to clustered subjects.
In this paper, we use the marginal paired proportional hazards model of Lee, Wei and Amato (LWA) [20] and the Holt and Prentice stratified model [16] (HP). The main difference between them lies in the ability of the LWA[20] model to adjust for matching covariates and for the possible interaction between the covariate and the exposure, contrary to the HP model. Mehrotra et al.[26] proposed an efficient alternative to the stratified Cox model analysis to estimate the exposure effect, which does not require the assumption of a common hazard ratio across strata. However, that model is not adapted to our particular context of a large number of strata, with very small sample size per stratum (in our work, a stratum is a pair), thus it will not be studied here.
With HP and LWA models, we considered two different methods to create our pairs: the a posteriori one commonly used and described previously, and a new one designing the pairs “in realtime” by taking into account the occurrence of the event over time, i.e. the pregnancy, which characterizes the subject’s group within the pair.
The goal was to determine the combination between matching methods (a posteriori and in “realtime”) and models (HP and LWA), which is the most efficient in terms of bias and Root Mean Square Error (RMSE) to estimate and test the pregnancy prognostic effect. In the following, pregnancy is referred to as the “exposure” occurring over time and the pairs are composed of an exposed and a nonexposed subject.
In the next section, the two matching methods as well as the models of analysis are presented. In Section ‘Simulations’, extensive simulations are used to analyze the performance of these Method  Model combinations in term of bias and RMSE of the estimated effect related to the exposure occurring over time. In Section ‘Real data application’, our findings are applied to real data in order to analyze the effect of a subsequent pregnancy on breast cancer evolution [9]. A brief discussion concludes the paper.
Methods
The subjects are matched on all the known prognostic factors represented by vector Z. In the following, index j corresponds to the rank of the exposed subject occurring at the jth position’s order among the n _{ e } exposed subjects at the end of the study (j=1,…,n _{ e }), and index i corresponds to the subject among the n subjects of the study (i=1,…,n). Let t _{ i } being defined as the followup time of each subject i, and t _{ E i } as the time exposure for a subject i; for a subject who is never exposed, we consider ${t}_{\mathit{\text{Ei}}}=+\infty $. ${R}_{m}\left(t\right)$ represents the subjects at risk of event and nonexposed at time t for the method m (m=1 or 2). Let the set of paired subjects being defined as P _{ j }={j,i(j)}, with i(j) the subject i of ${R}_{m}\left({t}_{{E}_{j}}\right)$ chosen for the jth exposed subject.
The two matching methods
The two matching methods applied in the context where exposure is an event occurring over time, are presented.
Method 1 matches the subjects as follows: all the subjects i exposed (${t}_{\mathit{\text{Ei}}}<+\infty $) are matched with a subject who was not exposed during the whole period of followup. This method is called an a posteriori method because the groups exposed and nonexposed are defined at the end of the study and are considered to be known and created at time t=0. According to Method 1, if a subject j undergoes an exposure at time t _{ E j }, then the eligible subject’s set ${R}_{1}\left({t}_{\mathit{\text{Ej}}}\right)$ of subjects i eligible to be matched to j could be written as follows: ${R}_{1}\left({t}_{\mathit{\text{Ej}}}\right)=\left\{i\ne j/{t}_{i}\ge {t}_{\mathit{\text{Ej}}}\text{AND}{t}_{\mathit{\text{Ei}}}=+\infty \right\}$. This approach has been commonly used in the literature [3][5, 7, 10] with exposure not being considered as timedependent.
As exposure is a timedependent event, a second approach is proposed which designs the pairs “in realtime”: according to Method 2, if a subject j undergoes an exposure at time t _{ E j }, then the eligible subject’s set ${R}_{2}\left({t}_{\mathit{\text{Ej}}}\right)$ of subjects i eligible to be matched to j at time t _{ E j }, could be written as follows: ${R}_{2}\left({t}_{\mathit{\text{Ej}}}\right)=\{i\ne j/{t}_{i}\ge {t}_{\mathit{\text{Ej}}}$ AND t _{ E i }>t _{ E j }}. ${R}_{2}\left({t}_{\mathit{\text{Ej}}}\right)$ includes subjects at risk that are not yet exposed at t _{ E j } and will be exposed after, and subjects who will never be exposed. A pair composed of an exposed subject and a nonexposed one who would never undergo the exposure is called a “perfect pair”, whereas it would be an “imperfect pair”. As a result, the nonexposed subject of an imperfect pair could be matched after exposure with another nonexposed subject.
Method 2 is similar to the one proposed by Lu et al. in casecohort studies [22] where the membership in the exposed and unexposed groups is the outcome to be explained, whereas in our work it is an explanatory variable.
For both methods, exposed and not exposed subjects are matched according to the covariate vector Z, and the not exposed subject has to be diseasefree for as long as the time from the starting point (disease diagnosis) to the exposure time.
For both methods, if several nonexposed subjects can be matched with an exposed one, the matched nonexposed subject is randomly chosen from the set of eligible subjects ${R}_{m}\left(t\right)$; if no nonexposed subjects are available, the exposed subject cannot be paired and is thus excluded from the analysis.
Note that even if a subject could belong to two different pairs with Method 2, these two pairs are independent while they are never at risk at the same time t.
The statistical models
In the following, λ _{ i }(t) is the instantaneous hazard function of outcome to be estimated for pair P _{ j }. It is noted ${\lambda}_{i}\left(t,{\mathbf{Z}}_{i}\right)$ to specify that the estimation is made on the pair P _{ j }, which is composed of the exposed subject j and the nonexposed subject i matching on Z _{ i }. This notation is the same for all the models studied, even those where the adjustment for Z _{ i } is not available.
For all the models presented below, ${E}_{i}\left(t\right)$ corresponds to the timedependent exposure status and is defined as follows: ${E}_{i}\left(t\right)=0$ if t<t _{ E i }, and E _{ i }(t)=1 if t≥t _{ E i }. The pair of subjects is also defined by a timedependent covariate: ${P}_{i}\left(t\right)=j$ if i∈P _{ j } and t∈[t _{ E j };t _{ i }], or ${P}_{i}\left(t\right)=0$ otherwise.
Holt and Prentice stratified Cox model
Holt and Prentice [16] adapted the standard Cox model [11] to analyze matched paired data.
The instantaneous hazard function is written for each subject i as
${\lambda}_{0i\left(j\right)}\left(t\right)$ is a pairspecific baseline hazard function that is assumed to be identical for both subjects of pair P _{ j }, considered here as strata; it is considered as a nuisance parameter not to be estimated. The exposure effect $exp\left(\gamma \left(t\right)\right)$ is then estimated, considering the betweenpair heterogeneity, by allowing the instantaneous baseline hazard to be different within each pair. It is assumed to be identical across strata (no interaction between the exposure and the pairs) and thus to be implicitly common for the whole exposed population: $exp\left(\gamma \left(t\right)\right)$ is defined as the populationweighted average of the stratumspecific hazard ratios. However, if this assumption is incorrect, i.e. in the presence of a true (and often undetected) interaction, using this model leads possibly to a biased and/or less powerful analysis [26].
Furthermore, with this model, estimation of the exposure effect cannot be adjusted for a possible interaction between the matching factors and the exposure. This stratified approach is sensitive to the unit number per strata and to the number of strata: the accuracy of the regression coefficients decreases for a small number of units per strata and/or many numbers of strata [27].
This model is implemented in R software [28] through the coxph function by including the term “strata ( P _{ i }(t))” with the other explanatory covariates.
Lee, Wei and Amato Cox model
The marginal LWA model [20] is an alternative to the standard Cox model [11] and is written as follows
if the exposure effect is not adjusted for the matching covariates vector Z;
if the exposure effect is adjusted for the matching covariates vector Z;
if the exposure effect is adjusted for the matching covariates vector Z, and for the interaction between Z and the exposure.
For each of these three LWA models, ${\lambda}_{0}\left(t\right)$ is an unspecified marginal baseline hazard function considered as common for all the pairs, so for the whole population. As above, it is considered as a nuisance parameter; $exp\left(\gamma \left(t\right)\right)$ is the average timevarying exposure effect as in the HP model, but adjusted (L W A _{ a }) or not (L W A _{ u }) for covariates Z and for the possible interaction between covariates and exposure (L W A _{ i }). Like the standard Cox model [11], the LWA assumes that all sample subjects are homogeneous (all subjects have the same ${\lambda}_{0}\left(t\right)$) in spite of the possible adjustment for covariates (unique difference between L W A _{ u } and HP).
This model is implemented in R software through the coxph function, by including the term “cluster ( P _{ i }(t))” with the other explanatory covariates.
For both models, the Proportional Hazard Assumption (PHA) was evaluated by Harrel’s test on scaled Schoënfeld residues. This test is implemented in R software through the cox.zph function. The possible timedependent effect of the exposure was taken into account by time intervals chosen a posteriori, and not by a timespecified function.
Note that the combination HP and Method 1, taking the exposure as a timedependent covariate or not, gave exactly the same estimation of $\overline{\mathit{\text{HR}}}\left(t\right)$, whereas LWA did not.
Table 1 presents all the models according to the adjustment or not for covariates.
In the following, different simulations and analyses were performed with R software version 2.13.0.
Results
Simulations
Objective
The main objective of the simulation study was to assess the ability of the H P and LWA models to estimate the true effect of exposure H R(t), defined by $exp\left(\gamma \left(t\right)\right)$, in a context of matched paired survival data, where the pairs were designed according to the two different methods described previously. The aim was to establish the most efficient Method  Model combination.
Dataset
Simulation of cohort data  Procedures and scenarios chosen
All the details of the cohort data simulation and the procedures and scenarios chosen are given in Appendix A.
We simulated the cohort data referring to an “illnessdeath” model with transition intensities ${\lambda}_{12}\left(t\right)$, ${\lambda}_{13}\left(t\right)$ and ${\lambda}_{23}\left(t\right)$ (Figure 1). The parameter of interest H R(t) corresponded to the ratio ${\lambda}_{23}\left(t\right)/{\lambda}_{13}\left(t\right)$. The average H R(t) is obtained from an exact formula involving the averages of ${\lambda}_{13}\left(t\right)$ and ${\lambda}_{23}\left(t\right)$ which are computed through a numerical approximation (transformation of the time from continuous to discrete values) (See the Appendix B: Calcul of the average exposure effect not adjusted for the covariates). The average H R(t) adjusted for the different covariates was estimated empirically: its estimation was obtained using large size samples to guarantee good precision. Moreover, note that the larger the ratio ${\lambda}_{12}\left(t\right)/{\lambda}_{13}\left(t\right)$, the larger the number of exposures in the simulated cohort.
Each subject was characterized by a prognostic profile through vector Z, which corresponded to three dummy covariates Z _{ k } (k=1,2 or 3). At time t=0, there were 2^{3}=8 possible profiles of Z factors.
The simulation model included (i) the choice of an instantaneous baseline risk function ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}\right)$ for each of the three transitions $u\to v$ (Table 2), (ii) the choice of the Z effects, $exp\left({\beta}_{\mathit{\text{uvk}}}\right)$, for each transition, i.e. ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}\right)={\lambda}_{0,\mathit{\text{uv}}}\left(t\right)exp\left({\beta}_{\mathit{\text{uv}}1}{Z}_{1}+{\beta}_{\mathit{\text{uv}}2}{Z}_{2}+{\beta}_{\mathit{\text{uv}}3}{Z}_{3}\right)$ and (iii) the choice for the censoring proportion.
For (i), an instantaneous average risk function ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}=\overline{\mathbf{Z}}\right)$ for each of the three transitions was simulated. Table 2 displays the ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}=\overline{\mathbf{Z}}\right)$ distributions of each transition used for each of the five different configurations of H R(t).
For (ii), ten different β _{ u v k } scenarios considered as plausible β _{ u v k } clinical values [9, 15], were performed. Given the five configurations chosen for H R(t) and these ten β _{ u v k } scenarios, 50 different situations were obtained.
Finally, for (iii), these previous 50 situations were first performed without censoring. Two levels of independent uniform censoring were implemented only to the following β _{ u v k } scenario: ${\phantom{\rule{1em}{0ex}}\mathit{\beta}}_{12}^{{}^{\prime}}=\left(0.2,0.4,0.8\right)$, β _{13}=−β _{12} and β _{23}=β _{12}; and they were applied to each of the five configurations of H R(t). This yielded to 10 more situations.
For each of the 60 situations, 1000 different data sets were generated with a sample size of 2000 subjects. At t=0, these 2000 subjects were allocated to eight Z profiles. At t>0, the 250 subjects of the 8 different profiles will be divided up in the three transitions and will change over time according to the five H R(t) configurations.
All theoretical values of H R(t) were calculated on the simulated cohort data. They were computed in the overall correlated censored data and inside each sample of the Z profile. The average H R(t) was calculated without and with adjustment for the matching covariates, i.e. $\overline{\mathit{\text{HR}}}\left(t\right)$ and ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$ respectively. H R _{ i }(t) is the one estimated in each Z profile.
From the correlated censored data, HP and L W A _{ u } models are both assumed to give an average H R(t)i.e. $\overline{\mathit{\text{HR}}}\left(t\right)$ (considering different assumptions), so they are the only two models which could be compared. Fitting of model L W A _{ a } makes it possible to estimate an average H R(t)i.e. ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$, while L W A _{ i } is assumed to give H R _{ i }(t) for each Z profile (Table 1).
As the exposure effect was considered to change over time for three of the five configurations, its estimation was assessed by time interval specified a posteriori.
Matching methods  Creation of censored correlated data from cohort data
For each data set, the two matching methods presented in Section ‘Methods’ were applied.
According to Methods 1 and 2, the two subjects in each pair were matched on the three covariates Z _{ k }, and the nonexposed subject had to be diseasefree for as long as the time from t=0 to exposure time of the exposed subject.
Then from the 2000 subjects simulated in cohort data sets and equally allocated to the 8 Z profiles, a number of pairs smaller or equal to the number of subjects in State 2 (i.e. pregnancy) were obtained. This latter depended on the situation simulated, resulting from the H R(t) configuration, the β _{ u v k } scenario and the censoring percent.
Statistical criteria used to compare the performances of the different estimators
To estimate a timedependent effect, the time interval $[\phantom{\rule{0.3em}{0ex}}0{t}_{max}]$ was divided into L time intervals I _{ l } defined a priori, according to the H R(t) configuration, and written as follows:
and
If there was no interaction and no timedependent effect, an estimation ${\widehat{\gamma}}_{s}$ was obtained that corresponded to the estimation of γ performed in the s ^{th} simulation set inside the same H R(t) configuration.
If there was an interaction and no timedependent effect, an estimation for each of the 8 Z prognostic profiles was obtained and expressed as ${\widehat{\gamma}}_{s}+{\widehat{\mathit{\alpha}}}_{s}^{{}^{\prime}}\mathit{Z}$.
If there was no interaction and a timedependent effect, an estimation for each of the I _{ l } time intervals was obtained and expressed as ${\widehat{\gamma}}_{\mathit{\text{sl}}}$.
If there was an interaction and a timedependent effect, 8×L estimations were obtained and expressed as ${\widehat{\gamma}}_{\mathit{\text{sl}}}+{\phantom{\rule{1em}{0ex}}\widehat{\mathit{\alpha}}}_{\mathit{\text{sl}}}^{{}^{\prime}}\mathit{Z}$.
To assess the combination Method  Model to estimate H R(t), the bias and the RMSE of the estimations presented above were calculated. The 8 Z profiles are indexed by w=1 to 8, and the profile w is noted Z ^{(w)}.
In the event of with an interaction and a timedependent effect, the bias was estimated for each l and each Z ^{(w)} over the 1000 simulations as follows:
and
where, in each I _{ l } interval, $\overline{{\gamma}_{l}+{\mathit{\alpha}}_{l}^{{}^{\prime}}\mathit{Z}}$ is the mean and $V\left({\widehat{\gamma}}_{l}+{\widehat{\mathit{\alpha}}}_{l}^{{}^{\prime}}\mathit{Z}\right)$ the empirical variance of the 1000 parameters estimated ${\widehat{\gamma}}_{\mathit{\text{sl}}}+{\widehat{\mathit{\alpha}}}_{\mathit{\text{sl}}}^{{}^{\prime}}\mathit{Z}$:
To compare the different combinations Method  Model, the bias was averaged over the profiles (b _{ l•}) or over the time intervals $\left({{b}_{\u2022\mathbf{Z}}}^{\left(w\right)}\right)$ or both (b _{••}).
The associated RMSEs are given by
Note that if the exposure effect is not timedependent, then L=1. If there is no interaction between the exposure and the prognostic profiles, then α=0.
Results
One particular situation among the 60 simulated is described. Figure 2 displays the increasing then decreasing H R(t) configuration, for each profile and on average, without censoring and with ${\mathit{\beta}}_{12}^{{}^{\prime}}=\left(0.2,0.4,0.8\right)$, β _{13}=−β _{12} and β _{23}=−β _{13}. In this particular situation which corresponds to a “healthy effect” because of the negative values of β _{12}, Figure 2 shows three different overall effects of the exposure: a pejorative one in the three better prognostic profiles (PP) (${\mathbf{Z}}^{{}^{\prime}}\in \left\{\left(0,0,0\right),\left(0,0,1\right),\left(0,1,0\right)\right\}$), no effect in the intermediate PP (${\mathbf{Z}}^{{}^{\prime}}\in \left\{\left(0,1,1\right)\right\}$) and a protective effect in the last four PP (${\mathbf{Z}}^{{}^{\prime}}\in \left\{\left(1,0,0\right),\left(1,0,1\right),\left(1,1,0\right),\left(1,1,1\right)\right\}$). With β _{23}=−β _{13}, we force an interaction between Z and the exposure. Note that in this particular configuration chosen, where β _{23}=−β _{13}, $\overline{\mathit{\text{HR}}}\left(t\right)\simeq {\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$ and their values are so close that the difference between them is not visible in Figure 2.
Number of pairs
Inside each profile, the maximum number of pairs was determined by the number of exposed subjects. With Method 1, this number was also limited by the number of “perfect” nonexposed subjects, but not with Method 2 since the nonexposed subject set was composed of “perfect” and “imperfect” nonexposed subjects. The difference between the number of pairs from the two methods depended on the number of exposures: the larger the number, the larger the difference. We computed the relative difference (RD) in number of pairs between the two methods defined as
whose median was equal to +55% (range, +23% to +220%). Figure 3 represents the distribution of the number of pairs according to the profiles and to the matching methods: the median number with Method 2 was always larger than or equal to that with Method 1.
Figures 4 shows the number of subjects pertaining to the three possible subjects groups at each time t: the exposed subject (solid green line), the nonexposed subject who never will undergo the exposure (solid red line) and the nonexposed subject who will undergo the exposure (solid blue line). The dotted vertical green line represents the time of first exposure, i.e. the time of occurrence of the first pair; the dotted vertical blue line corresponds to the time of the last perfect pair’s creation and the dotted vertical red line corresponds to the time of the last imperfect pair’s creation. With Method 2, the larger the ratio ${\lambda}_{12}\left(t\right)/{\lambda}_{13}\left(t\right)$, the larger the number of imperfect pairs and thus the greater the probability for an exposed subject to belong to an imperfect pair. Table 3 provides the proportion of imperfect pairs among the whole pairs which was estimated over the 1000 simulated data sets of our particular situation. It was equal to 81% in the good profile ${\mathbf{Z}}^{{}^{\prime}}=(0,0,0)$, decreasing to 54%, 44% and 20% in the Z profiles (1,1,0), (0,0,1) and (1,1,1), respectively. Moreover, the larger the ratio ${\lambda}_{12}\left(t\right)/{\lambda}_{13}\left(t\right)$, the faster the pair’s creation stopped. After the last dotted line (blue or red, depending on the Z profile), the exposed subjects are no longer able to be matched with a nonexposed one, because they are no longer available subjects; the pair’s creation stopped at a time that gradually increased from ${\mathbf{Z}}^{{}^{\prime}}=(0,0,0)$ to ${\mathbf{Z}}^{{}^{\prime}}=(1,1,1)$. For instance, this is illustrated for profile ${\mathbf{Z}}^{{}^{\prime}}=(0,0,0)$ in Figure 4A: at the time of first exposure (dotted vertical green line), the exposed subject was more likely to be matched with an imperfect subject than to a perfect nonexposed one. This set of nonexposed subjects decreased over time because each of them was matched with an exposed subject until the dotted blue line, when no more nonexposed subjects were available, while a new exposed subject, belonging before to a pair as a nonexposed one, appeared. This set of exposed subjects increased and was not able to be matched because there were no longer any nonexposed subjects. For a same level of ${\lambda}_{12}\left(t\right)/{\lambda}_{13}\left(t\right)$ ratio, RD was larger depending on the censoring percent: the higher the censoring percent, the smaller the RD.
Bias and RMSE
Figures 5 displays the distribution of the bias and the RMSE of the exposure effect estimator, according to our four models.
With Method 1, the estimation of $\overline{\mathit{\text{HR}}}\left(t\right)$ was biased with HP and L W A _{ u }, but more with HP; the estimation of ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$ was biased with L W A _{ a }. Whatever the model applied, $\overline{\mathit{\text{HR}}}\left(t\right)$ (HP, L W A _{ u }) and ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)\left({\mathit{\text{LWA}}}_{a}\right)$ were underestimated, i.e. a poor prognostic exposure effect tended to be ignored and a no exposure effect tended to become a protective one. In each Z profile, L W A _{ i } always largely underestimated H R _{ i }(t) (Figures 5). The bias b _{••} is equal to −0.77, −0.61, −0.58 and −0.52 under HP, L W A _{ u }, L W A _{ a } and L W A _{ i } models, respectively; and the R M S E _{••} is equal to 0.88, 0.65, 0.62 and 1.06 under HP, L W A _{ u }, L W A _{ a } and L W A _{ i } models, respectively (Figures 5). The bias and RMSE are given by PP in Table 3.
With Method 2 where $\overline{\mathit{\text{HR}}}\left(t\right)\simeq {\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$, L W A _{ u } and L W A _{ a } gave very similar values, whereas HP gave a smaller estimation of $\overline{\mathit{\text{HR}}}\left(t\right)$ than L W A _{ u }. All these three models are very close to the true value of the effect of exposure, but none gave an unbiased estimation: this estimation was slightly over or underestimated according to the time, but much less than with Method 1. H R _{ i }(t) was overestimated in some Z profiles: H R _{ i }(t) was overestimated in the three PP, ${\mathbf{Z}}^{{}^{\prime}}\in \left\{\left(0,0,0\right),\left(1,0,0\right),\left(0,1,0\right)\right\}$, where the ratio λ _{12}(t)/λ _{13}(t) was large and then the proportion of imperfect pairs in the set of pairs to be analyzed, was also large (Figures 5). The mean of the timedependent bias b _{••} was equal to 0.10, 0.14, 0.15 and 0.04 under HP, L W A _{ u }, L W A _{ a } and L W A _{ i } models, respectively; and R M S E _{••} was equal to 0.40, 0.27, 0.27 and 1.05 under HP, L W A _{ u }, L W A _{ a } and L W A _{ i } models, respectively (Figures 5). The biases and RMSE are quite acceptable and much smaller than with Method 1. Table 3 displays the bias and RMSE more specifically, with the percent of imperfect pairs according to the PP: the larger the percentage of imperfect pairs, the larger the bias and RMSE.
All the conclusions displayed with Method 1 were the same for all five configurations and all the β _{ u v k } triplet values. The biases of $\overline{\mathit{\text{HR}}}\left(t\right)$, ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$ and ${\mathit{\text{HR}}}_{i}\left(t\right)$ were always huge (data not shown) but more or less followed the configuration and the β _{ u v k } scenarios. All the conclusions with Method 2 were valid for all configurations, and for all β _{ u v k } triplet values. In the configurations without interaction, i.e. where β _{23}=β _{13}, HP and L W A _{ u } models were more appropriate than L W A _{ a } and L W A _{ i } to estimate $\overline{\mathit{\text{HR}}}\left(t\right)$ in terms of bias and RMSE, given that L W A _{ u } was much less biased than HP in most of the scenarios. In some configurations where the proportion of the profile with the smallest ${\overline{\mathit{\text{HR}}}}_{i}\left(t\right)$ was the most highly represented profile leading to a low $\overline{\mathit{\text{HR}}}\left(t\right)$, HP was better than L W A _{ u }. L W A _{ a } was the only model for estimating H R _{ a }(t) and led to very slightly biased estimations of ${\mathit{\text{HR}}}_{a}\left(t\right)$ (data not shown). In the configurations with interaction, i.e. where β _{23}≠β _{13}, the L W A _{ i } model was the only appropriate one. However, in some of these configurations, L W A _{ i } slightly biased the extreme profiles and not the intermediate ones (data not shown). Over the 10 situations with censoring, the censoring percent ranged from 9% to 48%. Censoring did not change any of the previous conclusions.
Overall, in terms of bias and RMSE, Method 2 performed better than Method 1 to design the pairs, and L W A _{ a } was the best model for all the situations except when there was an interaction between the covariates and the exposure (β _{23}≠β _{13}), for which L W A _{ i } was more appropriate, even if the estimations with H R _{ i }(t) were not uniformly unbiased.
Real data application
Data
Our data retrospectively included 870 women treated for breast cancer between January 1990 and December 1999, and diagnosed before 35 years of age. Information on patients’ status was collected at the end of the year 2004 and the median followup was 87 months (range, 7 to 166) at this date. Tumor, treatment and disease evolution analyses are available in the paper of Largillier et al.[9]. One of the goals of the data analysis was to compare diseasefree survival between pregnant and nonpregnant patients. The protocol was submitted to the appropriate French authorities supervising individual computerized data files (Commission Nationale Informatique et Liberté [CNIL]), and obtained ethical approval from the Institut Curie ethics committee (Comité des Etudes en Recherche Clinique [CERC]). The GETNA Working Group, which was responsible for the data conception, design and acquisition, allowed us access to these real data.
We created pairs of pregnant and nonpregnant women using the two matching methods. Both women were matched according to their cancer gravity level using: ScarffBloomRichardson grade (SBR, related to cell proliferation level), pathological node involvement and hormonal receptor status (RH). They were also matched on treatment: hormonotherapy prescribed or not to RH+ patients, and chemotherapy administered or not before and/or after surgery. Within each pair, the nonpregnant woman had to be diseasefree for as long as the time from diagnosis to pregnancy of the pregnant woman. We sought to estimate the effect of subsequent pregnancy occurring over time on breast cancer evolution.
According to the results obtained on the cohort data [15] and to the known breast cancer clinical prognostic factors in the literature, patients with low cell proliferation level (SBR I or II) and no node involvement were considered to have a good prognostic profile and likely to plan to be pregnant (“healthy mother effect”). According to biological assumptions and need for therapy (hormonal treatment if R H+), women with negative hormonal receptors were also more likely to be pregnant. Thus, even if these factors were not significant in the paper by Savignoni et al.[15], it seemed relevant to estimate the effect of pregnancy according to the RH status (and the treatment associated) and according to a clinical Prognostic Profile (cPP); a poor c P P was defined as an SBR grade III and/or pathological involved nodes, leading to six prognostic profiles: RH negative with a good cPP, RH negative with a poor cPP, RH positive not treated with a good cPP, RH positive not treated with a poor cPP, RH positive treated with a good c P P and RH positive treated with a poor cPP. Then the effect of the pregnancy was estimated in the whole population by adjusting or not for the matching factors and with regard to the six prognostic profiles by adjusting for an interaction between the pregnancy, and the RH status (associated with treatment) and the cPP respectively. The effect of pregnancy was not adjusted for chemotherapy treatment.
Results
In view of our simulations, Method 2 was the matching method to apply on the cohort data in order to create correlated censored data that would give a more accurate estimate of the exposure effect. To be able to compare our results with previous findings [3][5, 7, 10], we used the H P model on correlated censored data designed from Method 1.
First, we applied the Method 1—HP combination as proposed in the literature [3][5, 7, 10]. Secondly, we applied the HP and LWA with pregnancy and pair as timedependent covariates on the paired survival data created with Method 2. With LWA, $\overline{\mathit{\text{HR}}}\left(t\right)$ estimation was carried out (i) without adjusting for matching covariates (L W A _{ u } estimates $\overline{\mathit{\text{HR}}}\left(t\right)$), (ii) by adjusting for all the matching covariates but without any interaction (L W A _{ a } estimates ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$) and (iii) by adjusting for all the matching covariates and with an interaction between pregnancy and matching covariates (L W A _{ i } estimates H R _{ i }(t)). The latter could be applied in the event of real or assumed biological and clinical interactions. We tested the PHA with Harrel’s test on the two correlated censored data sets and with each model. The PHA was verified and we did not add any timeeffect for pregnancy in the models. H R(t) was compared to 1 by the Wald’s test with the appropriate variance according to the models.
In the cohort data, only 668 patients presented no missing data for the covariates of interest. Among them, 68 experienced a subsequent pregnancy. We obtained the maximum number of pairs available with Methods 1 and 2, i.e. 68 pairs: all pregnant women were then matched. The 68 pairs were not exactly the same between Methods 1 and 2. In Method 2 five pairs were imperfect, which represented a low proportion (7.4%). Among the 68 pairs obtained with Method 1, 32 women experienced an event (progression or death): 16 in the group of patients who became pregnant after the breast cancer diagnosis and 16 in the group of patients who did not. Among the 68 pairs obtained with Method 2, 29 women experienced an event: the same 16 in the group of patients who became pregnant after the breast cancer diagnosis and 13 in the group of patients who did not. No events occurred in the imperfect pairs. Only sixteen pairs (23.5%) were common between the two matching methods. The number of pairs and the number of final events in the pregnancy and nonpregnancy groups are given Table 4, according to the six profiles and to the matching methods. The number of pairs was divided into imperfect and perfect pairs for Method 2.
The HP model applied with Method 1 yielded the following estimations: $exp\left(\gamma \right)=0.90$, C I _{95% }[ 0.36−2.21], not significant. Kranick et al.[5] and Veletgas et al.[4] concluded likewise as did Azim et al.[3], but the latter stratified their analysis on estrogen status. Other authors [7, 10] found a protective role of pregnancy. All these authors used the HP model with Method 1 adjusted for the other known prognostic factors. In view of our simulation results, we believe that this estimation was underestimated and that the models should be used on censored correlated data created with Method 2. Regarding the latter, the HP model yielded a larger value than with Method 1 but was still not significant ($exp\left(\gamma \right)=2.00$, C I _{95% }[ 0.75−5.33]). In view of our simulations, we conclude that $\overline{\mathit{\text{HR}}}\left(t\right)$ was widely underestimated with HP using Method 1 and only slightly underestimated with HP using Method 2. The increasing value of $\overline{\mathit{\text{HR}}}\left(t\right)$ estimated by HP between the two matching methods was not surprising. Method 2 and the LWA model, with or without adjusting for matching covariates (L W A _{ u } and L W A _{ a }), yielded similar and not statistically significant results: $exp\left(\gamma \right)=1.22$, C I _{95% }[ 0.61−2.42] and $exp\left(\gamma \right)=1.24$, C I _{95% }[ 0.62−2.45], respectively. The difference in $\overline{\mathit{\text{HR}}}\left(t\right)$ values estimated by the HP and L W A _{ u } models was not statistically significant. In such a context, we would like to estimate the proper effect of the subsequent pregnancy. It is more relevant to estimate ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$ than $\overline{\mathit{\text{HR}}}\left(t\right)$. Even if L W A _{ i } showed no significant interaction between the matching covariates and the pregnancy, because of the biological and clinical assumptions, we estimated the effect of pregnancy according to the six PP defined above using the L W A _{ i } model. Table 5 presents $exp\left(\gamma \right)$ estimations according to the models and their 95% confidence intervals.
The apparent protective role of pregnancy for the theoretically less favorable prognostic profile “ R H+ (treated or not) and poor cPP” and its increasing role in risk for the theoretically best prognostic profile “ R H− and good cPP” were surprising. There was no statistical significance, but it could be because of a lack of power of the combination Method 2  L W A _{ i }.
Discussion
In our context of exposure occurring over time, we focused on two matching methods: Method 1, commonly used in the literature [3][5, 7, 10], and Method 2, our new approach. Method 1 composes the pairs in an a posteriori way where the pairs are in fact considered to be known at diagnostic time t=0. Method 2 creates pairs in “realtime”. Such matching designs create independence between pairs but dependence between the subjects of the same pair, and specific analytical methods exist for such a situation of correlated censored data. In our work we studied two semiparametric models allowing for the stratified Holt and Prentice model [16] (HP) and the Lee, Wei and Amato model [20] (LWA). To estimate the average exposure effect H R(t) and unlike the LWA, the HP did not make it possible to take into account either the matching covariates or the possible interaction between the matching covariates and the exposure.
The aim of this study was to analyze the HP and LWA models using the two matching methods in order to propose the most efficient Method  Model combination to estimate and test the prognostic exposure effect H R(t) estimated through the models by $exp\left(\gamma \left(t\right)\right)$.
In view of our simulations, the relative difference in the number of pairs between Method 1 and Method 2 depends on the ratio ${\lambda}_{12}\left(t\right)/{\lambda}_{13}\left(t\right)$ and on the censoring percent. Compared to Method 1, the number of pairs was equal or larger with Method 2. In terms of bias and RMSE, Method 2 is more relevant than Method 1 to design the pairs, and L W A _{ a } is the best model for all the situations except when there is an interaction between the covariates and the exposure (β _{23}≠β _{13}), for which L W A _{ i } is more appropriate even if the H R _{ i }(t) estimations are not uniformly unbiased.
In our sample data, we applied Method 1 and HP to compare our results with those in the literature [3][5, 7, 10]. According to the simulation results, we applied Method 2 with HP and LWA. With both matching methods, we obtained an equal number of pairs (the maximum available) but not the same ones. HP used with Method 1 gave the smallest estimation of $\overline{\mathit{\text{HR}}}\left(t\right)$. $\overline{\mathit{\text{HR}}}\left(t\right)$ estimations were not statistically different from 1 with HP, L W A _{ u } and L W A _{ a }. According to biological and clinical hypothesis, we assumed that the effect of exposure was different according to the six prognostic profiles so we used the L W A _{ i } model. However, we did not conclude that pregnancy had neither a protective effect nor an adverse effect on progression disease, even for women with positive hormonal receptors and poor clinical profile at diagnosis. Estimating $\overline{\mathit{\text{HR}}}\left(t\right)$ by prognostic profiles using the combination Method 2  L W A _{ i }, showed a different and opposite effect of exposure in the six health profiles, but none was statistically significant. The sample size of the profiles was small and a few events occurred inside each profile, probably leading to a lack of power of the combination Method 2  L W A _{ i }.
Adjusting for the matching covariates and for their possible interaction with exposure could be very interesting to interpret the effect of exposure H R(t). However, it could be more difficult with real data, especially in the event of multiple prognostic profiles. The sample had to be large enough to enable us to assess the performances of the models according to the two matching methods, especially L W A _{ a } and L W A _{ i }, which required a large number of pairs. The erroneous estimation with L W A _{ i } in some extreme profiles could partially be explained as follows: in the “imperfect” pairs, we artificially prevent the nonexposed subject from having a final event, because she will first become an exposed subject of another pair. Maybe she will undergo the final event but as an exposed subject and in another pair. Then, when the proportion of imperfect pairs is large, H R _{ i }(t) could be overestimated. Moreover the larger the ratio ${\lambda}_{12}\left(t\right){\lambda}_{13}\left(t\right)$, the larger the number of imperfect pairs, and the higher the overestimation of H R _{ i }(t) and the faster its appearance. Owing to the values of the β _{ u v k } triplets chosen in the particular configuration presented above, the subjects of the three profiles ${\mathbf{Z}}^{{}^{\prime}}\in \left\{\left(0,0,0\right),\left(1,0,0\right),(0,1,0)\right\}$ experienced more exposure before any events, and simultaneously fewer events (before or after exposure), than the other PP. In these three profiles, the number of events in the exposed and in the nonexposed individuals is quite low, suggesting a possible lack of accuracy in the estimation of $\overline{\mathit{\text{HR}}}\left(t\right)$. All the models needed enough pairs and enough final events to fit: the larger the number of imperfect pairs and the smaller the number of events, the worse the accuracy of the estimation of H R _{ i }(t). In a future study, we will explore more deeply this problem of bias related to the percent of imperfect pairs in the sample.
Technically, whatever the matching method used, HP censored the pair when one of its subjects was censored or experienced the final event, whereas the LWA did not. The latter leaves each subject to be followed up to her censoring or final event, whatever the outcome of the matched subject. Concerning the management of the imperfect pair obtained with Method 2, the nonexposed subject i of pair P _{ j } was censored when she became the exposed subject i of another pair P _{ i } at the time ${t}_{{E}_{i}}$; by construction, as said before, her exposure occurred before any other events. For LWA, an alternative would be to censor the pair P _{ j }. Another alternative, whatever the model, would be to propose another nonexposed (perfect or imperfect) subject to the exposed subject of pair P _{ j }, who was now single in her pair. This issue deserves more investigation.
Using time intervals to estimate the effect of exposure as it changes over time was maybe not the most pertinent method, because it required many parameters. Therefore, to improve fitting, we intend to apply splines to fit the effect of exposure as it changes over time. Indeed, the present study sought to compare results of two known models devoted to censored correlated data, and the wellknown frailty model was set aside because it requires the structure of correlation within the pairs to be specified. The next step would be to compare the HP, LWA and the frailty model using Method 2.
Conclusions
In conclusion, correlated censored data designing by Method 2 seems to be the more pertinent method to create pairs when the criterion which characterizes the pair is an exposure occurring over time. It would be interesting to estimate the proper effect of the subsequent pregnancy. It is then more pertinent to estimate ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$ than $\overline{\mathit{\text{HR}}}\left(t\right)$. Thus, L W A _{ a } seems to be the best model for all the situations, except when there is an interaction between the covariates and the exposure, for which L W A _{ i } is more appropriate, even if the estimations of H R _{ i }(t) are not uniformly unbiased. L W A _{ a } and L W A _{ i } gave a more accurate and relevant estimation of the effect of exposure in particular context, where we can reasonably suppose that the latter depends on prognostic profiles.
Appendix A: Simulation of cohort data  Procedures and scenarios chosen
For each subject four independent times were generated: t _{12}, t _{23}, t _{13} and C (censoring time), according to the intensities displayed in Table 2. From these times, 2 times of interest for each subject were derived: a time to exposure t _{12}=t _{ E } and a time to final event t _{ i }. Two indicator variables were derived: E=1 if an exposure occurs, 0 otherwise and Δ=1 if a final event occurs, 0 otherwise. Four possible quadruplets (t _{ E };t _{ i };E;Δ) could be defined as follows:
Such a design refers to an “illnessdeath” model with transition intensities ${\lambda}_{12}\left(t\right)$, ${\lambda}_{13}\left(t\right)$ and ${\lambda}_{23}\left(t\right)$ (Figure 1). All subjects were assumed to be in the initial state (state 1 or cancer diagnosis in our context) at time t=0. They could move to the final state (state 3 or disease progression) with a transition intensity ${\lambda}_{13}\left(t\right)$. They might undergo the intermediate event or exposure (state 2 or pregnancy) with intensity ${\lambda}_{12}\left(t\right)$, before developing any progression with intensity ${\lambda}_{23}\left(t\right)$. Date of entry into state 1 was chosen as time of origin for all transitions. Thus the parameter of interest H R(t) corresponded to the ratio ${\lambda}_{23}\left(t\right)/{\lambda}_{13}\left(t\right)$. However, to compute ${\lambda}_{23}\left(t\right)$, we took into account the left truncation phenomenon: before being at risk of an event in the transition $2\to 3$, a subject has to wait until its exposure occurs. This delayed entry leads the set of subjects at risk in transition $2\to 3$ to increase when an exposure occurs and to decrease when an event occurs. Thus the average H R(t) is obtained from an exact formula involving the averages of ${\lambda}_{13}\left(t\right)$ and ${\lambda}_{23}\left(t\right)$ which are computed through a numerical approximation (transformation of the time from continuous to discrete values) (See the Appendix B: Calcul of the average exposure effect not adjusted for the covariates). The average H R(t) adjusted for the different covariates was estimated empirically by using large size samples to guarantee good precision. Moreover, note that the larger the ratio ${\lambda}_{12}\left(t\right)/{\lambda}_{13}\left(t\right)$, the larger the number of exposures in the simulated cohort.
The simulation model included (i) the choice of an instantaneous baseline risk function ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}\right)$ for each of the three transitions $u\to v$, (ii) the choice of the Z effects, $exp\left({\beta}_{\mathit{\text{uvk}}}\right)$, for each transition and (iii) the choice for the censoring proportion.
For (i), an instantaneous average risk function ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}=\overline{\mathbf{Z}}\right)$ for each of the three transitions was simulated: either a constant risk using an exponential density function ^{(1)}, a monotone risk using a Weibull density function ^{(2)} or an increasing then decreasing risk using a loglogistic density function ^{(3)}. Five ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}=\overline{\mathbf{Z}}\right)$ triplets were simulated in order to construct five realistic configurations of H R(t): two constant, one increasing, one decreasing and one increasing then decreasing, where H R(t) range values were clinically pertinent (between 0.5 and 4 in the whole population). Table 2 displays the ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}=\overline{\mathbf{Z}}\right)$ distributions of each transition used for each of the five different configurations of H R(t).
For (ii), different β _{ u v k } values for each of these five ${\lambda}_{\mathit{\text{uv}}}\left(t,\mathbf{Z}=\overline{\mathbf{Z}}\right)$ triplets were chosen. Negative β _{12} values were proposed and set at ${\beta}_{12}^{{}^{\prime}}=\left(0.2,0.4,0.8\right)$. Only β _{13} and β _{23} had other possible values which were the following: (−0.2,−0.4,−0.8), (+0.2,+0.4,+0.8), (−0.1,−0.2,−0.4) and (+0.1,+0.2,+0.4). Ten β _{ u v k } scenarios were performed. Given the five configurations chosen for H R(t) and the ten β _{ u v k } scenarios, 50 different situations were obtained.
Finally, for (iii), these previous 50 situations were first performed without censoring. To minimize simulations time, two levels of independent uniform censoring were implemented only with the following β _{ u v k } scenario: ${\mathit{\beta}}_{12}^{{}^{\prime}}=\left(0.2,0.4,0.8\right)$, β _{13}=−β _{12} and β _{23}=β _{12}; and they were applied to each of the five configurations of H R(t). This yielded to 10 more situations (five H R(t) configurations with2 levels of censoring) for that β _{ u v k } scenario. The maximal event time ${t}_{max}$ was set at 1000. The first uniform distribution for censoring time C was over the interval time $\left[0;{t}_{max}\right]$, and the second one over $\left[0;2{t}_{max}\right]$; then the maximal censoring time was ${C}_{max}=+\infty $, ${t}_{max}$ or $2{t}_{max}$. The overall censoring level was higher in the first censoring distribution but it also depended on the H R(t) configuration. In total we had50 situations without censoring and 10 with censoring (the same five configurations with the two levels of censoring).
For each of the 60 situations, 1000 different data sets were generated with a sample size of 2000 subjects. At t=0, these 2000 subjects were allocated to the eight Z profiles. At t>0, the 250 subjects of the 8 different profiles are divided up among the three transitions and change over time according to the five H R(t) configurations.
Appendix B: Calcul of the average exposure effect not adjusted for the covariates $\overline{\mathit{\text{HR}}}\left(t\right)$
Density function ${f}_{23}\left(t\right),$ Repartition function ${F}_{23}\left(t\right)$ and Survival function ${S}_{23}\left(t\right)$ of transition $2\to 3$ taking into account the left truncation
where u is the exposure occurrence time and t the final event time.
To obtain an useful formula we divide the time in K tiny intervals with ${t}_{0}<{t}_{1}<\cdots <{t}_{k}<\cdots <{t}_{K}={T}_{max}$
and
The average instantaneous hazard $\stackrel{\u0304}{\lambda}\left(t\right)$ is equal to
where $\lambda \left(t\leftZ\right.\right)={\lambda}_{z}\left(t\right)$ is the average instantaneous hazard in the profile z at time t, ${N}_{z}\left(t\right)$ is the number of subject at risk in the profile z at time t. Then we can write
The computation of ${N}_{z}\left(t\right)$ is the following.
In transition $1\to 3$, a subject is at risk if it does not have any event and if it is not censored
where

${S}_{1{2}_{z}}\left(t\right)$is the survival rate in the transition $1\to 2$ for the profile z at time t,

${S}_{1{3}_{z}}\left(t\right)$is the survival rate in the transition$1\to 3$ for the profil z at time t,

${N}_{1{3}_{z}}$is the number of subject at risk in the profile z at time t=0,

$\stackrel{\u0304}{G}\left(t\right)=1G\left(t\right)$with G(t) is the repartition function of the variable C characterizing the censoring time. It does not depend on the profile.
In transition $2\to 3$, taking into account the left truncation, a subject is at risk of event in this transition, if it entered in the transition and if it had neither an event nor being censored.
${N}_{2{3}_{z}}$ could be expressed as
where

${E}_{2{3}_{z}}\left(t\right):$the number of subjects which entered in the transition $2\to 3$ in the time interval [ 0;t],

${D}_{2{3}_{z}}\left(t\right):$the subjects which undergo the final event in the time interval [ 0;t],

${C}_{2{3}_{z}}\left(t\right):$the censored subject in the time interval [ 0;t],

${N}_{2{3}_{z}}$is the number of subject at risk in the profile z at time t=0.
Then
where

${\lambda}_{1{2}_{z}}\left(t\right)$is the instantaneous hasard function of transition $1\to 2$ for the profil z at time t,

$g\left(t\right)$is the density function of the variable C characterizing the censored time. It does not depend on the profile.
Appendix C: Empirical estimation of the average exposure effect adjusted for the covariates ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$
Let’s consider L intervals I _{ l } (l=1 to L) inside the study interval [ 0;T _{ m a x }], where all the i subjects are censored at time T _{ m a x }: ${I}_{l}=\left]{a}_{l1};{a}_{l}\right]:l=1,\cdots \phantom{\rule{0.3em}{0ex}},L$ with
Inside each I _{ l }, the exposure variable E _{ l }(t) is as follows:
Then we define: $E\left(t\right)=\sum _{l=1}^{L}{E}_{l}\left(t\right).$
Our model of simulation obtained from an “illnessdeath” model is as follows:
where λ(t,Z,E) is the instantaneous risk function of state 3 occurrence according to the exposure E and the covariates Z _{ k } (k=1to 3).
From this model, we are able to calculate H R _{ i }(t), i.e. the H R(t) for each profile (Z _{1}, Z _{2}, Z _{3}):
We approximate H R _{ a }(t) by
where the γ _{ l } are obtained from
We approximate H R(t) by
where the γ _{ l } are obtained from
For ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$, the estimation of γ _{ l } is adjusted for the β _{ k }, whereas for $\overline{\mathit{\text{HR}}}\left(t\right)$ it is not.
The approximation of H R _{ i }(t), ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$ and $\overline{\mathit{\text{HR}}}\left(t\right)$ is especially relevant when L and the total number of patients are large, in order to have enough exposure and events inside the transitions $2\to 3$ and $1\to 3$.
The approximations of ${\mathit{\text{HR}}}_{i}\left(t\right)$ and $\overline{\mathit{\text{HR}}}\left(t\right)$ could be compared to the theoretical values (See Appendix B: Calcul of the average exposure effect not adjusted for the covariates), not of ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right)$.
To be sure of the approximation of ${\overline{\mathit{\text{HR}}}}_{a}\left(t\right),$ we simulated data from a Cox proportional hazards model with a timedependent covariate representing exposure. As we obtained directly the effect of the exposure we were able to certify that our approximation’s method was correct.
References
 1.
Kelsey J, Berkowitz GS: Breast cancer epidemiology. Cancer Res. 1998, 48: 56155623.
 2.
Kroman N, Jensen M, Melbye M, Wohlfahrt J, Ejlertsen B: Pregnancy after treatment of breast cancer  a populationbased study on behalf of danish breast cancer cooperative group. Acta Oncologica. 2008, 47: 545549. 10.1080/02841860801935491.
 3.
Azim H, Kroman N, Paesmans M, Gelber S, Rotmensz N, Ameye L, De MattosArruda L, Pistilli B, Pinto A, Jensen M, Cordoba O, de Azambuja E, Goldhirsch A, Piccart M, Peccatori F: Prognostic impact of pregnancy after breast cancer according to estrogen receptor status: a multicenter retrospective study. J Clin Oncol. 2013, 31: 7379. 10.1200/JCO.2012.44.2285.
 4.
Velentgas P, Daling J, Malone K, Weiss N, Williams M, Self S, Mueller BA: Pregnancy after breast carcinoma: Outcomes and influence on mortality. Cancer. 1999, 85 (11): 24242432. 10.1002/(SICI)10970142(19990601)85:11<2424::AIDCNCR17>3.0.CO;24.
 5.
Kranick J, Schaefer C, Rowell S, Desai M, Petrek J, Hiatt R, Senie R: Is pregnancy after breast cancer safe?. Breast J. 2010, 16 (4): 404411.
 6.
Lawrenz B, Banys M, Henes M, Neunhoeffer E, Grischke E, Fehm T: Pregnancy after breast cancer: case report and review of literature. Arch Gynecol Obstet. 2011, 315: 851855.
 7.
Sankila R, Heinavaara S, Hakulinen T: Survival of breast cancer patients after subsequent term pregnancy: healthy mother effect. Am J Obstet Gynecol. 1994, 170: 818823. 10.1016/S00029378(94)70290X.
 8.
Ives A, Saunders C, Bulsara M, Semmens J: Pregnancy after breast cancer: population based study. BMJ. 2007, 334: 194198. 10.1136/bmj.39035.667176.55.
 9.
Largillier R, Savignoni A, Gligorov J, Chollet P, Guilhaume M, Spielmann M, Luporsi E, Asselain B, Coudert B, Namer M: Prognostic role of pregnancy occurring before or after treatment of early breast cancer patients aged <35 years: a get(n)a working group analysis. Cancer. 2009, 115 (22): 51555165. 10.1002/cncr.24608.
 10.
Mueller B, Simon M, Deapen D, Kamineni A, Malone K, Daling J: Childbearing and survival after breast carcinoma in young women. Cancer. 2003, 98 (6): 11311140. 10.1002/cncr.11634.
 11.
Cox D: Regression models and lifetables. J Royal Stat Soc (B). 1972, 34: 187220.
 12.
Andersen P, Keiding N: Multistate models for event history analysis. Stat Methods Med Res. 2002, 11: 91115. 10.1191/0962280202SM276ra.
 13.
de Bock G, Putter H, Bonnema J, van der Hage J, Bartelink H, van de Velde C: The impact of locoregional recurrences on metastatic progression in earlystage breast cancer: a multistate model. Breast Cancer Res Treat. 2009, 117: 401408. 10.1007/s1054900803002.
 14.
Putter H, Fiocco M, Geskus R: Tutorial in biostatistics: Competing risks and multistate models. Stat Med. 2006, 26: 23892430.
 15.
Savignoni A, Hajage D, TubertBitter P, De Rycke Y: Effect of an event occurring over time and confounded by health status: estimation and interpretation. a study based on survival data simulations with application on breast cancer. Stat Med. 2012, 31: 44444455. 10.1002/sim.5631.
 16.
Holt JD, Prentice RL: Survival analyses in twin studies and matched pair experiments. Biometrika. 1974, 61 (1): 1725. 10.1093/biomet/61.1.17.
 17.
Vaupel JW, Manton KG, Stallard E: The impact of the heterogeneity in individual frailty on the dynamics of mortality. Demography. 1979, 16 (1): 439454.
 18.
Sarpel U, Hefti M, Wisnievsky J, Roayaie S, Schwartz M, Labow D: Outcome for patients treated with laparoscopic versus open resection of hepatocellular carcinoma: casematched analysis. Ann Surg Oncol. 2009, 16: 15721577. 10.1245/s1043400904148.
 19.
Wei L, Lin D, Weissfeld L: Regression analysis of multivariate incomplete failure time data by modelling marginal distributions. J Am Stat Assoc. 1989, 84: 10651073. 10.1080/01621459.1989.10478873.
 20.
Lee E, Wei L, Amato D: Coxtype regression analysis for large numbers of small groups of correlated failure time observations. Survival Analysis: State of the Arts. Edited by: Klein JP, Goel PK. 1992, Netherlands: Springer, 237247.
 21.
Lin DY: Cox regression analysis of multivariate failure time data: the marginal approach. Stat Med. 1994, 13: 22332247. 10.1002/sim.4780132105.
 22.
Lu S, Shih J: Casecohort designs and analysis for clustered failure time data. Biometrics. 2006, 62: 11381148. 10.1111/j.15410420.2006.00584.x.
 23.
Woolson R, O’Gorman T: A comparison of several tests for censored paired data. Stat Med. 1992, 11: 193208. 10.1002/sim.4780110206.
 24.
O’Brien P, Fleming T: A paired prentice wilcoxon test for censored paired data. Biometrics. 1987, 43: 169180. 10.2307/2531957.
 25.
Manatunga A, Oakes D: Parametric analysis for matched pair survival data. Lifetime Data Anal. 1999, 5: 371387. 10.1023/A:1009692210273.
 26.
Mehrotra D, Su S, Li X: An efficient alternative to the stratified cox model analysis. Stat Med. 2012, 31: 18491856. 10.1002/sim.5327.
 27.
Katsahian S, Latouche A, Mary J, Chevret S, Porcher R: Practical methodology of metaanalysis of individual patient data using a survival outcome. Contemp Clin Trials. 2008, 29 (2): 220230. 10.1016/j.cct.2007.08.002.
 28.
R Core Team: R: A Language and Environment for Statistical Computing. 2013, Vienna: R Foundation for Statistical Computing, [http://www.Rproject.org]
Prepublication history
The prepublication history for this paper can be accessed here:http://www.biomedcentral.com/14712288/14/83/prepub
Acknowledgements
No source of funding was attributed to the authors or to the preparation of the manuscript.
Author information
Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interest.
Authors’ contributions
AS, CG, PTB and YDR conceived the research question. AS and CG conducted the literature review. AS drafted the manuscript, contributed to the interpretation of all the simulations results, and analyzed and interpreted the real data on pregnancy and breast cancer. PTB and YDR revised the manuscript and supervised AS at all stages. YDR oversaw the design, programmed and ran the simulation study. All authors read and approved the final manuscript.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Savignoni, A., Giard, C., TubertBitter, P. et al. Matching methods to create paired survival data based on an exposure occurring over time: a simulation study with application to breast cancer. BMC Med Res Methodol 14, 83 (2014). https://doi.org/10.1186/147122881483
Received:
Accepted:
Published:
Keywords
 Matching on timedependent covariates
 Matched timetoevent data
 Correlated survival data
 Event occurring over time
 Stratified Cox model
 Marginal Cox model
 Pregnancy
 Breast cancer
 Simulation study