Matching methods to create paired survival data based on an exposure occurring over time: a simulation study with application to breast cancer

Background Paired survival data are often used in clinical research to assess the prognostic effect of an exposure. Matching generates correlated censored data expecting that the paired subjects just differ from the exposure. Creating pairs when the exposure is an event occurring over time could be tricky. We applied a commonly used method, Method 1, which creates pairs a posteriori and propose an alternative method, Method 2, which creates pairs in “real-time”. We used two semi-parametric models devoted to correlated censored data to estimate the average effect of the exposure HR¯(t): the Holt and Prentice (HP), and the Lee Wei and Amato (LWA) models. Contrary to the HP, the LWA allowed adjustment for the matching covariates (LWA a ) and for an interaction (LWA i ) between exposure and covariates (assimilated to prognostic profiles). The aim of our study was to compare the performances of each model according to the two matching methods. Methods Extensive simulations were conducted. We simulated cohort data sets on which we applied the two matching methods, the HP and the LWA. We used our conclusions to assess the prognostic effect of subsequent pregnancy after treatment for breast cancer in a female cohort treated and followed up in eight french hospitals. Results In terms of bias and RMSE, Method 2 performed better than Method 1 in designing the pairs, and LWA a was the best model for all the situations except when there was an interaction between exposure and covariates, for which LWA i was more appropriate. On our real data set, we found opposite effects of pregnancy according to the six prognostic profiles, but none were statistically significant. We probably lacked statistical power or reached the limits of our approach. The pairs’ censoring options chosen for combination Method 2 - LWA had to be compared with others. Conclusions Correlated censored data designing by Method 2 seemed to be the most pertinent method to create pairs, when the criterion, which characterized the pair, was an exposure occurring over time. In such a setting, the LWA was the most appropriate model.

http://www.biomedcentral.com/1471-2288/14/83 Background Five percents of breast cancers occur among women before the age of 40 and in fewer than 2% of women under 35 [1]. Physicians are more and more faced with issues of post-treatment pregnancy, such as the optimal time before considering conception. In physiopathological terms, a pregnancy after breast cancer is not advisable, especially in patients with positive hormonal receptors [2,3]. However, many observational studies report that pregnancy does not have any impact on the evolution of breast cancer [4][5][6], and could even have a long-term protective effect [2,[6][7][8][9][10]. This phenomenon can be due to the so-called "healthy mother bias" [7], i.e. only women who are in good health will undertake a pregnancy after treatment for breast cancer.
Pregnancy is therefore related to the prognostic condition of the patient, and its occurrence might change the prognostic effect of some factors, for instance, the hormonal receptors. The prognosis of positive hormonal receptors could be qualitatively (interaction) different before and after pregnancy (i.e. interaction between hormonal receptor and pregnancy).
In an attempt to control for this "healthy mother effect", various statistical approaches have been used. Some authors used the standard Cox model [11] with pregnancy considered as a covariate whose value depends on time, and adjusting for the known prognostic factors at the diagnosis of cancer, reflecting the gravity of the disease [2,6,8,9]. In spite of this adjustment for disease grade at diagnosis, pregnancy still remained a significant good prognostic factor, indicating a long-term protective effect of pregnancy, which is difficult to explain and to interpret. Others tackled the problem from the angle of an "illnessdeath" model [12][13][14] which makes it possible to describe the natural history of the disease, taking into account the prognostic profiles of the patients who had or did not have a pregnancy [15]. This model provided better understanding and interpretation of the effect of pregnancy by comparing transition probabilities to relapse between women who had or did not have a pregnancy. Moreover, taking into account a possible interaction between prognostic profiles and the pregnancy improved the estimation of the pregnancy prognostic effect. Overall prognosis was not adversely affected by subsequent pregnancy. However, to allow adjustment, such a complex model requires enough pregnancies and events of interest.
In both previous approaches, the adjustment relied only on known prognostic factors. However, measured prognostic factors might not be enough to characterize the prognostic status in such a disease. Matching subjects designs might be helpful to accomplish that.
Other researchers have conducted paired studies: pregnant and non-pregnant women were matched on the main known prognostic factors (hormonal receptor, proliferation level, nodal involvement, use of chemotherapy, year of diagnosis), and the non-pregnant had to be disease-free for as long as the time from diagnosis to pregnancy of the pregnant women [3][4][5]7,10]. By this matching carried out on known and measured factors, one can suppose that the subjects of the same pair also share non-observable, not observed or not measured factors, in addition to the factors of pairing. Thus, this design may improve the control of the "healthy mother effect" compared to the two approaches presented above.
However, to our knowledge, in such a case and contrary to the first methods cited previously, researchers [3][4][5]7,10] did not take into account the fact that pregnancy was an event occurring over time. They matched the pregnant woman to a non-pregnant one a posteriori, i.e. at the end of the follow up study, knowing which women were pregnant and which were not over the study period. They analyzed the data as if these pairs were a priori known and created at diagnosis, i.e. at time t = 0. Moreover, they always used the stratified Holt and Prentice semi-parametric model (HP) [16] to estimate the pregnancy prognostic effect, whereas other semiparametric models devoted to censored correlated data are available such as frailty models [17,18] and marginal models [19][20][21][22]. Frailty models model the time distribution conditionally to a random effect (frailty covariate), specific to each pair, and which is not observed. The structure of correlation has to be defined. The latter leave the nature of dependence among paired failure times completely unspecified.
Non-parametric [23,24] and parametric [25] approaches have been developed, but we focus on the semiparametric approach, more specifically on the commonly used marginal semi-parametric model. This marginal approach was developed by Wei, Lin and Weissfeld [19] to analyze subjects with multiple events, and then Lee, Wei and Amato [20] adapted it to clustered subjects.
In this paper, we use the marginal paired proportional hazards model of Lee, Wei and Amato (LWA) [20] and the Holt and Prentice stratified model [16] (HP). The main difference between them lies in the ability of the LWA [20] model to adjust for matching covariates and for the possible interaction between the covariate and the exposure, contrary to the HP model. Mehrotra et al. [26] proposed an efficient alternative to the stratified Cox model analysis to estimate the exposure effect, which does not require the assumption of a common hazard ratio across strata. However, that model is not adapted to our particular context of a large number of strata, with very small sample size per stratum (in our work, a stratum is a pair), thus it will not be studied here.
With HP and LWA models, we considered two different methods to create our pairs: the a posteriori one commonly used and described previously, and a new one http://www.biomedcentral.com/1471-2288/14/83 designing the pairs "in real-time" by taking into account the occurrence of the event over time, i.e. the pregnancy, which characterizes the subject's group within the pair.
The goal was to determine the combination between matching methods (a posteriori and in "real-time") and models (HP and LWA), which is the most efficient in terms of bias and Root Mean Square Error (RMSE) to estimate and test the pregnancy prognostic effect. In the following, pregnancy is referred to as the "exposure" occurring over time and the pairs are composed of an exposed and a non-exposed subject.
In the next section, the two matching methods as well as the models of analysis are presented. In Section 'Simulations' , extensive simulations are used to analyze the performance of these Method -Model combinations in term of bias and RMSE of the estimated effect related to the exposure occurring over time. In Section 'Real data application' , our findings are applied to real data in order to analyze the effect of a subsequent pregnancy on breast cancer evolution [9]. A brief discussion concludes the paper.

Methods
The subjects are matched on all the known prognostic factors represented by vector Z. In the following, index j corresponds to the rank of the exposed subject occurring at the jth position's order among the n e exposed subjects at the end of the study (j = 1, . . . , n e ), and index i corresponds to the subject among the n subjects of the study (i = 1, . . . , n). Let t i being defined as the follow-up time of each subject i, and t Ei as the time exposure for a subject i; for a subject who is never exposed, we consider t Ei = +∞. R m (t) represents the subjects at risk of event and nonexposed at time t for the method m (m = 1 or 2). Let the set of paired subjects being defined as P j = {j, i(j)}, with i(j) the subject i of R m t E j chosen for the jth exposed subject.

The two matching methods
The two matching methods applied in the context where exposure is an event occurring over time, are presented.
Method 1 matches the subjects as follows: all the subjects i exposed (t Ei < +∞) are matched with a subject who was not exposed during the whole period of followup. This method is called an a posteriori method because the groups exposed and non-exposed are defined at the end of the study and are considered to be known and created at time t = 0. According to Method 1, if a subject j undergoes an exposure at time t Ej , then the eligible subject's set R 1 t Ej of subjects i eligible to be matched to j could be written as follows: This approach has been commonly used in the literature [3][4][5]7,10] with exposure not being considered as time-dependent. As exposure is a time-dependent event, a second approach is proposed which designs the pairs "in realtime": according to Method 2, if a subject j undergoes an exposure at time t Ej , then the eligible subject's set R 2 t Ej of subjects i eligible to be matched to j at time t Ej , could be written as follows: R 2 t Ej = {i = j/t i ≥ t Ej AND t Ei > t Ej }. R 2 t Ej includes subjects at risk that are not yet exposed at t Ej and will be exposed after, and subjects who will never be exposed. A pair composed of an exposed subject and a non-exposed one who would never undergo the exposure is called a "perfect pair", whereas it would be an "imperfect pair". As a result, the non-exposed subject of an imperfect pair could be matched after exposure with another non-exposed subject.
Method 2 is similar to the one proposed by Lu et al. in case-cohort studies [22] where the membership in the exposed and unexposed groups is the outcome to be explained, whereas in our work it is an explanatory variable.
For both methods, exposed and not exposed subjects are matched according to the covariate vector Z, and the not exposed subject has to be disease-free for as long as the time from the starting point (disease diagnosis) to the exposure time.
For both methods, if several non-exposed subjects can be matched with an exposed one, the matched nonexposed subject is randomly chosen from the set of eligible subjects R m (t); if no non-exposed subjects are available, the exposed subject cannot be paired and is thus excluded from the analysis.
Note that even if a subject could belong to two different pairs with Method 2, these two pairs are independent while they are never at risk at the same time t.

The statistical models
In the following, λ i (t) is the instantaneous hazard function of outcome to be estimated for pair P j . It is noted λ i (t, Z i ) to specify that the estimation is made on the pair P j , which is composed of the exposed subject j and the non-exposed subject i matching on Z i . This notation is the same for all the models studied, even those where the adjustment for Z i is not available.
For all the models presented below, E i (t) corresponds to the time-dependent exposure status and is defined as follows: The pair of subjects is also defined by a time-dependent covariate: P i (t) = j if i ∈ P j and t ∈[ t Ej ; t i ] , or P i (t) = 0 otherwise.

Holt and Prentice stratified Cox model
Holt and Prentice [16] adapted the standard Cox model [11] to analyze matched paired data. http://www.biomedcentral.com/1471-2288/14/83 The instantaneous hazard function is written for each subject i as is a pair-specific baseline hazard function that is assumed to be identical for both subjects of pair P j , considered here as strata; it is considered as a nuisance parameter not to be estimated. The exposure effect exp (γ (t)) is then estimated, considering the between-pair heterogeneity, by allowing the instantaneous baseline hazard to be different within each pair. It is assumed to be identical across strata (no interaction between the exposure and the pairs) and thus to be implicitly common for the whole exposed population: exp (γ (t)) is defined as the population-weighted average of the stratum-specific hazard ratios. However, if this assumption is incorrect, i.e. in the presence of a true (and often undetected) interaction, using this model leads possibly to a biased and/or less powerful analysis [26].
Furthermore, with this model, estimation of the exposure effect cannot be adjusted for a possible interaction between the matching factors and the exposure. This stratified approach is sensitive to the unit number per strata and to the number of strata: the accuracy of the regression coefficients decreases for a small number of units per strata and/or many numbers of strata [27].
This model is implemented in R software [28] through the coxph function by including the term "strata(P i (t))" with the other explanatory covariates.

Lee, Wei and Amato Cox model
The marginal LWA model [20] is an alternative to the standard Cox model [11] and is written as follows if the exposure effect is not adjusted for the matching covariates vector Z; if the exposure effect is adjusted for the matching covariates vector Z; if the exposure effect is adjusted for the matching covariates vector Z, and for the interaction between Z and the exposure.
For each of these three LWA models, λ 0 (t) is an unspecified marginal baseline hazard function considered as common for all the pairs, so for the whole population. As above, it is considered as a nuisance parameter; exp (γ (t)) is the average time-varying exposure effect as in the HP model, but adjusted (LWA a ) or not (LWA u ) for covariates Z and for the possible interaction between covariates and exposure (LWA i ). Like the standard Cox model [11], the LWA assumes that all sample subjects are homogeneous (all subjects have the same λ 0 (t)) in spite of the possible adjustment for covariates (unique difference between LWA u and HP).
This model is implemented in R software through the coxph function, by including the term "cluster(P i (t))" with the other explanatory covariates.
For both models, the Proportional Hazard Assumption (PHA) was evaluated by Harrel's test on scaled Schoënfeld residues. This test is implemented in R software through the cox.zph function. The possible time-dependent effect of the exposure was taken into account by time intervals chosen a posteriori, and not by a time-specified function.
Note that the combination HP and Method 1, taking the exposure as a time-dependent covariate or not, gave exactly the same estimation of HR (t), whereas LWA did not. Table 1 presents all the models according to the adjustment or not for covariates.
In the following, different simulations and analyses were performed with R software version 2.13.0.

Simulations Objective
The main objective of the simulation study was to assess the ability of the HP and LWA models to estimate the true effect of exposure HR (t), defined by exp (γ (t)), in a context of matched paired survival data, where the pairs were designed according to the two different methods described previously. The aim was to establish the most efficient Method -Model combination.

Dataset
Simulation of cohort data -Procedures and scenarios chosen. All the details of the cohort data simulation and the procedures and scenarios chosen are given in Appendix A.
We simulated the cohort data referring to an "illnessdeath" model with transition intensities λ 12 (t), λ 13 (t) and λ 23 (t) (Figure 1). The parameter of interest HR(t) corresponded to the ratio λ 23 (t) /λ 13 (t). The average HR(t) is obtained from an exact formula involving the averages of λ 13 (t) and λ 23 (t) which are computed through a numerical approximation (transformation of the time from continuous to discrete values) (See the Appendix B). The average HR(t) adjusted for the different covariates was estimated empirically: its estimation was obtained using large size samples to guarantee good precision. Moreover, note that the larger the ratio λ 12 (t) /λ 13 (t), the larger the number of exposures in the simulated cohort.
Each subject was characterized by a prognostic profile through vector Z, which corresponded to three dummy http://www.biomedcentral.com/1471-2288/14/83

Exposure covariate
Models is the instantaneous risk function of event for the pair P j , which is composed of the exposed subject j and the non-exposed subject i matching on Z i .
covariates Z k (k = 1, 2 or 3). At time t = 0, there were 2 3 = 8 possible profiles of Z factors. The simulation model included (i) the choice of an instantaneous baseline risk function λ uv (t, Z) for each of the three transitions u → v (Table 2) and (iii) the choice for the censoring proportion.
For (i), an instantaneous average risk function λ uv t, Z = Z for each of the three transitions was simulated. Table 2 displays the λ uv t, Z = Z distributions of each transition used for each of the five different configurations of HR (t).
For (ii), ten different β uvk scenarios considered as plausible β uvk clinical values [9,15], were performed. Given the five configurations chosen for HR(t) and these ten β uvk scenarios, 50 different situations were obtained.
Finally, for (iii), these previous 50 situations were first performed without censoring. Two levels of independent uniform censoring were implemented only to the following β uvk scenario: 13 = −β 12 and β 23 = β 12 ; and they were applied to each of the five configurations of HR (t). This yielded to 10 more situations.
For each of the 60 situations, 1000 different data sets were generated with a sample size of 2000 subjects. At t = 0, these 2000 subjects were allocated to eight Z profiles. At t > 0, the 250 subjects of the 8 different profiles will be divided up in the three transitions and will change over time according to the five HR (t) configurations.
All theoretical values of HR (t) were calculated on the simulated cohort data. They were computed in the overall correlated censored data and inside each sample of the Z profile. The average HR (t) was calculated without and with adjustment for the matching covariates, i.e. HR (t) and HR a (t) respectively. HR i (t) is the one estimated in each Z profile.
From the correlated censored data, HP and LWA u models are both assumed to give an average HR(t) i.e. HR (t) (considering different assumptions), so they are the only

State 2 Pregnancy
State 3 Outcome λ 13 (t)   two models which could be compared. Fitting of model LWA a makes it possible to estimate an average HR(t) i.e. HR a (t), while LWA i is assumed to give HR i (t) for each Z profile ( Table 1).
As the exposure effect was considered to change over time for three of the five configurations, its estimation was assessed by time interval specified a posteriori.
Matching methods -Creation of censored correlated data from cohort data. For each data set, the two matching methods presented in Section 'Methods' were applied.
According to Methods 1 and 2, the two subjects in each pair were matched on the three covariates Z k , and the non-exposed subject had to be disease-free for as long as the time from t = 0 to exposure time of the exposed subject.
Then from the 2000 subjects simulated in cohort data sets and equally allocated to the 8 Z profiles, a number of pairs smaller or equal to the number of subjects in State 2 (i.e. pregnancy) were obtained. This latter depended on the situation simulated, resulting from the HR (t) configuration, the β uvk scenario and the censoring percent.
Statistical criteria used to compare the performances of the different estimators. To estimate a timedependent effect, the time interval [0 − t max ] was divided into L time intervals I l defined a priori, according to the HR(t) configuration, and written as follows: If there was no interaction and no time-dependent effect, an estimationγ s was obtained that corresponded to the estimation of γ performed in the s th simulation set inside the same HR(t) configuration.
If there was an interaction and no time-dependent effect, an estimation for each of the 8 Z prognostic profiles was obtained and expressed asγ s +α s Z.
If there was no interaction and a time-dependent effect, an estimation for each of the I l time intervals was obtained and expressed asγ sl .
If there was an interaction and a time-dependent effect, 8 × L estimations were obtained and expressed asγ sl + α sl Z.
To assess the combination Method -Model to estimate HR (t), the bias and the RMSE of the estimations presented above were calculated. The 8 Z profiles are indexed by w = 1 to 8, and the profile w is noted Z (w) .
In the event of with an interaction and a time-dependent effect, the bias was estimated for each l and each Z (w) over the 1000 simulations as follows: where, in each I l interval, γ l + α l Z is the mean and V γ l +α l Z the empirical variance of the 1000 parameters estimatedγ sl +α sl Z: To compare the different combinations Method -Model, the bias was averaged over the profiles (b l• ) or over the time intervals The associated RMSEs are given by Note that if the exposure effect is not time-dependent, then L = 1. If there is no interaction between the exposure and the prognostic profiles, then α = 0.
Number of pairs. Inside each profile, the maximum number of pairs was determined by the number of exposed subjects. With Method 1, this number was also limited by the number of "perfect" non-exposed subjects, but not with Method 2 since the non-exposed subject set was composed of "perfect" and "imperfect" non-exposed subjects. The difference between the number of pairs from the two methods depended on the number of exposures: the larger the number, the larger the difference. We computed the relative difference (RD) in number of pairs between the two methods defined as RD = Number of pairs with Method 2 − Number of pairs with Method 1 Number of pairs with Method 1 , whose median was equal to +55% (range, +23% to +220%). Figure 3 represents the distribution of the number of pairs according to the profiles and to the matching methods: the median number with Method 2 was always larger than or equal to that with Method 1. Figures 4 shows the number of subjects pertaining to the three possible subjects groups at each time t: the exposed subject (solid green line), the non-exposed subject who never will undergo the exposure (solid red line) and the non-exposed subject who will undergo the exposure (solid blue line). The dotted vertical green line represents the time of first exposure, i.e. the time of occurrence of the first pair; the dotted vertical blue line corresponds to the time of the last perfect pair's creation and the dotted vertical red line corresponds to the time of the last imperfect pair's creation. With Method 2, the larger the ratio λ 12 (t) /λ 13 (t), the larger the number of imperfect pairs and thus the greater the probability for an exposed subject to belong to an imperfect pair. Table 3 provides the proportion of imperfect pairs among the whole pairs which was estimated over the 1000 simulated data sets of our particular situation. It was equal to 81% in the good profile Z = (0, 0, 0), decreasing to 54%, 44% and 20% in the Z profiles (1, 1, 0), (0, 0, 1) and (1, 1, 1), respectively. Moreover, the larger the ratio λ 12 (t) /λ 13 (t), the faster the pair's creation stopped. After the last dotted line (blue or red, depending on the Z profile), the exposed subjects are no longer able to be matched with a non-exposed one, because they are no longer available subjects; the pair's creation stopped at a time that gradually increased from Z = (0, 0, 0) to Z = (1, 1, 1). For instance, this is illustrated for profile Z = (0, 0, 0) in Figure 4A: at the time of first exposure (dotted vertical green line), the exposed subject was more likely to be matched with an imperfect subject than to a perfect non-exposed one. This set of non-exposed subjects decreased over time because each of them was matched with an exposed subject until the dotted blue line, when no more non-exposed subjects were available, while a new exposed subject, belonging before to a pair as a non-exposed one, appeared. This set of exposed subjects increased and was not able to be matched because there were no longer any non-exposed subjects. For a same level of λ 12 (t) /λ 13 (t) ratio, RD was http://www.biomedcentral.com/1471-2288/14/83  Perfect non−exposed Imperfect non−exposed Exposed Perfect non−exposed Imperfect non−exposed Exposed Perfect non−exposed Imperfect non−exposed Exposed Perfect non−exposed Imperfect non−exposed Exposed D Figure 4 Number of subjects in the three possible groups. Number of subjects pertaining to the three possible subjects groups at each time t: the exposed subject (solid green line), the non-exposed subject who never will undergo the exposure (solid red line) and the non-exposed subject who will undergo the exposure (solid blue line). The larger depending on the censoring percent: the higher the censoring percent, the smaller the RD.

Bias and RMSE. Figures 5 displays the distribution of
the bias and the RMSE of the exposure effect estimator, according to our four models. With Method 1, the estimation of HR (t) was biased with HP and LWA u , but more with HP; the estimation of HR a (t) was biased with LWA a . Whatever the model applied, HR (t) (HP, LWA u ) and HR a (t) (LWA a ) were underestimated, i.e. a poor prognostic exposure effect tended to be ignored and a no exposure effect tended to become a protective one. In each Z profile, LWA i always largely underestimated HR i (t) (  Table 3.

With Method 2 where HR (t)
HR a (t), LWA u and LWA a gave very similar values, whereas HP gave a smaller estimation of HR (t) than LWA u . All these three models are very close to the true value of the effect of exposure, but none gave an unbiased estimation: this estimation was slightly over-or under-estimated according to the time, but much less than with Method 1. HR i (t) was overestimated in some Z profiles: HR i (t) was overestimated in the three PP, Z ∈ {(0, 0, 0) , (1, 0, 0) , (0, 1, 0)}, where the ratio λ 12 (t) λ 13 (t) was large and then the proportion of imperfect pairs in the set of pairs to be analyzed, was also large (

Prognostic profiles Z (w)
% of exposure Method 1 The percent of exposure inside each prognostic profile is also given. http://www.biomedcentral.com/1471-2288/14/83     to the PP: the larger the percentage of imperfect pairs, the larger the bias and RMSE.

Mean of pairs number % of imperfect pairs
All the conclusions displayed with Method 1 were the same for all five configurations and all the β uvk triplet values. The biases of HR (t), HR a (t) and HR i (t) were always huge (data not shown) but more or less followed the configuration and the β uvk scenarios. All the conclusions with Method 2 were valid for all configurations, and for all β uvk triplet values. In the configurations without interaction, i.e. where β 23 = β 13 , HP and LWA u models were more appropriate than LWA a and LWA i to estimate HR (t) in terms of bias and RMSE, given that LWA u was much less biased than HP in most of the scenarios. In some configurations where the proportion of the profile with the smallest HR i (t) was the most highly represented profile leading to a low HR (t), HP was better than LWA u . LWA a was the only model for estimating HR a (t) and led to very slightly biased estimations of HR a (t) (data not shown). In the configurations with interaction, i.e. where β 23 = β 13 , the LWA i model was the only appropriate one. However, in some of these configurations, LWA i slightly biased the extreme profiles and not the intermediate ones (data not shown). Over the 10 situations with censoring, the censoring percent ranged from 9% to 48%. Censoring did not change any of the previous conclusions.
Overall, in terms of bias and RMSE, Method 2 performed better than Method 1 to design the pairs, and LWA a was the best model for all the situations except when there was an interaction between the covariates and the exposure (β 23 = β 13 ), for which LWA i was more appropriate, even if the estimations with HR i (t) were not uniformly unbiased. http://www.biomedcentral.com/1471-2288/14/83

Real data application Data
Our data retrospectively included 870 women treated for breast cancer between January 1990 and December 1999, and diagnosed before 35 years of age. Information on patients' status was collected at the end of the year 2004 and the median follow-up was 87 months (range, 7 to 166) at this date. Tumor, treatment and disease evolution analyses are available in the paper of Largillier et al. [9]. One of the goals of the data analysis was to compare disease-free survival between pregnant and non-pregnant patients. The protocol was submitted to the appropriate French authorities supervising individual computerized data files (Commission Nationale Informatique et Liberté [CNIL]), and obtained ethical approval from the Institut Curie ethics committee (Comité des Etudes en Recherche Clinique [CERC]). The GETNA Working Group, which was responsible for the data conception, design and acquisition, allowed us access to these real data.
We created pairs of pregnant and non-pregnant women using the two matching methods. Both women were matched according to their cancer gravity level using: Scarff-Bloom-Richardson grade (SBR, related to cell proliferation level), pathological node involvement and hormonal receptor status (RH). They were also matched on treatment: hormonotherapy prescribed or not to RH+ patients, and chemotherapy administered or not before and/or after surgery. Within each pair, the non-pregnant woman had to be disease-free for as long as the time from diagnosis to pregnancy of the pregnant woman. We sought to estimate the effect of subsequent pregnancy occurring over time on breast cancer evolution.
According to the results obtained on the cohort data [15] and to the known breast cancer clinical prognostic factors in the literature, patients with low cell proliferation level (SBR I or II) and no node involvement were considered to have a good prognostic profile and likely to plan to be pregnant ("healthy mother effect"). According to biological assumptions and need for therapy (hormonal treatment if RH+), women with negative hormonal receptors were also more likely to be pregnant. Thus, even if these factors were not significant in the paper by Savignoni et al. [15], it seemed relevant to estimate the effect of pregnancy according to the RH status (and the treatment associated) and according to a clinical Prognostic Profile (cPP); a poor cPP was defined as an SBR grade III and/or pathological involved nodes, leading to six prognostic profiles: RH negative with a good cPP, RH negative with a poor cPP, RH positive not treated with a good cPP, RH positive not treated with a poor cPP, RH positive treated with a good cPP and RH positive treated with a poor cPP. Then the effect of the pregnancy was estimated in the whole population by adjusting or not for the matching factors and with regard to the six prognostic profiles by adjusting for an interaction between the pregnancy, and the RH status (associated with treatment) and the cPP respectively. The effect of pregnancy was not adjusted for chemotherapy treatment.

Results
In view of our simulations, Method 2 was the matching method to apply on the cohort data in order to create correlated censored data that would give a more accurate estimate of the exposure effect. To be able to compare our results with previous findings [3][4][5]7,10], we used the HP model on correlated censored data designed from Method 1.
First, we applied the Method 1-HP combination as proposed in the literature [3][4][5]7,10]. Secondly, we applied the HP and LWA with pregnancy and pair as timedependent covariates on the paired survival data created with Method 2. With LWA, HR (t) estimation was carried out (i) without adjusting for matching covariates (LWA u estimates HR (t)), (ii) by adjusting for all the matching covariates but without any interaction (LWA a estimates HR a (t)) and (iii) by adjusting for all the matching covariates and with an interaction between pregnancy and matching covariates (LWA i estimates HR i (t)). The latter could be applied in the event of real or assumed biological and clinical interactions. We tested the PHA with Harrel's test on the two correlated censored data sets and with each model. The PHA was verified and we did not add any time-effect for pregnancy in the models. HR(t) was compared to 1 by the Wald's test with the appropriate variance according to the models.
In the cohort data, only 668 patients presented no missing data for the covariates of interest. Among them, 68 experienced a subsequent pregnancy. We obtained the maximum number of pairs available with Methods 1 and 2, i.e. 68 pairs: all pregnant women were then matched. The 68 pairs were not exactly the same between Methods 1 and 2. In Method 2 five pairs were imperfect, which represented a low proportion (7.4%). Among the 68 pairs obtained with Method 1, 32 women experienced an event (progression or death): 16 in the group of patients who became pregnant after the breast cancer diagnosis and 16 in the group of patients who did not. Among the 68 pairs obtained with Method 2, 29 women experienced an event: the same 16 in the group of patients who became pregnant after the breast cancer diagnosis and 13 in the group of patients who did not. No events occurred in the imperfect pairs. Only sixteen pairs (23.5%) were common between the two matching methods. The number of pairs and the number of final events in the pregnancy and nonpregnancy groups are given Table 4, according to the six profiles and to the matching methods. The number of pairs was divided into imperfect and perfect pairs for Method 2. http://www.biomedcentral.com/1471-2288/14/83 The number of imperfect pairs is also given for Method 2.
The HP model applied with Method 1 yielded the following estimations: exp (γ ) = 0.90, CI 95% [0.36 − 2.21], not significant. Kranick et al. [5] and Veletgas et al. [4] concluded likewise as did Azim et al. [3], but the latter stratified their analysis on estrogen status. Other authors [7,10] found a protective role of pregnancy. All these authors used the HP model with Method 1 adjusted for the other known prognostic factors. In view of our simulation results, we believe that this estimation was underestimated and that the models should be used on censored correlated data created with Method 2. Regarding the latter, the HP model yielded a larger value than with Method 1 but was still not significant (exp (γ ) = 2.00, CI 95% [0.75 − 5.33]). In view of our simulations, we conclude that HR (t) was widely underestimated with HP using Method 1 and only slightly underestimated with HP using Method 2. The increasing value of HR (t) estimated by HP between the two matching methods was not surprising. Method 2 and the LWA model, with or without adjusting for matching covariates (LWA u and LWA a ), yielded similar and not statistically significant results: exp (γ ) = 1.22, CI 95% [0.61−2.42] and exp (γ ) = 1.24, CI 95% [0.62−2.45], respectively. The difference in HR (t) values estimated by the HP and LWA u models was not statistically significant. In such a context, we would like to estimate the proper effect of the subsequent pregnancy. It is more relevant to estimate HR a (t) than HR (t). Even if LWA i showed no significant interaction between the matching covariates and the pregnancy, because of the biological and clinical assumptions, we estimated the effect of pregnancy according to the six PP defined above using the LWA i model. Table 5 presents exp (γ ) estimations according to the models and their 95% confidence intervals.
The apparent protective role of pregnancy for the theoretically less favorable prognostic profile "RH+ (treated or not) and poor cPP" and its increasing role in risk for the theoretically best prognostic profile "RH− and good cPP" were surprising. There was no statistical significance, but it could be because of a lack of power of the combination Method 2 -LWA i .

Discussion
In our context of exposure occurring over time, we focused on two matching methods: Method 1, commonly used in the literature [3][4][5]7,10], and Method 2, our new approach. Method 1 composes the pairs in an a posteriori way where the pairs are in fact considered to be known at diagnostic time t = 0. Method 2 creates pairs in "real-time". Such matching designs create independence between pairs but dependence between the subjects of the same pair, and specific analytical methods exist for such a situation of correlated censored data. In our work we studied two semi-parametric models allowing for the stratified Holt and Prentice model [16] (HP) and the Lee, Wei and Amato model [20] (LWA). To estimate the average exposure effect HR (t) and unlike the LWA, the HP did not make it possible to take into account either the matching covariates or the possible interaction between the matching covariates and the exposure.
The aim of this study was to analyze the HP and LWA models using the two matching methods in order to propose the most efficient Method -Model combination to estimate and test the prognostic exposure effect HR (t) estimated through the models by exp (γ (t)).
In view of our simulations, the relative difference in the number of pairs between Method 1 and Method 2 depends on the ratio λ 12 (t) /λ 13 (t) and on the censoring percent. Compared to Method 1, the number of pairs was equal or larger with Method 2. In terms of bias and RMSE, Method 2 is more relevant than Method 1 to design the pairs, and LWA a is the best model for all the situations except when there is an interaction between the covariates and the exposure (β 23 = β 13 ), for which LWA i is more appropriate even if the HR i (t) estimations are not uniformly unbiased.
In our sample data, we applied Method 1 and HP to compare our results with those in the literature [3][4][5]7,10]. According to the simulation results, we applied Method 2 with HP and LWA. With both matching methods, we obtained an equal number of pairs (the maximum available) but not the same ones. HP used with Method 1 http://www.biomedcentral.com/1471-2288/14/83 The estimation of the pregnancy effect standard error (se) by the HP and LWA models are given.
gave the smallest estimation of HR (t). HR (t) estimations were not statistically different from 1 with HP, LWA u and LWA a . According to biological and clinical hypothesis, we assumed that the effect of exposure was different according to the six prognostic profiles so we used the LWA i model. However, we did not conclude that pregnancy had neither a protective effect nor an adverse effect on progression disease, even for women with positive hormonal receptors and poor clinical profile at diagnosis. Estimating HR (t) by prognostic profiles using the combination Method 2 -LWA i , showed a different and opposite effect of exposure in the six health profiles, but none was statistically significant. The sample size of the profiles was small and a few events occurred inside each profile, probably leading to a lack of power of the combination Method 2 -LWA i . Adjusting for the matching covariates and for their possible interaction with exposure could be very interesting to interpret the effect of exposure HR (t). However, it could be more difficult with real data, especially in the event of multiple prognostic profiles. The sample had to be large enough to enable us to assess the performances of the models according to the two matching methods, especially LWA a and LWA i , which required a large number of pairs. The erroneous estimation with LWA i in some extreme profiles could partially be explained as follows: in the "imperfect" pairs, we artificially prevent the nonexposed subject from having a final event, because she will first become an exposed subject of another pair. Maybe she will undergo the final event but as an exposed subject and in another pair. Then, when the proportion of imperfect pairs is large, HR i (t) could be overestimated. Moreover the larger the ratio λ 12 (t) λ 13 (t), the larger the number of imperfect pairs, and the higher the overestimation of HR i (t) and the faster its appearance. Owing to the values of the β uvk triplets chosen in the particular configuration presented above, the subjects of the three profiles Z ∈ {(0, 0, 0) , (1, 0, 0) , (0, 1, 0)} experienced more exposure before any events, and simultaneously fewer events (before or after exposure), than the other PP. In these three profiles, the number of events in the exposed and in the non-exposed individuals is quite low, suggesting a possible lack of accuracy in the estimation of HR (t). All the models needed enough pairs and enough final events to fit: the larger the number of imperfect pairs and the smaller the number of events, the worse the accuracy of the estimation of HR i (t). In a future study, we will explore more deeply this problem of bias related to the percent of imperfect pairs in the sample.
Technically, whatever the matching method used, HP censored the pair when one of its subjects was censored or experienced the final event, whereas the LWA did not. The latter leaves each subject to be followed up to her censoring or final event, whatever the outcome of the matched subject. Concerning the management of the imperfect pair obtained with Method 2, the non-exposed subject i of pair P j was censored when she became the exposed subject i of another pair P i at the time t E i ; by construction, as said before, her exposure occurred before any other events. For LWA, an alternative would be to censor the pair P j . Another alternative, whatever the model, would be to propose another non-exposed (perfect or imperfect) subject to the exposed subject of pair P j , who was now single in her pair. This issue deserves more investigation.
Using time intervals to estimate the effect of exposure as it changes over time was maybe not the most pertinent method, because it required many parameters. Therefore, to improve fitting, we intend to apply splines to fit the effect of exposure as it changes over time. Indeed, the present study sought to compare results of two known models devoted to censored correlated data, and the wellknown frailty model was set aside because it requires the structure of correlation within the pairs to be specified. The next step would be to compare the HP, LWA and the frailty model using Method 2.

Conclusions
In conclusion, correlated censored data designing by Method 2 seems to be the more pertinent method to create pairs when the criterion which characterizes the pair is an exposure occurring over time. It would be interesting to estimate the proper effect of the subsequent pregnancy. It is then more pertinent to estimate HR a (t) than HR (t). Thus, LWA a seems to be the best model for all the situations, except when there is an interaction between the covariates and the exposure, for which LWA i is more appropriate, even if the estimations of HR i (t) are not uniformly unbiased. LWA a and LWA i gave a more accurate and relevant estimation of the effect of exposure in particular context, where we can reasonably suppose that the latter depends on prognostic profiles. http://www.biomedcentral.com/1471-2288/14/83 Finally, for (iii), these previous 50 situations were first performed without censoring. To minimize simulations time, two levels of independent uniform censoring were implemented only with the following β uvk scenario: β 12 = (−0.2, −0.4, −0.8), β 13 = −β 12 and β 23 = β 12 ; and they were applied to each of the five configurations of HR (t). This yielded to 10 more situations (five HR (t) configurations with2 levels of censoring) for that β uvk scenario. The maximal event time t max was set at 1000. The first uniform distribution for censoring time C was over the interval time [0; t max ], and the second one over [0; 2t max ]; then the maximal censoring time was C max = +∞, t max or 2t max . The overall censoring level was higher in the first censoring distribution but it also depended on the HR (t) configuration. In total we had50 situations without censoring and 10 with censoring (the same five configurations with the two levels of censoring).
For each of the 60 situations, 1000 different data sets were generated with a sample size of 2000 subjects. At t = 0, these 2000 subjects were allocated to the eight Z profiles. At t > 0, the 250 subjects of the 8 different profiles are divided up among the three transitions and change over time according to the five HR (t) configurations.
. The computation of N z (t) is the following. In transition 1 → 3, a subject is at risk if it does not have any event and if it is not censored where • S 12 z (t) is the survival rate in the transition 1 → 2 for the profile z at time t, • S 13 z (t) is the survival rate in the transition1 → 3 for the profil z at time t, • N 13 z is the number of subject at risk in the profile z at time t = 0, •Ḡ (t) = 1 − G (t) with G(t) is the repartition function of the variable C characterizing the censoring time. It does not depend on the profile.
In transition 2 → 3, taking into account the left truncation, a subject is at risk of event in this transition, if it entered in the transition and if it had neither an event nor being censored. N 23 z could be expressed as Then where • λ 12 z (t) is the instantaneous hasard function of transition 1 → 2 for the profil z at time t, • g (t) is the density function of the variable C characterizing the censored time. It does not depend on the profile.

Appendix C: Empirical estimation of the average exposure effect adjusted for the covariates HR a (t)
Let's consider L intervals I l (l = 1 to L) inside the study interval [0; T max ], where all the i subjects are censored at time T max : I l = a l−1 ; a l : l = 1, · · · , L with a 0 = 0 < a 1 < a 2 < · · · < a L = T max .
Inside each I l , the exposure variable E l (t) is as follows: E l (t) = 1 if t ≥ t Ei and t ∈ I l 0 otherwise.
Then we define: E (t) = L l=1 E l (t) . Our model of simulation obtained from an "illnessdeath" model is as follows: where λ(t, Z, E) is the instantaneous risk function of state 3 occurrence according to the exposure E and the covariates Z k (k = 1 to 3).
From this model, we are able to calculate HR i (t), i.e. the HR(t) for each profile (Z 1 , Z 2 , Z 3 ): We approximate HR a (t) by where the γ l are obtained from We approximate HR(t) by where the γ l are obtained from For HR a (t), the estimation of γ l is adjusted for the β k , whereas for HR(t) it is not.
The approximation of HR i (t), HR a (t) and HR(t) is especially relevant when L and the total number of patients are large, in order to have enough exposure and events inside the transitions 2 → 3 and 1 → 3.
The approximations of HR i (t) and HR (t) could be compared to the theoretical values (See Appendix B), not of HR a (t).
To be sure of the approximation of HR a (t), we simulated data from a Cox proportional hazards model with a time-dependent covariate representing exposure. As we obtained directly the effect of the exposure we were able to certify that our approximation's method was correct.