Notation settings
A survival analysis notation is used as the focus is on a time-to-event end-point. Let Ti be the failure time and Ci the censoring time of subject i (i = 1…N) in a cohort (phase I) of size N followed-up to time τ. Ti and Ci are assumed to be independent, T ⊥ C, indicating a non-informative right censoring. Administrative censoring is set at the end of follow-up time τ. Let hi(t) be the hazard rate for the i th individual . The hazard function, modelled using the Cox proportional hazards model, is equal to hi(t) = h0(t) exp(βXi) where h0(t) is the baseline hazard, Xi the vector of the explanatory variables for individual i and β ’s the corresponding regression coefficients. The classical approach for estimating β is to maximize the partial likelihood [16]. Suppose that the biomarker of interest, i.e. XBM, is measured only for a subset n < N of subjects drawn from the phase I data and let ξi indicate whether subject i is selected into this subset. We will refer to the \( n={\sum}_{i=1}^N{\upxi}_i \) subjects as the phase II sample. Let πi = P(ξi = 1 ∣ Xi, ∆i, Zi) being the inclusion probability of subject i for the phase II sample, conditional on being selected at phase I. In a simple random sample this probability is equal for every subject (π = n/N). In a stratified sampling, the inclusion probability is common for all subjects in the same stratum and differs between strata. In particular, it is usually higher for the more informative strata (e.g. strata including subjects with the event of interest, as in case-control studies).
Simulation context
Phase I sample
To mimic a realistic context, we explore the variables that well represent the majority of data usually available in practice, even though in a simplified setting for simulation.
We hypothesized a cohort of subjects of size N (i.e. clinical trial cohort, register, clinical cohort) followed up to time τ, in which we aim to evaluate the prognostic value of a new biomarker (XBM) on a time-to-event end-point (T) in the presence of a possible confounder (XConf), a risk factor (XRisk Fact) and a possible auxiliary/surrogate variable (XSurr) of the marker of interest. To describe and illustrate relationships between these, a Directed Acyclic Graph (DAG) was displayed in Fig. 1. In particular, we assumed the confounder to have an impact on both the biomarker and the event of interest, the risk factor to be associated only with the event of interest, and, finally, the surrogate to be associated only with the biomarker.
Phase II sample
We assumed that the risk factor, the confounder and the surrogate variables are known for all subjects in the phase I (N), while the biomarker (XBM) is measured only on the subset of n individuals (phase II sample).
To sample the subset of subjects from phase I (N), a stratified two-phase sampling approach was used. Strata were defined using the following variables: event or event and risk factor or event and confounder or event and surrogate.
By note, in this work, we consider only sampling done at the end of the follow-up (τ) where subjects who developed the event during the follow-up are defined as cases and subjects event-free at time τ as controls.
The sample size of the phase II is fixed (n), but the sampling probabilities depend on different designs, as described below:
-
(i)
Simple Random Sample (SRS) in which all possible subsamples have an equal probability to be chosen.
-
(ii)
Probability Proportional to Size (PPS) is referred to a stratified sample with proportional allocation. The units are selected with probabilities proportional to stratum’s size. Thus, the size for each stratum in the phase II is given by the total size of the stratum in the original cohort multiplied by n/N [17].
-
(iii)
Case-Control (CC) is performed by separately sampling cases and controls [18]. As we aimed to compare different sampling strategies with a fixed sample size, we did not necessarily select all cases from the full cohort as often done. We fixed a total sample size (n) and selected an equal number of cases (n/2) and controls (n/2). We also considered stratified CC by using the variables available in phase I (see Fig. 1): separated simple random sampling was performed in each stratum. A balance design was considered [19].
-
(iv)
Nested case-control (NCC) can be considered as a particular case of case-control designs in which controls are randomly selected from the set of subjects event-free at the time of event occurrence on the cases [20,21,22]. Sampling probabilities for controls were derived by Samuelsen [23], while for cases they were equal to 1 if the phase II sample size n was at least twice the total number of events in the entire cohort (\( {\sum}_{i=1}^N{\Delta }_i\Big) \) and equal to \( {\pi}_i=\left(n/2\right)/\left({\sum}_{i=1}^N{\Delta }_i\right) \) otherwise.
-
(v)
Counter matching (CM) is an alternative stratified version of the NCC. In this design, the selection of controls is conducted by sampling from the set at risk in the opposite stratum at the time of event on the case. Inclusion probabilities for controls within strata were derived by Samuelsen [24] while for cases, πi was derived as in NCC design. As the aim is to maximize the “discordance” of exposure within case-controls sets [25,26,27], the variables used to define strata must be a proxy for the variables of interest, thus we used the surrogate variable XSurr for this design.
Figure 2 illustrates an example of each sampling design method described above. Specifically, in the upper part of the figure we displayed PPS and CC considering a stratification for a binary variable; in the lower part NCC and CM designs are displayed. By note, in NCC and CM designs we considered one control selected for each case.
Evaluation of biomarker impact on the event
The following Cox model was applied to assess the influence of the biomarker on hazard of the event adjusting for the confounder variable XConf (following the minimal set of adjustment suggested in Fig. 1):
$$ {h}_i(t)={h}_0(t)\ \exp \left({\beta}_{BM}{X_{BM}}_i+{\beta}_{Conf}{X_{Conf}}_i\right) $$
(1)
where βBM and βConf represent the regression coefficients of the biomarker and confounder, respectively. Given the availability of the biomarker only for the sub-cohort (phase II), we applied a weighted Cox model, in which regression coefficients are estimated by maximizing the partial likelihood weighted by the inverse of the empirical inclusion probability (\( {w}_i=\raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{${\uppi}_i$}\right. \)) that accounts for the specific sampling design [6, 28, 29]. In SRS, CC and PPS designs [17, 30] empirical inclusion probabilities (πi) were calculated using a standard approach implemented in the “twophase function” in the survey package. Instead, πi ’s were calculated following Samuelsen [23] for NCC and following Rivera for CM [25].
As surrogate variables are rarely available for new biomarkers at the design stage, we considered also a setting with post-stratification for surrogate variables, mimicking a possible situation in which the surrogate variable is identified only after sampling, as this might still be advantageous [31]. In order to estimate this advantage, we performed a classical CC sampling design and then we ran a weighted Cox model post-stratifying for the surrogate variable [8].
Simulations parameters
The performance of the different designs was investigated through simulations. The number of simulations needed to guarantee robust results was calculated following Burton et al. [32]. It was set at B = 2000 assuming a level of accuracy equal to 0.0046 and a variance of XBM regression coefficient estimate equal to 0.011 with a 5% significance level. To generate the hypothetical cohort described above, for each scenario we drew B = 2000 random phase I samples of N = 2000 subjects.
We started by simulating the confounder variable as a dichotomous variable with P(XConf = 1) = 0.5; the biomarker was simulated by a binomial distribution with P(XBM = 1| XConf) = exp(a + b ∗ Xconf)/(1 + exp(a + b ∗ Xconf)) resulting in a prevalence in the entire cohort of nearly 25% (a = − 2 an b = 1.7) and ̴5% (a = − 4 and b = 1.5) for common and rare biomarker, respectively. The surrogate/auxiliary variable, with XBM as gold-standard, was simulated as P(XSurr = 1| XBM) = exp(c + d ∗ XBM)/(1 + exp(c + d ∗ XBM)). In order to cover different levels of accuracy of the surrogate in “predicting” the value of the biomarker, we set different values of parameters c and d (see the Additional file Table S1 for details) resulting in specificity (P(XSurr = 0| XBM = 0)) and sensitivity (P(XSurr = 1| XBM = 1)) values ranging between 70 to 90%. Finally, an additional binary risk factor XRisk factor was generated with a probability of P(XRisk factor = 1) = 0.4.
The time-to-event endpoint was generated [32, 33] from a Weibull hazard model as \( T={\left(- logU/\lambda \exp \left({\beta}^{\prime }X\right)\right)}^{\raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{$p$}\right.} \), where p = 0.9, λ =0.1, with U following a uniform distribution on the interval from 0 to 1 and with the matrix of covariates X including the biomarker value (XBM), the risk factor (XRisk Factor) and the confounder (XConf). A random right censoring time was generated from an exponential distribution and three different censoring rates were considered (ρ equal to 0, 0.1, 0.4) to yield 0, 15 and 50% subjects censored at the end of follow-up time τ. Minimum between time-to-event Ti and censoring Ci (Zi = min(Ti, Ci)) was calculated, with ∆i = I(Ti < Ci). Administrative censoring was set at τ =2. This setting resulted in an average of 500 events for each phase I dataset at the end of follow-up. The values for the regression coefficients (β) and baseline hazard were chosen to mimic the observed values in ALL data [15, 34]. Details of all specific parameters were reported in the Additional file Table S1.
The sampling design scheme for the phase II (size n) was illustrated in the paragraph Phase II Sample in section Simulation context. In particular, we performed SRS, PPS and CC (the last two stratified by event or event and risk factor or event and confounder or event and surrogate) and, finally, NCC and CM.
Information on XBM was disregarded for subjects not included in the phase II sample and a weighted Cox model was applied to estimate βBM as described in Evaluation of Biomarker impact on the event section.
The performance of the estimate of βBM over the B simulations has been assessed by the following measures [32]:
-
(i)
Bias, given by = \( {\overline{\hat{\beta}}}_{BM}-{\beta}_{BM} \), where \( {\overline{\hat{\beta}}}_{BM} \) = \( \frac{\sum_{i=1}^B{\hat{\beta}}_{iBM}}{B} \),
-
(ii)
\( SE\left({\hat{\beta}}_{BM}\right), \) the empirical Standard Error (SE) of βBM over all simulations,
-
(iii)
Design effect, defined as the ratio between the estimated variance of βBM in each sampling design by the one in SRS [35],
-
(iv)
Mean Square Error, MSE, given by \( {\left({\overline{\hat{\beta}}}_{BM}-{\beta}_{BM}\right)}^2+{\left( SE\left({\hat{\beta}}_{BM}\right)\right)}^2 \),
-
(v)
Coverage of the 95% confidence interval (CI) of βBM and 95%CI length,
-
(vi)
Power, number of times in which the null hypothesis (βBM = 0) was rejected by the Wald test at 5% significance level in the weighted Cox regression model.
All analyses were performed using R software (version 3.5.2) [36].