
A stratified adaptive two-stage design with co-primary endpoints for phase II clinical oncology trials

Abstract

Background

Given the inherent challenges of conducting randomized phase III trials in older cancer patients, single-arm phase II trials that assess the feasibility of a treatment already shown to be effective in a younger population may provide a compelling alternative. Such an approach would need to evaluate treatment feasibility using a composite endpoint that combines multiple clinical dimensions, and to stratify older patients as fit or frail to account for the heterogeneity of the study population and recommend an appropriate treatment approach. In this context, stratified adaptive two-stage designs for binary or composite endpoints, initially developed for biomarker studies, allow two subgroups to be included whilst maintaining competitive statistical performance. In practice, heterogeneity may affect more than one dimension, so incorporating co-primary endpoints, which independently assess each clinical dimension, appears pertinent. The current paper presents a novel phase II design for co-primary endpoints which takes into account the heterogeneity of a population.

Methods

We developed a stratified adaptive Bryant & Day design based on the Jones et al. and Parashar et al. algorithm. This two-stage design allows two dimensions (e.g. activity and toxicity) to be jointly assessed in two different subgroups. The operating characteristics of this new design were evaluated using examples and simulation comparisons with the Bryant & Day design in a setting where the study population is stratified according to a pre-defined criterion.

Results

Simulation results showed that the new design reduced the expected and maximum sample sizes compared with parallel Bryant & Day designs (one in each subgroup), whilst controlling type I error rates and maintaining competitive statistical power as well as a high probability of detecting heterogeneity.

Conclusions

In a heterogeneous population, this two-stage stratified adaptive phase II design provides a useful alternative to classical designs and allows a subgroup of interest to be identified without dramatically increasing sample size. As heterogeneity is not limited to older populations, this new design may also be relevant to other study populations, such as children or adolescents and young adults, or to the development of targeted therapies based on a biomarker.


Background

The main objective of a phase II oncology trial is to assess the anti-tumoral activity of an experimental treatment. If promising results are obtained, the phase II trial is followed by a phase III trial to evaluate the effectiveness of the experimental treatment compared with a standard treatment. Older patients are vastly underrepresented in phase III clinical trials, and the problem of recruiting older people has been widely documented in the literature. The most commonly cited barriers are stringent eligibility criteria, oncologists' concerns about toxicity, and refusal by patients or their families [1]. Given the challenges of conducting randomized phase III trials in older patients, several authors have previously suggested conducting single-arm phase II trials to assess the feasibility of a treatment that has been shown to be effective in a younger population [2, 3]. Indeed, perhaps more than in any other population, cancer care should not compromise quality of life or autonomy [2, 3]. Treatment feasibility can be evaluated with a composite endpoint combining multiple clinical dimensions (e.g. activity, toxicity, quality of life, etc.). The treatment may be considered feasible if it fulfills some or all components of the composite endpoint. Another challenge is accounting for the heterogeneity of this population: stratifying older patients as fit or frail is crucial for recommending an appropriate treatment approach [4]. Classical phase II designs for binary or composite endpoints [5,6,7] do not address this heterogeneity and can lead to erroneous conclusions in an unselected population, whereas a specific subgroup of less frail (or less fit) patients might (or might not) benefit from the new treatment. Stratified adaptive two-stage designs for binary or composite endpoints, which allow the inclusion of two subgroups and identify one of interest at the end of the first or the second stage, have recently been proposed [8,9,10].
Initially developed for biomarker studies, these types of approaches can also be applied to geriatric clinical oncology trials and allow the sample size to be minimized whilst maintaining statistical performance comparable to that of classical approaches [11]. These stratified adaptive designs have been developed for binary or composite endpoints, and they take into account the heterogeneity of a population when considering a single clinical dimension or several combined dimensions, each of which theoretically carries the same clinical importance. However, depending on the clinical context, the impact on autonomy or quality of life may take precedence over anti-tumoral activity in treatment decision-making. Moreover, interpretation may be difficult if the results diverge when each clinical dimension is considered separately. Thus, the use of co-primary endpoints that assess each clinical dimension independently appears more relevant in this light [12]. Several designs that deal with these types of endpoints have been proposed, but the most widely used is the one developed by Bryant and Day [13]. To the best of our knowledge, the current literature does not include any reports of phase II designs for co-primary endpoints that account for heterogeneity. The current paper therefore details a stratified adaptive Bryant & Day (SABD) design based on the algorithm developed by Jones et al. [8] and Parashar et al. [10] (Methods section). The operating characteristics of the novel design are then evaluated using examples and simulation comparisons with the Bryant & Day (BD) design (Results section).

Methods

Bryant & Day (BD) design

The BD design can be considered as a two-stage Simon optimal design [6] which considers two dimensions as co-primary endpoints, namely activity and toxicity. The BD design, where XR1 and XT1 represent the number of responses and non-toxicities observed at the end of the first stage and XR and XT the total number of responses and non-toxicities observed at the end of the second stage, is shown in Fig. 1.

Fig. 1 Bryant & Day (BD) design

After the inclusion of N1 patients, the study will be stopped for futility if an insufficient number of responses or non-toxicities are observed (i.e. XR1 < kR1 or XT1 < kT1). The experimental treatment will be considered as promising (i.e. «go-decision») if a sufficient number of responses and non-toxicities are observed in the interim (i.e. XR1 ≥ kR1 and XT1 ≥ kT1) and in the final (i.e. XR ≥ kR and XT ≥ kT) analysis.
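The interim and final rules described above can be written down as two small functions. This is a minimal sketch for illustration only: the function names, the `Decision` labels and the numerical boundaries in the usage lines are assumptions, not values taken from the BD design tables.

```python
from enum import Enum


class Decision(Enum):
    STOP_FUTILITY = "stop for futility"
    CONTINUE = "continue to stage 2"
    GO = "go-decision"
    NO_GO = "no go-decision"


def bd_interim(x_r1: int, x_t1: int, k_r1: int, k_t1: int) -> Decision:
    """Interim rule: stop if XR1 < kR1 or XT1 < kT1, otherwise continue."""
    if x_r1 < k_r1 or x_t1 < k_t1:
        return Decision.STOP_FUTILITY
    return Decision.CONTINUE


def bd_final(x_r: int, x_t: int, k_r: int, k_t: int) -> Decision:
    """Final rule: go only if XR >= kR and XT >= kT (cumulative counts)."""
    if x_r >= k_r and x_t >= k_t:
        return Decision.GO
    return Decision.NO_GO


# Hypothetical boundaries and counts, chosen only to exercise both rules.
print(bd_interim(x_r1=8, x_t1=6, k_r1=8, k_t1=8))  # too few non-toxicities
print(bd_final(x_r=30, x_t=29, k_r=29, k_t=29))
```

Note that the final rule is only reached when the interim rule returned `CONTINUE`, so a «go-decision» requires both sets of boundaries to be met.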

Unacceptable and acceptable rates for each dimension are denoted as follows, with pR and pT respectively representing the response rate and the non-toxicity rate:

  • pR0: unacceptable response rate

  • pR1: acceptable response rate

  • pT0: unacceptable non-toxicity rate

  • pT1: acceptable non-toxicity rate

Given the two-dimensional nature of the endpoint, the null and alternative hypotheses are regions of the (pR, pT) plane, defined by H0: {pR ≤ pR0 or pT ≤ pT0} and H1: {pR > pR0 and pT > pT0}, respectively. Four particular hypotheses corresponding to four possible states are considered:

  • H00: {pR = pR0 and pT = pT0}

  • H01: {pR = pR0 and pT = pT1}

  • H10: {pR = pR1 and pT = pT0}

  • H11: {pR = pR1 and pT = pT1}

There are four associated error rates:

  • α: is the probability of considering the treatment as promising in the case where true response and non-toxicity rates are considered as unacceptable (i.e. under H00),

  • αR: is the probability of considering the treatment as promising in the case where true response and non-toxicity rates are considered as unacceptable and acceptable, respectively (i.e. under H01),

  • αT: is the probability of considering the treatment as promising in the case where true response and non-toxicity rates are considered as acceptable and unacceptable, respectively (i.e. under H10),

  • β: is the probability of considering the treatment as insufficiently promising in the case where true response and non-toxicity rates are considered as acceptable (i.e. under H11).

Sample sizes of stage 1 and 2 (N1 and N2) and stopping boundaries (kR1, kT1, kR and kT) are determined from the specified values for pR0, pT0, pR1 and pT1 and the type I (αR and αT) and type II (β) error rates. The optimal design is defined as the one that minimizes the maximum expected sample size (ESS) under H10 or H01 (i.e. max{ESS under H10, ESS under H01}) whilst controlling for αR, αT and β.
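To make the quantities entering the optimality criterion concrete, the sketch below evaluates the error rates, PET and ESS of one candidate BD design from exact binomial probabilities. It assumes independent endpoints (so that the go probability factorizes over the two endpoints) and uses hypothetical design values; it does not perform the search for the optimal design.

```python
from math import comb


def pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def sf(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(pmf(x, n, p) for x in range(max(k, 0), n + 1))


def both(n1, n2, k1, k, p):
    """P(X1 >= k1 and X1 + X2 >= k), X1 ~ B(n1, p), X2 ~ B(n2, p)."""
    return sum(pmf(x1, n1, p) * sf(k - x1, n2, p)
               for x1 in range(max(k1, 0), n1 + 1))


def bd_characteristics(n1, n2, k_r1, k_t1, k_r, k_t, p_r, p_t):
    """Go probability, PET and ESS at true rates (p_r, p_t)."""
    p_go = both(n1, n2, k_r1, k_r, p_r) * both(n1, n2, k_t1, k_t, p_t)
    pet = 1 - sf(k_r1, n1, p_r) * sf(k_t1, n1, p_t)
    ess = n1 + n2 * (1 - pet)
    return p_go, pet, ess


# Hypothetical candidate design and hypotheses (illustration only).
design = dict(n1=10, n2=25, k_r1=8, k_t1=8, k_r=29, k_t=29)
p_r0 = p_t0 = 0.70
p_r1 = p_t1 = 0.90

alpha_r, pet_01, ess_01 = bd_characteristics(**design, p_r=p_r0, p_t=p_t1)  # H01
alpha_t, pet_10, ess_10 = bd_characteristics(**design, p_r=p_r1, p_t=p_t0)  # H10
power, _, _ = bd_characteristics(**design, p_r=p_r1, p_t=p_t1)              # H11

criterion = max(ess_01, ess_10)  # quantity minimized by the optimal design
print(f"alpha_R={alpha_r:.3f} alpha_T={alpha_t:.3f} power={power:.3f}")
print(f"max ESS under H01/H10 = {criterion:.1f}")
```

A design search would loop over candidate sample sizes and boundaries, keep those meeting the αR, αT and β constraints, and retain the one with the smallest `criterion`.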

Stratified Adaptive Bryant & Day (SABD) design

To take into account population heterogeneity, we developed a SABD design based on the Jones et al. [8] and Parashar et al. [10] algorithm. In contrast to these designs, which were developed for binary or composite endpoints, this novel two-stage design allows two clinical dimensions (e.g. activity and toxicity) to be jointly assessed through co-primary endpoints in two different subgroups, and one subgroup of interest to be identified at the end of the first or the second stage. In the context of a geriatric clinical oncology trial, for example, this allows patients to be stratified, according to a geriatric criterion, into frail and fit subgroups. To simplify the notation, these two subgroups will be referred to as the negative («−») and positive («+») subgroups, respectively. The two-stage algorithm proposed by Jones et al. and Parashar et al., presented in Fig. 2, relies on an assumption of hierarchy between the subgroups: the true response and non-toxicity rates are assumed to always be equal or higher in the positive subgroup than in the negative subgroup. This implies that, according to the preliminary results observed at the end of the first stage, enrollment continues in an unselected population if promising results are observed in the negative subgroup, or in the positive subgroup only (i.e. enrichment) if promising results are observed in this subgroup only.

Fig. 2 Jones et al. and Parashar et al. algorithm

Building on this algorithm and adapting the BD design to handle two co-primary endpoints, we propose the SABD design presented in Fig. 3.

Fig. 3 Stratified Adaptive Bryant & Day (SABD) design. e: «enrichment». +: «inclusion of additional». N1 and N1+: number of patients to be included at the first stage. N2, N2+ and N2e+: number of patients to be included at the second stage. N = N1 + N2, N+ = N1+ + N2+ and Ne+ = N1+ + N2e+: total number of patients to be included. kR1, kT1, kR1+ and kT1+: first stage stopping boundaries. kR, kT, kR+, kT+, kRe+ and kTe+: second stage stopping boundaries. XR1, XT1, XR1+ and XT1+: number of responses and non-toxicities observed during the first stage. XR2, XT2, XR2+, XT2+, XR2e+ and XT2e+: number of responses and non-toxicities observed during the second stage. XR = XR1 + XR2, XT = XT1 + XT2, XR+ = XR1+ + XR2+, XT+ = XT1+ + XT2+, XRe+ = XR1+ + XR2e+ and XTe+ = XT1+ + XT2e+: total number of responses and non-toxicities observed

The study begins with the inclusion of N1 and N1+ patients in the negative and positive subgroups, respectively. According to the results observed at the end of the first stage, enrollment will be stopped for futility if an insufficient number of responses or non-toxicities are observed in the two subgroups (i.e. (XR1 < kR1 or XT1 < kT1) and (XR1+ < kR1+ or XT1+ < kT1+)). Otherwise, enrollment will continue in the unselected population if a sufficient number of responses and non-toxicities are observed in the negative subgroup (i.e. XR1 ≥ kR1 and XT1 ≥ kT1). If a sufficient number of responses and non-toxicities are observed in the positive subgroup only (i.e. (XR1 < kR1 or XT1 < kT1) and (XR1+ ≥ kR1+ and XT1+ ≥ kT1+)), then enrollment will continue in this subgroup only. At the end of the second stage, the experimental treatment may be considered as promising (i.e. «go-decision») in the two subgroups (i.e. S1) or in the positive subgroup only (i.e. S2 or S3).
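The three possible outcomes of the interim analysis can be summarized in a short function. This is a sketch only; the function name, argument names and return labels are hypothetical, and the `_neg` / `_pos` suffixes stand for the negative and positive subgroups.

```python
def sabd_interim(x_r1_neg, x_t1_neg, x_r1_pos, x_t1_pos,
                 k_r1_neg, k_t1_neg, k_r1_pos, k_t1_pos):
    """Return the first-stage decision of the stratified adaptive design."""
    neg_ok = x_r1_neg >= k_r1_neg and x_t1_neg >= k_t1_neg
    pos_ok = x_r1_pos >= k_r1_pos and x_t1_pos >= k_t1_pos
    if neg_ok:
        # Promising in the negative subgroup: by the hierarchy assumption,
        # enrollment continues in the unselected population, whatever the
        # first-stage result in the positive subgroup.
        return "continue_unselected"
    if pos_ok:
        # Promising in the positive subgroup only: enrichment.
        return "enrich_positive"
    # Insufficient responses or non-toxicities in both subgroups.
    return "stop_futility"


# Hypothetical counts with first-stage boundaries of 8 for every endpoint.
bounds = dict(k_r1_neg=8, k_t1_neg=8, k_r1_pos=8, k_t1_pos=8)
print(sabd_interim(9, 8, 5, 5, **bounds))
print(sabd_interim(7, 9, 8, 10, **bounds))
print(sabd_interim(7, 9, 8, 7, **bounds))
```

The first call illustrates a consequence of the hierarchy assumption: the positive subgroup's first-stage counts do not affect the decision once the negative subgroup meets both boundaries.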

Hypotheses

Similarly to the BD design, our SABD design assumes that the co-primary endpoints are independent in the two subgroups. If pR, pR+, pT and pT+ respectively correspond to the true response and non-toxicity rates in the negative and positive subgroups, the unacceptable and acceptable rates for each endpoint and subgroup may then be expressed as follows:

  • pR0 and pR0+: unacceptable response rates in the negative and positive subgroups,

  • pR1 and pR1+: acceptable response rates in the negative and positive subgroups,

  • pT0 and pT0+: unacceptable non-toxicity rates in the negative and positive subgroups,

  • pT1 and pT1+: acceptable non-toxicity rates in the negative and positive subgroups.

It is assumed that the null hypothesis is identical between subgroups for the co-primary endpoints (i.e. pR0 = pR0+ and pT0 = pT0+). Null and alternative hypotheses in both subgroups are therefore defined as follows:

  • H0−(+): {pR−(+) ≤ pR0−(+) or pT−(+) ≤ pT0−(+)}

  • H1−(+): {pR−(+) > pR0−(+) and pT−(+) > pT0−(+)}

Four particular hypotheses in both subgroups are considered:

  • H00−(+): {pR−(+) = pR0−(+) and pT−(+) = pT0−(+)}

  • H10−(+): {pR−(+) = pR1−(+) and pT−(+) = pT0−(+)}

  • H01−(+): {pR−(+) = pR0−(+) and pT−(+) = pT1−(+)}

  • H11−(+): {pR−(+) = pR1−(+) and pT−(+) = pT1−(+)}

Probability of rejecting null hypotheses

There are three possible scenarios where the experimental treatment is considered as promising in the unselected population or only in the positive subgroup (i.e. «go-decision»). These scenarios correspond to S1, S2 and S3 presented in Fig. 3.

The probability of considering the experimental treatment as promising in the unselected population (i.e. reject H0 and H0+) according to S1 is defined as:

$$P\left(S1 \mid p_R^-, p_T^-\right) = P\left(X_{R1}^- + X_{R2}^- \geq k_R^- \;\text{and}\; X_{R1}^- \geq k_{R1}^-\right) \times P\left(X_{T1}^- + X_{T2}^- \geq k_T^- \;\text{and}\; X_{T1}^- \geq k_{T1}^-\right)$$

According to the hypothesis of hierarchy between subgroups, the probability of considering the experimental treatment as promising depends on the true response and non-toxicity rate in the negative subgroup only.

The probability of considering the experimental treatment as promising in the positive subgroup only (i.e. reject H0+) according to S2 is defined as:

$$P\left(S2 \mid p_R^-, p_T^-, p_R^+, p_T^+\right) = P\left(X_R^+ \geq k_R^+\right) \times P\left(X_T^+ \geq k_T^+\right) \times P\left(\left(X_{R1}^- + X_{R2}^- < k_R^- \;\text{and}\; X_{R1}^- \geq k_{R1}^- \;\text{and}\; X_{T1}^- \geq k_{T1}^-\right) \;\text{or}\; \left(X_{T1}^- + X_{T2}^- < k_T^- \;\text{and}\; X_{R1}^- \geq k_{R1}^- \;\text{and}\; X_{T1}^- \geq k_{T1}^-\right)\right)$$

The probability of considering the experimental treatment as promising in the positive subgroup only (i.e. reject H0+) according to S3 is defined as:

$$P\left(S3 \mid p_R^-, p_T^-, p_R^+, p_T^+\right) = P\left(X_{R1}^+ + X_{R2e}^+ \geq k_{Re}^+ \;\text{and}\; X_{R1}^+ \geq k_{R1}^+\right) \times P\left(X_{T1}^+ + X_{T2e}^+ \geq k_{Te}^+ \;\text{and}\; X_{T1}^+ \geq k_{T1}^+\right) \times P\left(X_{R1}^- < k_{R1}^- \;\text{or}\; X_{T1}^- < k_{T1}^-\right)$$

To compute the probabilities of rejecting the null hypotheses, it is assumed that the numbers of responses and non-toxicities follow binomial distributions, B(N,p), with parameters N and p defined in Table 1.

Table 1 Parameters of binomial distributions

Type I errors

Similarly to the BD design, three type I errors may be considered for the SABD design. The overall type I error rate α corresponds to the probability of considering the treatment as promising in the unselected population or in the positive subgroup in the case where true response and non-toxicity rates are considered as unacceptable in the two subgroups (i.e. under H00 and H00+). It is defined as:

$$\alpha =P\left(S1|{p}_{R0}^{-},{p}_{T0}^{-}\right)+P\left(S2|{p}_{R0}^{-},{p}_{T0}^{-},{p}_{R0}^{+},{p}_{T0}^{+}\right)+P\left(S3|{p}_{R0}^{-},{p}_{T0}^{-},{p}_{R0}^{+},{p}_{T0}^{+}\right)$$

Type I error rate αR corresponds to the probability of considering the treatment as promising in the unselected population or in the positive subgroup in the case where true response and non-toxicity rates are considered as unacceptable and acceptable, respectively, in the two subgroups (i.e. under H01 and H01+). It is defined as:

$${\alpha }_{R}=P\left(S1|{p}_{R0}^{-},{p}_{T1}^{-}\right)+P\left(S2|{p}_{R0}^{-},{p}_{T1}^{-},{p}_{R0}^{+},{p}_{T1}^{+}\right)+P\left(S3|{p}_{R0}^{-},{p}_{T1}^{-},{p}_{R0}^{+},{p}_{T1}^{+}\right)$$

Type I error rate αT corresponds to the probability of considering the treatment as promising in the unselected population or in the positive subgroup in the case where true response and non-toxicity rates are considered as acceptable and unacceptable, respectively, in the two subgroups (i.e. under H10 and H10+). It is defined as:

$${\alpha }_{T}=P\left(S1|{p}_{R1}^{-},{p}_{T0}^{-}\right)+P\left(S2|{p}_{R1}^{-},{p}_{T0}^{-},{p}_{R1}^{+},{p}_{T0}^{+}\right)+P\left(S3|{p}_{R1}^{-},{p}_{T0}^{-},{p}_{R1}^{+},{p}_{T0}^{+}\right)$$

Statistical power

The probability of considering the treatment as promising in the unselected population in the case where true response and non-toxicity rates are considered as acceptable in the negative subgroup, and therefore in the positive subgroup by the assumption of hierarchy, corresponds to P(S1|pR1,pT1). The probability of considering the treatment as promising in the positive subgroup in the case where true response and non-toxicity rates are considered as acceptable in the positive subgroup only corresponds to P(S2|pR0,pT0,pR1+,pT1+) + P(S3|pR0,pT0,pR1+,pT1+). As proposed by Parashar et al. [10], the overall power is defined by the minimum of these two probabilities:

$$power=1-\beta =\mathrm{min}\{P\left(S1|{p}_{R1}^{-},{p}_{T1}^{-}\right),P\left(S2|{p}_{R0}^{-},{p}_{T0}^{-},{p}_{R1}^{+},{p}_{T1}^{+}\right)+P\left(S3|{p}_{R0}^{-},{p}_{T0}^{-},{p}_{R1}^{+},{p}_{T1}^{+}\right)\}$$
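The probabilities P(S1), P(S2) and P(S3), the three type I error rates and the overall power defined above can be evaluated exactly from binomial distributions. The sketch below is one possible implementation under stated assumptions: endpoints and subgroups are independent, each count is binomial with the stage sample size of its subgroup (in particular XR+ and XT+ in S2 are taken as B(N1+ + N2+, p)), and the design values are hypothetical, with final boundaries that are only one possible reading of the first example in the Results section.

```python
from dataclasses import dataclass
from math import comb


def pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def sf(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(pmf(x, n, p) for x in range(max(k, 0), n + 1))


def both(n1, n2, k1, k, p):
    """P(X1 >= k1 and X1 + X2 >= k) for one endpoint."""
    return sum(pmf(x1, n1, p) * sf(k - x1, n2, p)
               for x1 in range(max(k1, 0), n1 + 1))


@dataclass(frozen=True)
class Sabd:
    n1_neg: int
    n2_neg: int
    n1_pos: int
    n2_pos: int
    n2e_pos: int
    k1_neg: tuple  # (kR1-, kT1-)
    k_neg: tuple   # (kR-, kT-)
    k1_pos: tuple  # (kR1+, kT1+)
    k_pos: tuple   # (kR+, kT+)
    ke_pos: tuple  # (kRe+, kTe+)


def stage1_pass_neg(d, p_neg):
    return sf(d.k1_neg[0], d.n1_neg, p_neg[0]) * sf(d.k1_neg[1], d.n1_neg, p_neg[1])


def p_s1(d, p_neg):
    return (both(d.n1_neg, d.n2_neg, d.k1_neg[0], d.k_neg[0], p_neg[0])
            * both(d.n1_neg, d.n2_neg, d.k1_neg[1], d.k_neg[1], p_neg[1]))


def p_s2(d, p_neg, p_pos):
    n_pos = d.n1_pos + d.n2_pos
    pos_final = sf(d.k_pos[0], n_pos, p_pos[0]) * sf(d.k_pos[1], n_pos, p_pos[1])
    # Negative subgroup passes stage 1 but misses a final boundary.
    neg_fails_final = stage1_pass_neg(d, p_neg) - p_s1(d, p_neg)
    return pos_final * neg_fails_final


def p_s3(d, p_neg, p_pos):
    pos_enrich = (both(d.n1_pos, d.n2e_pos, d.k1_pos[0], d.ke_pos[0], p_pos[0])
                  * both(d.n1_pos, d.n2e_pos, d.k1_pos[1], d.ke_pos[1], p_pos[1]))
    return pos_enrich * (1 - stage1_pass_neg(d, p_neg))


def p_go(d, p_neg, p_pos):
    return p_s1(d, p_neg) + p_s2(d, p_neg, p_pos) + p_s3(d, p_neg, p_pos)


# Hypothetical design values (see the assumptions stated in the lead-in).
design = Sabd(n1_neg=10, n2_neg=25, n1_pos=10, n2_pos=22, n2e_pos=25,
              k1_neg=(8, 8), k_neg=(29, 29),
              k1_pos=(8, 8), k_pos=(27, 27), ke_pos=(29, 29))
p0, p1 = 0.70, 0.90  # unacceptable / acceptable rate, same for R and T

alpha = p_go(design, (p0, p0), (p0, p0))    # under H00 and H00+
alpha_r = p_go(design, (p0, p1), (p0, p1))  # under H01 and H01+
alpha_t = p_go(design, (p1, p0), (p1, p0))  # under H10 and H10+
power = min(p_s1(design, (p1, p1)),
            p_s2(design, (p0, p0), (p1, p1)) + p_s3(design, (p0, p0), (p1, p1)))
print(f"alpha={alpha:.3f} alpha_R={alpha_r:.3f} alpha_T={alpha_t:.3f} power={power:.3f}")
```

The term `neg_fails_final` uses the identity P(pass stage 1 and miss at least one final boundary) = P(pass stage 1) − P(S1), which is equivalent to the «or» expression in the definition of P(S2) when the two endpoints are independent.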

Expected sample size (ESS) and optimal design

A minimum of N1 + N1+ patients need to be included. According to the number of responses and non-toxicities observed in the interim analysis, three scenarios are possible at the second stage: no additional patients, N2 + N2+ additional patients, or N2e+ additional patients are included. The ESS is determined as follows:

$$ESS\left({p}_{R}^{-},{p}_{T}^{-},{p}_{R}^{+},{p}_{T}^{+}\right)={N}_{1}^{-}+{N}_{1}^{+}+\left({N}_{2}^{-}+{N}_{2}^{+}\right)\times P\left({X}_{R1}^{-}\ge {k}_{R1}^{-}\right)\times P\left({X}_{T1}^{-}\ge {k}_{T1}^{-}\right)+{N}_{2e}^{+}\times P\left({X}_{R1}^{+}\ge {k}_{R1}^{+}\right)\times P\left({X}_{T1}^{+}\ge {k}_{T1}^{+}\right)\times P\left({X}_{R1}^{-}<{k}_{R1}^{-} \;or\; {X}_{T1}^{-}<{k}_{T1}^{-}\right)$$

As proposed by Parashar et al. [10], the optimal design (kR1, kT1, kR1+, kT1+, N1, N1+, kRe+, kTe+, Ne+, kR, kT, kR+, kT+, N, N+) is defined as the one that minimizes the maximum ESS under (H01,H01+) or (H10,H10+) (i.e. \(\mathrm{max}\{ESS\left({p}_{R0}^{-},{p}_{T1}^{-},{p}_{R0}^{+},{p}_{T1}^{+}\right),ESS\left({p}_{R1}^{-},{p}_{T0}^{-},{p}_{R1}^{+},{p}_{T0}^{+}\right)\}\)) while controlling type I (αR and αT) and type II (β) error rates. To determine the optimal design, 15 parameters need to be estimated. To reduce the computational burden, a similar approach to the one proposed by Jones et al. [8] is used. Parameters (N1, N, kR1, kT1, kR, kT) and (N1+, kR1+, kT1+) are derived from the BD design with (pR0, pT0, pR1, pT1, αR/2, αT/2, β) and (pR0+, pT0+, pR1+, pT1+, αR/2, αT/2, β), respectively (type I error rates are set at αR/2 and αT/2 to adjust for multiplicity). To delineate the parameter search space, the maximum sample size is set at 2 × N.

Probability of Early termination (PET)

The study will stop for futility if an insufficient number of responses or non-toxicities are observed in both groups in the interim analysis. The PET is determined as follows:

$$PET\left({p}_{R}^{-},{p}_{T}^{-},{p}_{R}^{+},{p}_{T}^{+}\right)=P\left({X}_{R1}^{-}<{k}_{R1}^{-} \;or\; {X}_{T1}^{-}<{k}_{T1}^{-}\right)\times P\left({X}_{R1}^{+}<{k}_{R1}^{+} \;or\; {X}_{T1}^{+}<{k}_{T1}^{+}\right)$$
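The ESS and PET formulas only involve first-stage quantities and can be checked numerically. The sketch below assumes independent endpoints and binomial first-stage counts, and plugs in the sample sizes and first-stage boundaries reported for the first example in the Results section (10 patients per subgroup at stage 1, continuation requiring at least 8 responses and 8 non-toxicities, 25 + 22 additional patients in the unselected population or 25 under enrichment), evaluated under (H01, H01+). The function names are hypothetical.

```python
from math import comb


def sf(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(max(k, 0), n + 1))


def stage1_pass(n1, k_r1, k_t1, p_r, p_t):
    """P(XR1 >= kR1) * P(XT1 >= kT1) in one subgroup."""
    return sf(k_r1, n1, p_r) * sf(k_t1, n1, p_t)


def sabd_pet_ess(n1_neg, n1_pos, n2_neg, n2_pos, n2e_pos,
                 k1_neg, k1_pos, p_neg, p_pos):
    """PET and ESS; k1_* are (kR1, kT1) pairs and p_* are (pR, pT) pairs."""
    pass_neg = stage1_pass(n1_neg, *k1_neg, *p_neg)
    pass_pos = stage1_pass(n1_pos, *k1_pos, *p_pos)
    pet = (1 - pass_neg) * (1 - pass_pos)
    ess = (n1_neg + n1_pos
           + (n2_neg + n2_pos) * pass_neg
           + n2e_pos * pass_pos * (1 - pass_neg))
    return pet, ess


pet, ess = sabd_pet_ess(
    n1_neg=10, n1_pos=10, n2_neg=25, n2_pos=22, n2e_pos=25,
    k1_neg=(8, 8), k1_pos=(8, 8),
    p_neg=(0.70, 0.90), p_pos=(0.70, 0.90),  # pR = pR0, pT = pT1 in both subgroups
)
print(f"PET = {pet:.3f}, ESS = {ess:.1f}")  # PET = 0.415, ESS = 42.5
```

With these inputs the sketch gives values in line with the 41.5% and 42.5 patients quoted for the first example; by symmetry of that example, the same values are obtained under (H10, H10+).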

Results

Examples of SABD design

Three examples of the SABD design are considered. In the first example, hypotheses are based on the GERICO10 phase II trial, which aimed to evaluate the feasibility of chemotherapy with docetaxel-prednisone in patients aged 75 or older, classified as vulnerable or frail according to the International Society of Geriatric Oncology criteria, with castration-resistant metastatic prostate cancer [14]. The same hypotheses are defined for the two co-primary endpoints in the two subgroups (pR0−(+) = pT0−(+) = 0.70 and pR1−(+) = pT1−(+) = 0.90). In the second example, different hypotheses are defined for the two co-primary endpoints, identically in the two subgroups (pR0−(+) = 0.30, pT0−(+) = 0.60, pR1−(+) = 0.60 and pT1−(+) = 0.90). In the third example, different hypotheses are defined for the two co-primary endpoints and, for non-toxicity, between the two subgroups (pR0−(+) = 0.10, pT0−(+) = 0.60, pR1−(+) = 0.40, pT1 = 0.80 and pT1+ = 0.90). Type I error rates (αR and αT) and overall power (1 − β) are set at 10% and 80%, respectively. The hypotheses, parameters and operating characteristics for the three examples are summarized in Table 2.

Table 2 Examples of stratified adaptive Bryant & Day (SABD) design (ESSRiTj and PETRiTj correspond to ESS (pRi,pTj,pRi+,pTj+) and PET (pRi,pTj,pRi+,pTj+), respectively)

In the first example, a maximum of 67 patients need to be included and the interim analysis is performed after the enrollment of 10 patients into each subgroup. According to the number of responses and non-toxicities observed at the end of the first stage, three scenarios are possible: the study is stopped for futility if at most 7 responses or non-toxicities are observed in the negative and positive subgroups; enrollment continues in an unselected population with the recruitment of an additional 25 (N2 = N − N1) and 22 (N2+ = N+ − N1+) patients in the negative and positive subgroups, respectively, if at least 8 responses and non-toxicities are observed in the negative subgroup; enrollment continues in the positive subgroup only with the recruitment of an additional 25 patients (enrichment: N2e+ = Ne+ − N1+) if at most 7 responses or non-toxicities are observed in the negative subgroup and at least 8 responses and non-toxicities are observed in the positive subgroup. At the end of the second stage, after the enrollment of 35 and 32 patients in the negative and positive subgroups, respectively, a «go-decision» is declared in the unselected population or in the positive subgroup only if at least 29 responses and 27 non-toxicities are observed in the negative or in the positive subgroup only, respectively. After the enrollment of 35 patients in the positive subgroup (enrichment), a «go-decision» is declared in the positive subgroup only if at least 29 responses and 29 non-toxicities are observed. The ESS and the PET for insufficient activity and/or excessive toxicity equate to 42.5 patients and 41.5%, respectively.

In the second example, a maximum of 39 patients need to be included and the interim analysis is performed after the enrollment of 9 patients into each subgroup. According to the number of responses and non-toxicities observed at the end of the first stage, three scenarios are possible: the study is stopped for futility; enrollment continues in an unselected population with the recruitment of an additional 14 (N2 = N − N1) and 7 (N2+ = N+ − N1+) patients in the negative and positive subgroups, respectively; enrollment continues in the positive subgroup only with the recruitment of an additional 12 patients (enrichment: N2e+ = Ne+ − N1+). The ESS and the PET for insufficient activity and/or excessive toxicity equate to 25.7 patients and 55.4%, respectively.

In the third example, a maximum of 45 patients need to be included and the interim analysis is performed after the enrollment of 17 and 9 patients in the negative and positive subgroups, respectively. According to the number of responses and non-toxicities observed at the end of the first stage, three scenarios are possible: the study is stopped for futility; enrollment continues in an unselected population with the recruitment of an additional 18 (N2 = N − N1) and 1 (N2+ = N+ − N1+) patients in the negative and positive subgroups, respectively; enrollment continues in the positive subgroup only with the recruitment of an additional 7 patients (enrichment: N2e+ = Ne+ − N1+). The ESS and the PET for insufficient activity and/or excessive toxicity equate to 32.1 patients and 58.0%, respectively.

A selection of SABD designs with pre-specified hypotheses are detailed in Supplementary Table 1.

An optimal SABD design requires a total of 15 parameters to be estimated. This involves a very large number of combinations and therefore necessitates an extensive computational effort when using standard software. For example, the computation time needed to determine an optimal SABD design can vary from a few minutes or hours to several weeks, depending on the hypotheses, using R software (https://cran.r-project.org/).

Simulation studies

Simulations were carried out to investigate the operating characteristics of the SABD design and to compare it with a parallel BD design (i.e. two parallel studies with one BD design in each subgroup). Three case studies corresponding to the three examples presented in the previous section were considered. Type I error rate and power for the SABD design were set at 10% and 80%, respectively. In the parallel BD design, adjustment for multiplicity was performed to achieve an adequate overall type I error rate and sufficient statistical power to draw meaningful conclusions in the unselected population or only in the positive subgroup. Type I error rate and power were therefore set at 5% (i.e. αR/2 and αT/2) and 90% (i.e. 1 − β/2), respectively, in each subgroup for the parallel BD design. Four scenarios were considered:

  • Scenario 1A: simulations were performed under H01−(+) (pR−(+) = pR0−(+) and pT−(+) = pT1−(+)) to assess type I error rate αR and PET.

  • Scenario 1B: simulations were performed under H10−(+) (pR−(+) = pR1−(+) and pT−(+) = pT0−(+)) to assess type I error rate αT and PET.

  • Scenario 2: simulations were performed under H00 (pR = pR0 and pT = pT0) and H11+ (pR+  = pR1+ and pT+  = pT1+) to evaluate the probability of detecting heterogeneity at the first stage (i.e. stop enrollment for futility in the negative subgroup) and the probability of considering the treatment as promising in the positive subgroup (i.e. reject H0+).

  • Scenario 3: simulations were performed under H11−(+) (pR−(+) = pR1−(+) and pT−(+) = pT1−(+)) to evaluate the probability of considering the treatment as promising in the unselected population (i.e. reject H0 and H0+).

For each case study and scenario, 100 000 hypothetical trials were simulated. The numbers of responses and non-toxicities were randomly generated using binomial distributions B(N,p), with N corresponding to the numbers of patients presented in Table 2 (N1, N1+, N − N1, N+ − N1+ and Ne+ − N1+) and p corresponding to the true response and non-toxicity rates defined above (pR, pT, pR+ and pT+).
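A simulation of this kind can be sketched in a few lines using only the Python standard library. The code below is an illustration, not the authors' simulation program (which was run in R): the sample sizes and first-stage boundaries follow the first example, whereas the final boundaries are one possible reading of that example and should be treated as hypothetical. Endpoints are generated independently, and fewer trials than in the paper are used to keep the run short.

```python
import random

# Hypothetical design dictionary; boundaries are (responses, non-toxicities).
DESIGN = {
    "n1_neg": 10, "n2_neg": 25, "n1_pos": 10, "n2_pos": 22, "n2e_pos": 25,
    "k1_neg": (8, 8), "k_neg": (29, 29),
    "k1_pos": (8, 8), "k_pos": (27, 27), "ke_pos": (29, 29),
}


def rbinom(rng, n, p):
    """One B(n, p) draw as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def draw(rng, n, probs):
    """(responses, non-toxicities) among n patients, independent endpoints."""
    return rbinom(rng, n, probs[0]), rbinom(rng, n, probs[1])


def meets(counts, bounds):
    return counts[0] >= bounds[0] and counts[1] >= bounds[1]


def add(a, b):
    return a[0] + b[0], a[1] + b[1]


def simulate_sabd(d, p_neg, p_pos, n_trials=20_000, seed=2024):
    rng = random.Random(seed)
    tally = {"S1": 0, "S2": 0, "S3": 0, "no_go": 0, "early_stop": 0}
    patients = 0
    for _ in range(n_trials):
        neg1 = draw(rng, d["n1_neg"], p_neg)
        pos1 = draw(rng, d["n1_pos"], p_pos)
        n = d["n1_neg"] + d["n1_pos"]
        if meets(neg1, d["k1_neg"]):      # continue in the unselected population
            neg = add(neg1, draw(rng, d["n2_neg"], p_neg))
            pos = add(pos1, draw(rng, d["n2_pos"], p_pos))
            n += d["n2_neg"] + d["n2_pos"]
            if meets(neg, d["k_neg"]):
                outcome = "S1"            # go in the unselected population
            elif meets(pos, d["k_pos"]):
                outcome = "S2"            # go in the positive subgroup only
            else:
                outcome = "no_go"
        elif meets(pos1, d["k1_pos"]):    # enrichment in the positive subgroup
            pos = add(pos1, draw(rng, d["n2e_pos"], p_pos))
            n += d["n2e_pos"]
            outcome = "S3" if meets(pos, d["ke_pos"]) else "no_go"
        else:
            outcome = "early_stop"        # futility in both subgroups
        tally[outcome] += 1
        patients += n
    result = {name: count / n_trials for name, count in tally.items()}
    result["ess"] = patients / n_trials
    return result


# Scenario 1A of case study 1: pR = pR0 = 0.70 and pT = pT1 = 0.90 in both subgroups.
res = simulate_sabd(DESIGN, p_neg=(0.70, 0.90), p_pos=(0.70, 0.90))
print({name: round(value, 3) for name, value in res.items()})
```

In this output, `early_stop` estimates the PET, `ess` the expected sample size, and the sum of `S1`, `S2` and `S3` the probability of a «go-decision» under the simulated scenario.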

The ESSs were also estimated for each case study and scenario. Simulation results are presented in Table 3.

Table 3 Simulation results

In all three case studies, the maximum sample size was larger with the parallel BD design with respectively 88, 62 and 79 patients compared with 67, 39 and 45 patients for the SABD.

In scenarios 1A and 1B, the SABD gave the smallest ESS, with a maximum of 42.4, 25.7 and 32.0 patients compared to a maximum of 52.8, 35.1 and 44.8 patients with the parallel BD for the three case studies, respectively. The probability of rejecting H0 or H0+ (i.e. type I error rates αR and αT) was approximately 10% for each design and case study (except in scenarios 1A and 1B for case study 2 and 3 with the parallel BD, respectively). The PET varied between 41 and 46% for case study 1 and was higher when using the SABD, with a minimum of 55.5% and 58.2% compared to a minimum of 40.8% and 49.3% with the parallel BD for case studies 2 and 3, respectively.

In scenario 2, for each case study, the probability of rejecting H0+ is higher when using the parallel BD (approximately 90%) compared to the SABD (approximately 80%). The probability of detecting heterogeneity at the first stage was at least 80% for each design and case study, except for the SABD in case study 1 (73.9%). The SABD gave the smallest ESS with respectively 50.6, 31.0 and 34.9 patients compared to the parallel BD with 63.4, 42.4 and 45.7 patients for the three case studies.

In scenario 3, the probability of rejecting H0 and H0+ was approximately 80% for each design and case study. The ESS was lower for the three case studies when using the SABD, with 63.5, 37.4 and 43.5 patients compared to 85.1, 59.2 and 75.8 patients for the parallel BD, respectively.

Discussion

The stratified adaptive phase II design developed and presented in this paper takes into account the heterogeneity of a population when considering co-primary endpoints. The SABD design, based on the Jones et al. [8] and Parashar et al. [10] algorithm, allows two pre-defined subgroups to be included and makes it possible to identify, at the end of the first or the second stage, whether the treatment benefits one of these subgroups. Different hypotheses can be defined between the subgroups and/or co-primary endpoints. We used three case studies to simulate different scenarios and investigate the operating characteristics of the SABD approach. The results demonstrate good statistical performance for the SABD when compared to the parallel BD (one BD for each subgroup). The SABD reduces the number of patients exposed to an insufficiently active or overly toxic treatment (scenarios 1A and 1B). The ESS required to reach adequate statistical power to draw meaningful conclusions in the unselected population is also lower compared to the parallel BD (scenario 3). The same trend is observed in scenario 2, but the parallel BD yields higher statistical power to conclude that the treatment is feasible in the positive subgroup only (i.e. «go-decision»). If there was heterogeneity between the two subgroups, the probability of detecting it at the first stage was generally at least 80%. To account for multiplicity and obtain an adequate overall type I error rate of 10%, αT and αR were set at 5% for each BD. In case studies 2 and 3, optimal BD designs were determined using binomial probabilities with αT and αR less than 3.5%. This could explain the lower type I error rate observed in scenarios 1A and 1B for the parallel BD, compared to the SABD.

Given that the endpoint was two-dimensional, alternative case studies or scenarios may also be considered. It would, for instance, be interesting to investigate the statistical performance of the SABD design when heterogeneity only affects one dimension.

Similarly to the BD design, the SABD design assumes that the co-primary endpoints are independent. An alternative to the BD design which pre-defines the association between co-primary endpoints has also been developed [15]. Extending the SABD design to correlated endpoints would require, among other things, considering a bivariate binomial distribution with a correlation between the two co-primary endpoints but also between the two subgroups. This merits further investigation. However, a simulation study assessing the impact of an erroneous assumption about this pre-defined association recommends using the BD design. Indeed, incorrectly assuming independence of endpoints only slightly increases the type I and II error rates. This is in contrast to wrongly defining the level of correlation between co-primary endpoints, which results in a significant loss of statistical power and an increase in the type I error rate [16]. Future studies will be required to investigate the impact of wrongly assuming independence of co-primary endpoints on the performance of stratified design approaches.

The Jones et al. [8] and Parashar et al. [10] algorithm assumes that there is a hierarchy between subgroups, such that the true response and non-toxicity rates will always be higher in the positive subgroup. As a consequence, the results of the positive subgroup may have no impact on the outcome of the study if promising results are observed in the negative subgroup at the interim and final analyses. Indeed, if this hierarchy assumption is incorrect, enrollment of an unselected population may be continued even though promising results are only observed in the negative subgroup at the interim analysis. In this scenario, an additional type I error may occur by declaring a «go-decision» in the unselected population when the true response and non-toxicity rates are acceptable in the negative subgroup only (i.e. H0+ is wrongly rejected). Zang & Yuan proposed a reverse approach to address this shortfall [17]: the trial is initially conducted in the positive subgroup only, and then in the negative subgroup if promising results are observed in the positive subgroup. An alternative two-stage approach has also been published by Dutton & Holmes [18]. In this approach, futility is first tested in the unselected population and then in the negative or positive subgroup depending on whether or not promising results are observed. The impact of an incorrect hierarchy assumption remains to be evaluated and deserves further investigation.
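The consequence of the hierarchy assumption can be illustrated with a short Python sketch of a first-stage decision rule. This is one possible reading of the hierarchy described above, not the authors' implementation: the names `StageOneCounts`, `promising` and `stage_one_decision`, the form of the boundaries (minimum counts required to continue) and all numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageOneCounts:
    """Observed first-stage counts in one subgroup."""
    responses: int
    non_toxicities: int


def promising(counts, min_responses, min_non_toxicities):
    """Both co-primary endpoints must reach their (hypothetical) minimum counts."""
    return (counts.responses >= min_responses
            and counts.non_toxicities >= min_non_toxicities)


def stage_one_decision(negative, positive, neg_bounds, pos_bounds):
    """Interim decision under the assumed hierarchy between subgroups.

    The negative subgroup is examined first: if it looks promising, the
    positive subgroup is assumed to do at least as well and is not checked.
    """
    if promising(negative, *neg_bounds):
        return "continue in unselected population"
    if promising(positive, *pos_bounds):
        return "continue in positive subgroup only"
    return "stop for futility"


# Hypothetical data: promising negative subgroup, poor positive subgroup.
decision = stage_one_decision(
    negative=StageOneCounts(responses=5, non_toxicities=9),
    positive=StageOneCounts(responses=0, non_toxicities=2),
    neg_bounds=(4, 8),
    pos_bounds=(5, 8),
)
print(decision)
```

In this example the trial continues in the unselected population although the positive subgroup shows poor results, because the rule never inspects the positive subgroup once the negative subgroup is promising. This is the situation in which an incorrect hierarchy assumption may lead to an additional type I error.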

An optimal SABD design requires a total of 15 parameters to be estimated. A similar approach to the one proposed by Jones et al. [8], which is described in «Expected sample size (ESS) and optimal design» section, was used to reduce the number of parameters that needed to be determined and thus also reduce the computational burden. Further work is required to provide technical solutions and to determine optimal designs over the 15-dimensional parameter space.

Conclusions

The SABD design makes it possible to assess two dimensions independently through co-primary endpoints in a heterogeneous population without dramatically increasing the sample size. This is particularly useful for geriatric clinical oncology trials, as the population can be stratified according to a geriatric criterion and a subgroup of interest with an acceptable and clinically relevant benefit-risk ratio can be identified at the end of the first or the second stage. As population heterogeneity is not limited to older populations, the SABD design may also be applicable to other study populations such as children or adolescents and young adults [19]. Paediatric populations are heterogeneous, particularly in terms of age, and tolerance of a treatment may depend on such characteristics [20]. Our novel SABD approach may also be envisaged for phase II trials of targeted therapies based on a biomarker (positive versus negative) to select the appropriate study population for the subsequent phase III trial.

Availability of data and materials

The data generated and used during this study are available from the corresponding author on reasonable request.

The R program implementing the proposed SABD design is available from the corresponding author upon request.

Abbreviations

BD: Bryant & Day

SABD: Stratified Adaptive Bryant & Day

ESS: Expected Sample Size

PET: Probability of Early Termination

References

  1. Sedrak MS, Mohile SG, Sun V, Sun C-L, Chen BT, Li D, et al. Barriers to clinical trial enrollment of older adults with cancer: A qualitative study of the perceptions of community and academic oncologists. J Geriatr Oncol. 2020;11:327–34.

  2. Wildiers H, Mauer M, Pallis A, Hurria A, Mohile SG, Luciani A, et al. End points and trial design in geriatric oncology research: a joint European organisation for research and treatment of cancer–Alliance for Clinical Trials in Oncology-International Society Of Geriatric Oncology position article. J Clin Oncol. 2013;31:3711–8.

  3. Cabarrou B, Mourey L, Dalenc F, Balardy L, Kanoun D, Roché H, et al. Methodology of phase II clinical trials in metastatic elderly breast cancer: a literature review. Breast Cancer Res Treat. 2017. https://doi.org/10.1007/s10549-017-4278-5.

  4. Ferrat E, Paillaud E, Caillet P, Laurent M, Tournigand C, Lagrange J-L, et al. Performance of Four Frailty Classifications in Older Patients With Cancer: Prospective Elderly Cancer Patients Cohort Study. J Clin Oncol. 2017;35:766–77.

  5. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics. 1982;38:143–51.

  6. Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials. 1989;10:1–10.

  7. A’Hern RP. Widening eligibility to phase II trials: constant arcsine difference phase II trials. Control Clin Trials. 2004;25:251–64.

  8. Jones CL, Holmgren E. An adaptive Simon two-stage design for phase 2 studies of targeted therapies. Contemp Clin Trials. 2007;28:654–61.

  9. Tournoux-Facon C, De Rycke Y, Tubert-Bitter P. How a new stratified adaptive phase II design could improve targeting population. Stat Med. 2011;30:1555–62.

  10. Parashar D, Bowden J, Starr C, Wernisch L, Mander A. An optimal stratified Simon two-stage design. Pharm Stat. 2016;15:333–40.

  11. Cabarrou B, Sfumato P, Mourey L, Leconte E, Balardy L, Martinez A, et al. Addressing heterogeneity in the design of phase II clinical trials in geriatric oncology. Eur J Cancer. 2018;103:120–6.

  12. Sedrak MS, Freedman RA, Cohen HJ, Muss HB, Jatoi A, Klepin HD, et al. Older adult participation in cancer clinical trials: A systematic review of barriers and interventions. CA Cancer J Clin. 2021;71:78–92.

  13. Bryant J, Day R. Incorporating toxicity considerations into the design of two-stage phase II clinical trials. Biometrics. 1995;51:1372–83.

  14. Mourey L, Sevin E, Latorzeff I, Houede N, Meunier J, Priou F, et al. Is docetaxel-prednisone (DP) feasible in frail elderly (age 75 or older) patients with castration-resistant metastatic prostate cancer (CRMPC)? GERICO10-GETUG P03 trial: A trial from elderly and genitourinary oncology UNICANCER groups. J Clin Oncol. 2012;30(5_suppl):93.

  15. Conaway MR, Petroni GR. Designs for phase II trials allowing for a trade-off between response and toxicity. Biometrics. 1996;52:1375–86.

  16. Tournoux C, De Rycke Y, Médioni J, Asselain B. Methods of joint evaluation of efficacy and toxicity in phase II clinical trials. Contemp Clin Trials. 2007;28:514–24.

  17. Zang Y, Yuan Y. Optimal sequential enrichment designs for phase II clinical trials. Stat Med. 2017;36:54–66.

  18. Dutton P, Holmes J. Single arm two-stage studies: Improved designs for molecularly targeted agents. Pharm Stat. 2018;17:761–9.

  19. Sposto R, Gaynon PS. An adjustment for patient heterogeneity in the design of two-stage phase II trials. Stat Med. 2009;28:2566–79.

  20. Paoletti X, Geoerger B, Doz F, Baruchel A, Lokiec F, Le Tourneau C. A comparative analysis of paediatric dose-finding trials of molecularly targeted agent with adults’ trials. Eur J Cancer. 2013;49:2392–402.

Acknowledgements

The authors would like to thank ‘La Ligue Nationale Contre le Cancer, France’ (Comité des Pyrénées-Orientales, Comité de la Meuse, Comité du Maine-et-Loire) for their financial support and Petra Neufing, native speaker, for her assistance with the English proofreading.

Funding

This work was supported by a grant from ‘La Ligue Nationale Contre le Cancer, France’ (PI: Thomas Filleron). 

Author information

Contributions

BC, EL and TF developed the novel design and wrote the manuscript. PS and JMB reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Thomas Filleron.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Supplementary Table 1. Stratified adaptive Bryant & Day (SABD) designs with αR = 0.1, αT = 0.1, β = 0.2 (ESSRiTj and PETRiTj correspond to ESS(pRi-,pTj-,pRi+,pTj+) and PET(pRi-,pTj-,pRi+,pTj+), respectively).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Cabarrou, B., Leconte, E., Sfumato, P. et al. A stratified adaptive two-stage design with co-primary endpoints for phase II clinical oncology trials. BMC Med Res Methodol 22, 278 (2022). https://doi.org/10.1186/s12874-022-01748-w

  • DOI: https://doi.org/10.1186/s12874-022-01748-w

Keywords

  • Phase II clinical oncology trials
  • Heterogeneity
  • Adaptive stratified design
  • Co-primary endpoints