 Research article
 Open Access
Optimal designs for phase II/III drug development programs including methods for discounting of phase II results
BMC Medical Research Methodology volume 20, Article number: 253 (2020)
Abstract
Background
Go/no-go decisions after phase II and the sample size chosen for phase III are usually based on phase II results (e.g., the treatment effect estimate from phase II). Due to the decision rule (only promising phase II results lead to phase III), treatment effect estimates from phase II that initiate a phase III trial commonly overestimate the true treatment effect. Underpowered phase III trials are the consequence. Optimistic findings may then not be reproduced, leading to the failure of potentially expensive drug development programs. For some disease areas, failure rates as high as 62.5% have been reported.
Methods
We integrate the ideas of multiplicative and additive adjustment of treatment effect estimates after go decisions in a utility-based framework for optimizing drug development programs. The design of a phase II/III program, i.e., the “right amount of adjustment”, the allocation of the resources to phase II and III in terms of sample size, and the rule applied to decide whether to stop or to proceed with phase III, influences its success considerably. Given specific drug development program characteristics (e.g., fixed and variable per-patient costs for phase II and III, probable gain in case of market launch), optimal designs with respect to the maximal expected utility can be identified by the proposed Bayesian-frequentist approach. The method will be illustrated by application to practical examples characteristic of oncological studies.
Results
In general, our results show that the program setups with an adjusted treatment effect estimate used for phase III planning are superior to the “naïve” program setups with respect to the maximal expected utility. Therefore, we recommend considering an adjusted phase II treatment effect estimate for the phase III sample size calculation. However, there is no one-fits-all design.
Conclusion
Individual drug development planning for a specific program is necessary to find the optimal design. The optimal choice of the design parameters for a specific drug development program at hand can be found with our user-friendly R Shiny application and package (both accessible open-source via [1]).
Background
Exploratory studies are usually carried out to provide a basis for deciding whether or not to proceed with a confirmatory trial and, if necessary, to provide information for planning purposes. In drug development programs, this strong link between exploratory (e.g., phase II) and confirmatory (e.g., phase III) studies favors integrated planning. In particular, the costs of phase III studies have increased remarkably in recent years [2, 3], while failure rates are quite high (approx. 45%, see [4] and the reference mentioned therein). Therefore, the availability and application of quantitative methods for decision making, which should be data-driven and objective, is desirable [5].
Already over 30 years ago, Hughes and Pocock [6] pointed out that decision rules in clinical trials can lead to a bias in the point estimate of the treatment effect, so that the true underlying effect might be overestimated at the time of an early positive decision. Twenty-four years and various attempts by authors to adjust for overestimation of the treatment effect (in group sequential designs) later (e.g., [7] and references mentioned therein), Zhang et al. [8] still criticize that the cause and effect of this phenomenon is generally not well understood. To illustrate the problem, they provide a graphical explanation for the occurrence of overestimation. They argue that random variability (i.e., random highs and lows) of the treatment effect estimate is always present, but stabilizes around the true treatment effect as the trial continues to its end. However, when a decision rule is implemented, the variability favors the random highs: in a phase II/III drug development program with a go/no-go decision rule, the program proceeds to phase III only when large treatment effects are observed, and stops when small effects occur. This selective handling of random variability may lead to overestimation of the magnitude of the treatment effect after phase II.
Ellenberg et al. [9] as well as Nardini [10] emphasize that the aim of treatment effect estimation is not to decide whether or not one therapy is better than the other, but to describe the size of therapeutic effects. Thus, we are concerned with a problem of estimation, not a problem of testing. Nardini concludes that estimates arising after a decision rule “should [consequently] not be taken at face value as true estimates of the new treatment’s effect”. Ellenberg et al. point out that statistical methods to adjust for this “random-high bias” exist, but criticize that “they are not applied as often as they should be”. Recently, the U.S. Food & Drug Administration reported 22 case studies since 1999 in which promising phase II clinical trial results were not confirmed in phase III clinical testing [11]. Such experiences are not rare: for some disease areas, the failure rate for phase III trials is reported to be as high as 62.5% [12] and about 50% for approval [13]. Chuang-Stein and Kirby [14] raise serious concern, as the severity of this problem may multiply: the bigger the estimated effect from, e.g., a proof-of-concept trial, the greater the temptation to invest heavily and conduct multiple studies in parallel. They advise using the concept of “assurance” for the quantification of success probabilities and, moreover, applying an adjustment for the overestimation of the treatment effect (e.g., [15]) when planning the next phase of a drug development program.
In our framework, we follow the concept of “assurance” [16, 17], which was first introduced by Spiegelhalter et al. in 1986 with the concept of Bayesian predictive power (compare also “average power”) [18, 19]. This methodology was used later in various contexts by O’Hagan et al. [16, 17] (“assurance”), Chuang-Stein [20], Chuang-Stein and Yang [21] (“average success probability”) and finally by Gasparini et al. and Saint-Hilary et al. (“predictive probability of success”) [22, 23]. The idea is to use a prior distribution for the true assumed treatment effect for trial planning. This is in contrast to the “frequentist world”, where a fixed value is assumed. The “assurance” is then the weighted (unconditional) probability of a successful trial for a given effect, the weighting resulting from the likelihood that the therapy will achieve this effect. Because Bayesian principles in the planning phase are synthesized with frequentist decision-making procedures in the analysis, the above-mentioned approaches are described in the literature as “mixed Bayesian-frequentist”.
Kirby et al. [15] and Wang et al. [24] attempt to reduce the impact of overestimation by discounting the phase II treatment effect estimate by applying a multiplicative or additive adjustment, respectively. However, their suggestions are not universally applicable and are rather “rules of thumb”; e.g., Kirby et al. suggest using a retention factor of 0.9 times the assumed ratio of the phase III effect to the phase II effect.
De Martini [4, 25] reports that the phase II sample size should be almost as large as the ideal phase III sample size (at least 2/3 of the latter) in order to have a sufficiently good information basis for phase III planning. He criticizes that in practice this ratio is only 1/4 on average and that an increase in sponsorship gains from drug development through larger phase II studies has not yet been well investigated. Larger phase II sample sizes would reduce the level of overestimation but increase the estimated phase III sample size [26] and could retrospectively be regarded as an unnecessarily high investment in case of a no-go decision. Therefore, an optimal balance is required.
In this article, we integrate the general concepts of using a multiplicative or additive adjustment method to correct for overestimation of the treatment effect in a framework of utility-based optimization of phase II/III development programs [27]. That is, we want to critically examine adjustment methods from an economic point of view. In addition to simultaneously optimizing the phase II go/no-go decision rule and the sample size, we also optimize over the adjustment parameter used for the phase II treatment effect estimation to find “the right level of adjustment” for the specific situation at hand. Our approach can bridge the long-standing gap between theory and practice: we provide a Bayesian-frequentist hybrid framework in which methods proposed for addressing the problem of overestimation of the treatment effect after go decisions are included in the optimization of drug development programs.
In the second section of this paper, we will introduce the basic setting and notation, explain the adjustment methods and show how they are incorporated in our optimization framework. After introducing the utility function and explaining the optimization procedure, we present optimal designs for exemplary settings of drug development programs in Section 3. We finish with a discussion in Section 4 and a conclusion in Section 5.
Methods
Basic setting
The considered drug development program consists of one exploratory phase II and one confirmatory phase III trial. Both are randomized trials with two arms (each with 1:1 sample size allocation), performed independently, investigating the same time-to-event primary endpoint and the same population. The true treatment effect is given by the negative logarithm of the true hazard ratio (θ = − log(HR)), which is the ratio of the hazard functions of the treatment and the control group. In order to reflect the uncertainty in the true treatment effect, θ can be modelled by a prior distribution f(θ). In phase II, the total number of events is denoted by d_{2} and the maximum likelihood estimate of θ is given by \( {\hat{\theta}}_2 \). We assume that the estimator \( {\hat{\theta}}_2 \) is asymptotically normally distributed with \( {\hat{\theta}}_2\mid \theta \sim N\left(\theta, 4/{d}_2\right) \). (Note that the notation used will not differentiate between the treatment effect estimator (i.e., the rule applied to estimate the quantity of interest, which is a random variable) and the treatment effect estimate (i.e., a particular realization, a fixed value); it will be clear from context which quantity is meant.) Furthermore, we require that only phase II trials with promising results lead to a phase III trial. This is quantified by a go/no-go criterion with a go decision in case of \( {\hat{\theta}}_2\ge \kappa \), where κ is a predefined threshold value. In case of a go decision, the number of events for the phase III trial is calculated based on the observed treatment effect of phase II. If the confirmatory analysis in phase III reveals a significant result, program success is declared (compare Fig. 1).
Due to the decision rule after phase II, the treatment effect estimate of phase II is biased. The bias is positive for κ > 0, as probability mass is shifted towards higher values:

\( \mathrm{E}\left[{\hat{\theta}}_2\mid {\hat{\theta}}_2\ge \kappa \right]={\int}_{-\infty}^{\infty }{\int}_{-\infty}^{\infty }{\hat{\theta}}_2\cdot {1}_{\left\{{\hat{\theta}}_2\ge \kappa \right\}}\cdot \frac{f\left({\hat{\theta}}_2\mid \theta \right)}{P\left({\hat{\theta}}_2\ge \kappa \mid \theta \right)}\cdot f\left(\theta \right)\,d{\hat{\theta}}_2\,d\theta >{\int}_{-\infty}^{\infty }{\int}_{-\infty}^{\infty }{\hat{\theta}}_2\cdot f\left({\hat{\theta}}_2\mid \theta \right)\cdot f\left(\theta \right)\,d{\hat{\theta}}_2\,d\theta =\mathrm{E}\left[{\hat{\theta}}_2\right], \)

where here and in the following 1_{A} denotes the indicator function of event A and the density of the distribution of the respective argument is indicated by f(.). The inequality holds as \( \frac{1}{P\left({\hat{\theta}}_2\ge \kappa \mid \theta \right)}>1 \) and \( {\int}_{-\infty}^{\infty }{1}_{\left\{{\hat{\theta}}_2\ge \kappa \right\}}\frac{f\left({\hat{\theta}}_2\mid \theta \right)}{P\left({\hat{\theta}}_2\ge \kappa \mid \theta \right)}\,d{\hat{\theta}}_2=1 \) and, therefore, the probability mass assigned to values less than κ in the unconditional expectation \( \mathrm{E}\left[{\hat{\theta}}_2\right] \) is distributed among values greater than κ in \( \mathrm{E}\left[{\hat{\theta}}_2\mid {\hat{\theta}}_2\ge \kappa \right] \).
Note that the representation of the bias cannot be further simplified, neither by calculating \( \mathrm{E}\left[{\hat{\theta}}_2\right]-\mathrm{E}\left[{\hat{\theta}}_2\mid {\hat{\theta}}_2\ge \kappa \right] \) nor \( \mathrm{E}\left[{\hat{\theta}}_2\right]/\mathrm{E}\left[{\hat{\theta}}_2\mid {\hat{\theta}}_2\ge \kappa \right] \).
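The selection effect behind this inequality is easy to reproduce numerically. The following Python sketch (the paper's own computations use numerical integration in R; the values of θ, d_2 and κ here are illustrative choices of ours, not parameters from the examples below) conditions a normally distributed phase II estimator on a go decision:

```python
import random
from statistics import mean

random.seed(1)
theta = 0.223            # true effect on the theta scale, approx. -log(0.8)
d2 = 200                 # number of phase II events
kappa = 0.105            # go threshold, approx. -log(0.9)
sd = (4 / d2) ** 0.5     # standard deviation of the phase II estimator

# theta_hat_2 | theta ~ N(theta, 4/d2)
estimates = [random.gauss(theta, sd) for _ in range(200_000)]
go = [x for x in estimates if x >= kappa]   # programs with a go decision

print(f"true effect            : {theta:.3f}")
print(f"mean over all programs : {mean(estimates):.3f}")  # close to theta
print(f"mean given a go        : {mean(go):.3f}")         # noticeably larger
```

Even with a true effect well above the threshold, the go-conditioned mean exceeds the true θ; this is exactly the overestimation that the adjustment methods are meant to counteract.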
Therefore, in the following, multiplicative and additive adjustment methods for the treatment effect estimate obtained in phase II will be investigated. Afterwards, depending on the respective adjustment method, launch criteria and approaches to calculate the number of events for phase III will be presented.
Additive and multiplicative adjustment methods
In this section, we introduce two methods (an additive and a multiplicative adjustment method) to adjust for the overestimation of the phase II treatment effect estimate. It should be mentioned that the terms “multiplicative” and “additive” relate to the specific type of scale and endpoint considered here.
Wang et al. [24] advise applying an additive adjustment to the phase II treatment effect estimate if it is used for planning the sample size of phase III. They discuss using the lower limit of the one and two standard deviation confidence interval (CI) from the phase II trial (i.e., the lower limit of the CI for \( {\hat{\theta}}_2 \), corresponding to one or two standard deviations below the point estimate), respectively. We denote the significance level of the lower bound of the one-sided CI related to the phase II treatment effect estimate as α_{CI} ∈ [0.025, 0.5] and define the additively adjusted treatment effect estimate by \( {\hat{\theta}}_2^{\alpha_{CI}}={\hat{\theta}}_2-{z}_{1-{\alpha}_{CI}}\cdot \sqrt{4/{d}_2} \), with z_{1 − γ} = Φ^{−1}(1 − γ), where Φ(.) denotes the distribution function of the standard normal distribution. Note that our version of the additively adjusted treatment effect estimate generalizes that of Wang et al., as they use the lower limit of the one and two standard deviation two-sided CI (i.e., in our notation α_{CI} = 0.32/2 and α_{CI} = 0.05/2), whereas we allow α_{CI} to range from 0.025 to 0.5. For α_{CI} = 0.5, the additively adjusted treatment effect estimate is not discounted, as \( {\hat{\theta}}_2-{z}_{1-0.5}\cdot \sqrt{4/{d}_2}={\hat{\theta}}_2 \).
Kirby et al. [15] propose a multiplicative adjustment approach. They multiply the observed treatment effect estimate by a factor λ, which can be understood as a retention factor, that is, the fraction of the treatment effect retained. Integrated in our setting, we define \( {\hat{\theta}}_2^{\lambda }=\lambda \cdot {\hat{\theta}}_2 \), where the multiplicative adjustment parameter λ ∈ [0.2, 1] can be viewed as the result of discounting the observed treatment effect of phase II by 1 − λ. Note that for λ = 1 the multiplicatively adjusted treatment effect estimate is not discounted.
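Both discounting rules can be written down in a few lines. The following Python sketch (the paper's implementation is in R via the drugdevelopR package; the values of \( {\hat{\theta}}_2 \) and d_2 below are illustrative) implements them with the standard normal quantile from the Python standard library:

```python
from statistics import NormalDist

def additive_adjust(theta_hat, d2, alpha_ci):
    """Wang et al.-style additive discount: lower bound of the
    one-sided CI, theta_hat - z_{1 - alpha_ci} * sqrt(4/d2)."""
    z = NormalDist().inv_cdf(1 - alpha_ci)
    return theta_hat - z * (4 / d2) ** 0.5

def multiplicative_adjust(theta_hat, lam):
    """Kirby et al.-style retention factor: keep the fraction lam of the effect."""
    return lam * theta_hat

theta_hat2 = 0.357   # illustrative observed -log(HR), i.e. HR approx. 0.7
d2 = 200             # illustrative number of phase II events

print(multiplicative_adjust(theta_hat2, lam=0.9))       # 10% discount
print(additive_adjust(theta_hat2, d2, alpha_ci=0.16))   # roughly a 1-SD discount
print(additive_adjust(theta_hat2, d2, alpha_ci=0.5))    # z_{0.5} = 0: no discount
```

For λ = 1 and α_{CI} = 0.5, both rules return the unadjusted estimate, matching the boundary cases noted in the text.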
Go/no-go criteria, calculation of the expected number of events for phase III and related program characteristics
When designing the phase II/III program, the observed treatment effect estimate of phase II plays a key role in two ways: 1. when making the go/no-go decision (selection s_{1}); 2. when calculating the phase III sample size (selection s_{2}; compare Fig. 1). At both instances, one has to decide whether to use an adjusted or the unadjusted treatment effect estimate. To ease notation, the naïve (unadjusted) treatment effect estimate of phase II is denoted by \( {\hat{\theta}}_2^u={\hat{\theta}}_2 \).
1.: If the treatment effect estimate \( {\hat{\theta}}_2^{s_1} \), where s_{1} = λ, α_{CI} or u (i.e., the multiplicatively adjusted, additively adjusted or unadjusted treatment effect estimate is selected for the decision rule), exceeds a predefined threshold value κ, it is decided to go to phase III; otherwise, the program is stopped. Hence, the expected probability to go to phase III can be determined by

\( {p}_{go}\left({\hat{\theta}}_2^{s_1}\right)={\int}_{-\infty}^{\infty }{\int}_{-\infty}^{\infty }{1}_{\left\{{\hat{\theta}}_2^{s_1}\ge \kappa \right\}}\cdot f\left({\hat{\theta}}_2\mid \theta \right)\cdot f\left(\theta \right)\,d{\hat{\theta}}_2\,d\theta, \)

s_{1} = λ, α_{CI} or u (compare Table A0 in Additional file 1).
2.: In case of a go decision, the number of events for phase III is calculated based on the treatment effect estimate of phase II \( {\hat{\theta}}_2^{s_2} \), s_{2} = λ, α_{CI} or u, a desired power 1 − β, and a one-sided significance level α. For a balanced allocation ratio, it can be calculated by

\( {d}_3\left({\hat{\theta}}_2^{s_2}\right)=\frac{4\cdot {\left({z}_{1-\alpha }+{z}_{1-\beta}\right)}^2}{{\left({\hat{\theta}}_2^{s_2}\right)}^2} \)

by assuming proportional hazards and asymptotic properties of the log-rank test statistic [28]. When going to phase III, the expected number of events (in phase II/III programs with decision rule \( {\hat{\theta}}_2^{s_1}\ge \kappa \) and \( {\hat{\theta}}_2^{s_2} \) used to calculate the number of events for phase III) can be determined by

\( {d}_3\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right)=\frac{1}{p_{go}\left({\hat{\theta}}_2^{s_1}\right)}{\int}_{-\infty}^{\infty }{\int}_{-\infty}^{\infty }{d}_3\left({\hat{\theta}}_2^{s_2}\right)\cdot {1}_{\left\{{\hat{\theta}}_2^{s_1}\ge \kappa \right\}}\cdot f\left({\hat{\theta}}_2\mid \theta \right)\cdot f\left(\theta \right)\,d{\hat{\theta}}_2\,d\theta \)
(compare Table A0). The expectation of the estimate (of phase II) used for the sample size calculation can be calculated by

\( {e}_2\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right)=\mathrm{E}\left[{\hat{\theta}}_2^{s_2}\mid {\hat{\theta}}_2^{s_1}\ge \kappa \right]=\frac{1}{p_{go}\left({\hat{\theta}}_2^{s_1}\right)}{\int}_{-\infty}^{\infty }{\int}_{-\infty}^{\infty }{\hat{\theta}}_2^{s_2}\cdot {1}_{\left\{{\hat{\theta}}_2^{s_1}\ge \kappa \right\}}\cdot f\left({\hat{\theta}}_2\mid \theta \right)\cdot f\left(\theta \right)\,d{\hat{\theta}}_2\,d\theta \)

for s_{1}, s_{2} = λ, α_{CI} or u (compare Table A0) in order to calculate the bias \( \mathrm{E}\left[{\hat{\theta}}_2^{s_2}\mid {\hat{\theta}}_2^{s_1}\ge \kappa \right]-\mathrm{E}\left[{\hat{\theta}}_2\right] \). As proposed by De Martini [4, 25], the ratio of the numbers of events in phase II and III will also be calculated.
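The number-of-events calculation is the usual Schoenfeld-type formula consistent with the variance assumption \( {\hat{\theta}}_2\mid \theta \sim N\left(\theta, 4/{d}_2\right) \), namely d_3 = 4(z_{1−α} + z_{1−β})²/θ̂². A Python sketch (illustrative inputs; the paper's implementation is in R) shows how discounting the estimate translates into a larger phase III trial:

```python
from math import ceil
from statistics import NormalDist

def d3_events(theta_hat, alpha=0.025, beta=0.1):
    """Schoenfeld-type number of events for phase III, assuming
    Var(theta_hat) = 4/d and a one-sided level-alpha log-rank test."""
    z = NormalDist().inv_cdf
    return ceil(4 * (z(1 - alpha) + z(1 - beta)) ** 2 / theta_hat ** 2)

# Planning on the unadjusted vs. a multiplicatively adjusted estimate:
theta_hat2 = 0.357                      # illustrative observed -log(HR), HR approx. 0.7
print(d3_events(theta_hat2))            # naive plan
print(d3_events(0.9 * theta_hat2))      # lambda = 0.9: larger, more conservative trial
```

The discounted estimate yields a larger, more conservatively sized phase III trial, which is the intended effect of the adjustment.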
The program is considered to be successful if the one-sided null hypothesis H_{0} : θ ≤ 0 is rejected in favour of H_{1} : θ > 0 at a one-sided significance level α. This is the case if T_{3} > z_{1 − α}, where T_{3} is the normalized log-rank test statistic in phase III, which is assumed to be asymptotically normally distributed, i.e., \( {T}_3\mid {\hat{\theta}}_2,\theta \sim N\left(\theta /\sqrt{4/{D}_3},1\right) \). Note that significance testing is performed on phase III data only. Therefore, the expected probability of a successful program \( PsP\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right) \) (with decision rule \( {\hat{\theta}}_2^{s_1}\ge \kappa \), and \( {\hat{\theta}}_2^{s_2} \) used to calculate the number of events for phase III), which is defined as the expected probability of the joint event of going to phase III and achieving a significant result [25, 27], can be calculated by

\( PsP\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right)={\int}_{-\infty}^{\infty }{\int}_{-\infty}^{\infty }{\int}_{{z}_{1-\alpha}}^{\infty }{1}_{\left\{{\hat{\theta}}_2^{s_1}\ge \kappa \right\}}\cdot f\left({t}_3\mid {\hat{\theta}}_2,\theta \right)\cdot f\left({\hat{\theta}}_2\mid \theta \right)\cdot f\left(\theta \right)\,d{t}_3\,d{\hat{\theta}}_2\,d\theta \)
where t_{3} is a realization of \( {T}_3\mid {\hat{\theta}}_2,\theta \) (compare Table A0). One reviewer pointed out that this definition of a successful program records a false positive result (i.e., T_{3} > z_{1 − α} under H_{0}) as program success. We discuss this aspect in detail in Section A1 of Additional file 1. In reality, regulatory approval, and with that a monetary gain (the core driver of our utility function), is achieved when a significant result is observed in phase III, acknowledging that there is a probability of α that it is a false positive decision. Thus, we keep the commonly used term “success” and PsP, which should be regarded as a probability of market access and not a probability of a correct decision.
Considered program setups
We investigate the impact of using adjusted treatment effect estimates (i.e., \( {\hat{\theta}}_2^{\lambda } \) or \( {\hat{\theta}}_2^{\alpha_{CI}} \)) for the go/nogo decision and/or for the calculation of the number of events for phase III on the drug development program characteristics and compare the results to those where the unadjusted (naïve) treatment effect estimate \( {\hat{\theta}}_2^u \) was used. Therefore, we investigate different program setups \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right) \) which are defined by the selection of the treatment effect estimate used for the decision rule (selection s_{1}) and, in case of a go decision, by the choice of the treatment effect estimate used for the calculation of the number of events for phase III (selection s_{2}).
Table 1 gives an overview of the considered program setups. We compare the “unadjusted” setup \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^u\right) \), where \( {\hat{\theta}}_2^u={\hat{\theta}}_2 \) (i.e., s_{1}, s_{2} = u), with two “multiplicatively adjusted” setups \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{\lambda}\right) \) (s_{1} ∈ {u, λ}, s_{2} = λ), and two “additively adjusted” setups \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{\alpha_{CI}}\right) \) (s_{1} ∈ {u, α_{CI}}, s_{2} = α_{CI}). Note that if s_{1} ≠ u, we define s_{2} = s_{1}, which means that if an adjustment parameter is used for the decision rule, the same adjustment parameter is used for the calculation of the expected number of events for phase III (for reasons which will be given later).
Utility function
The aim is to optimize a phase II/III drug development program in terms of the adjustment parameters λ or α_{CI}, the number of events in phase II d_{2}, and the go/no-go decision threshold value κ. Therefore, we set up a utility function, which utilizes the difference between program costs and potential gains after successful market launch (compare Fig. 2 for a graphical illustration). For the costs, fixed (c_{02}, c_{03}) and variable per-patient (c_{2}, c_{3}) costs are included for the phase II and III trial, respectively. By dividing the number of events by the event rate ξ_{i}, the total number of patients can be calculated for the respective phase i = 2, 3. Obviously, the costs of the phase III trial apply only in case of a go decision. In case of program success, a benefit b is obtained, and we assume that the level of benefit depends on the observed treatment effect in the phase III trial, as suggested by a report of the German Institute for Quality and Efficiency in Health Care [29]. As proposed there, three effect size categories (small, medium and large) are used, whereby each category is defined by a threshold value (1, 0.95, 0.85) for the upper boundary of the 95% confidence interval for the HR (for details on the derivation of these threshold values, the interested reader is referred to “Anhang A” of [29]). The corresponding amount of benefit is denoted by b_{1}, b_{2} and b_{3}, respectively. Based on this, the costs c(d_{2}, κ, s_{2}) and gain g(d_{2}, κ, s_{2}) for a phase II/III program with program setup \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right) \) are given by
where \( {I}_1=\left({z}_{1-\alpha },-\log (0.95)/\sqrt{4/{D}_3}+{z}_{1-\alpha}\right] \), \( {I}_2=\left(-\log (0.95)/\sqrt{4/{D}_3}+{z}_{1-\alpha },-\log (0.85)/\sqrt{4/{D}_3}+{z}_{1-\alpha}\right] \) and \( {I}_3=\left(-\log (0.85)/\sqrt{4/{D}_3}+{z}_{1-\alpha },\infty \right) \) are transformations of the effect size intervals to intervals on the scale of the test statistic T_{3}. Thus, the costs depend on the observed treatment effect in phase II, and the gain depends on the observed treatment effects in phases II and III.
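Assuming the interval bounds take the form −log(0.95)/√(4/D_3) + z_{1−α} and −log(0.85)/√(4/D_3) + z_{1−α} as above, the mapping from the phase III test statistic to a benefit category can be sketched in Python as follows (the benefit amounts b_1, b_2, b_3 and the inputs are illustrative; the paper's implementation is in R):

```python
from math import log
from statistics import NormalDist

def benefit(t3, d3, alpha=0.025, b=(1000, 3000, 5000)):
    """Benefit category from the phase III test statistic t3, given d3 events.
    Category bounds follow the IQWiG-style HR thresholds 0.95 and 0.85,
    mapped to the t3 scale; b = (b1, b2, b3) holds illustrative amounts."""
    z = NormalDist().inv_cdf(1 - alpha)
    se = (4 / d3) ** 0.5
    if t3 <= z:
        return 0                            # not significant: no gain
    if t3 <= -log(0.95) / se + z:
        return b[0]                         # small effect (interval I1)
    if t3 <= -log(0.85) / se + z:
        return b[1]                         # medium effect (interval I2)
    return b[2]                             # large effect (interval I3)

print(benefit(t3=2.1, d3=300))   # lands in the small-effect interval I1
print(benefit(t3=4.0, d3=300))   # beyond the lower bound of I3: large effect
```

Values of t_3 at or below z_{1−α} yield no benefit, mirroring the definition of program success.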
The utility is defined as the difference between gain and costs and is expressed as a function of d_{2} and κ, over which it is simultaneously optimized. In the adjusted program setups, the optimization also runs over λ in the multiplicatively and over α_{CI} in the additively adjusted setups, respectively. Thus, we define the utility for program setup \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right) \) by
where for the unadjusted program setup \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^u\right) \), u(d_{2}, κ, s_{2}) = u(d_{2}, κ). To incorporate the development risk in terms of success probabilities, we consider the expected utility with respect to θ, \( {\hat{\theta}}_2 \) and T_{3}, E[u(d_{2}, κ, s_{2})] = E[g(d_{2}, κ, s_{2})] − E[c(d_{2}, κ, s_{2})], where the expected costs and gain with respect to θ, \( {\hat{\theta}}_2 \) and T_{3} are given by
The aim is to find a design δ = (d_{2}, κ, s_{2}) that maximizes the expected utility E[u(d_{2}, κ, s_{2})] for programs with program setup \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right). \) The optimization is carried out over d_{2}, κ, and λ in the multiplicatively or α_{CI} in the additively adjusted setups, respectively. The optimal design δ^{∗} for each program setup \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right) \) is defined to be the design for which the expected utility is maximized, that is, \( \mathrm{E}\left[u\left({\delta}^{\ast}\right)\right]=\underset{\delta \in D}{\max}\mathrm{E}\left[u\left(\delta \right)\right] \), where D = {δ = (d_{2}, κ, s_{2})} is the optimization set.
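The structure of this optimization can be illustrated by a Monte Carlo grid search in Python. Everything below is a deliberate simplification for illustration only: a point prior instead of the mixture prior, a single benefit category instead of three, only the multiplicative setup, and cost/benefit values that are our own illustrative choices; the paper itself optimizes by numerical integration in R over the full set D:

```python
import random
from statistics import NormalDist

ALPHA, BETA = 0.025, 0.1
Z_A = NormalDist().inv_cdf(1 - ALPHA)
Z_B = NormalDist().inv_cdf(1 - BETA)

# illustrative cost/benefit parameters (in $10^5), not the paper's calibration
C02, C2, C03, C3, XI, B = 100, 0.75, 150, 1.0, 0.7, 3000

def expected_utility(d2, kappa, lam, theta=0.223, n_sim=20_000, seed=7):
    """Monte Carlo sketch of E[utility] for one design (d2, kappa, lambda):
    point prior at theta, multiplicative adjustment, one benefit category."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        u = -(C02 + C2 * d2 / XI)                      # phase II costs always accrue
        t2 = rng.gauss(theta, (4 / d2) ** 0.5)         # phase II estimate
        if lam * t2 >= kappa:                          # go rule on the adjusted estimate
            planned = lam * t2
            d3 = 4 * (Z_A + Z_B) ** 2 / planned ** 2   # phase III events from the plan
            u -= C03 + C3 * d3 / XI                    # phase III costs
            t3 = rng.gauss(theta / (4 / d3) ** 0.5, 1) # phase III test statistic
            if t3 > Z_A:
                u += B                                 # significant result: gain the benefit
        total += u
    return total / n_sim

# small grid search over design parameters, as in the paper's optimization set D
grid = [(d2, k, lam) for d2 in (100, 200, 300)
                     for k in (0.105, 0.223)
                     for lam in (0.8, 0.9, 1.0)]
best = max(grid, key=lambda d: expected_utility(*d))
print("optimal (d2, kappa, lambda):", best)
```

This sketch only illustrates the structure of the search; the exhaustive grid over d_2, κ and λ (or α_{CI}) mirrors the definition of the optimal design δ* above.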
The optimization is solved using numerical integration procedures written in the programming language R [30]. In order to facilitate the application of the approach, a user-friendly R Shiny app (bias) and an R package (drugdevelopR, including the R function optimal_bias) are provided open-source (both accessible via [1]).
Illustration of the framework by application to oncology trial example and practical extensions
In this paper, the parameters in the oncology trial example are chosen as in Kirchner et al. [27] to allow a comparison of results. It should be noted that the example is primarily given to illustrate the framework and that the chosen parameters should not be taken at face value. We tried to elicit the design parameters as realistically as possible to mimic an oncology drug development program by means of information from the relevant literature and consultation with experts from the pharmaceutical industry in the field of oncology. Nevertheless, these parameters must be chosen carefully and specifically for each drug development scenario at hand.
The event rates for phase II and III are set to ξ_{i} = 0.7 for i = 2, 3. Therefore, the total sample size can be calculated as d_{i}/0.7, i = 2, 3. In practice, estimates of the event rates could be obtained by taking recruitment rates and duration as well as dropout rates and treatment-group-specific hazards into account. However, using those parameters often leads to event rates around ξ_{i} = 0.7, as this is a compromise between data maturity and the avoidance of long follow-up times if dropout rates are higher than expected. If ξ_{i} < 0.5, the median event time might not be observed, while if ξ_{i} is too high, the planned number of events might not be reached at all under substantial dropout rates.
For phase III oncology trials, per-patient costs between 75,000 and 125,000 US$ are reported [31]. Therefore, per-patient costs for phase III of $10^{5} are considered and c_{3} is set to 1 (in $10^{5}). Furthermore, the per-patient costs for phase II are set to c_{2} = 0.75 (in $10^{5}). Due to, for example, additional biomarker measurements made in phase III, or because regulatory agencies may require more extensive data collection in phase III [32], higher per-patient costs in phase III compared with phase II are reasonable. In this example, the fixed costs for phase II and III are set to c_{02} = 100 and c_{03} = 150 (in $10^{5}), respectively. To investigate different scenarios, the benefit parameters b_{1}, b_{2} and b_{3} are chosen to embody a low overall benefit, (b_{1}, b_{2}, b_{3}) = 1: (1000, 2000, 3000), 2: (1000, 2000, 4000), 3: (1000, 3000, 4000), and a medium to large overall benefit, (b_{1}, b_{2}, b_{3}) = 4: (1000, 3000, 5000), 5: (1000, 4000, 5000), 6: (1000, 3000, 6000), 7: (1000, 4000, 6000) (in $10^{5}), where we assume a 5-year income period and a profit margin of 0.2. Thus, seven different benefit scenarios (bs 1–7) will be considered. A mixture distribution consisting of the weighted sum of two normal distributions
as proposed by Götte et al. [26] can be used to model the true treatment effect. The two normal distributions each depict a distribution for θ, whereby the means represent values of the assumed true treatment effect and the denominators of the associated variances can be viewed as the “amount of certainty” about the treatment effect size in terms of numbers of events. The parameters of the distributions (i.e., means and variances) are selected such that a realistic range for the HR is covered (compare Fig. A2 in Additional file 1 and/or investigate the prior distribution with the help of our R Shiny app prior [33]). The mean of the first of the two normal distributions characterizes a strong, the second one a moderate to low treatment effect, so that by ranging w from, e.g., 0.3 to 0.9 we can mirror pessimistic to more optimistic opinions about the true treatment effect. In practice, the choice of w can be guided by formal expert elicitation methods. Dallow et al. [34] present an overview of such methods, including the elicitation of Gaussian mixture distributions. Note that the approach is general and allows for the implementation of any alternative prior distribution. Again, elicitation methods (compare also, e.g., [35]) are a useful tool that may help (a group of) experts to quantify their opinions about the treatment effect as a probability distribution. Various software packages enable their practical application (compare, e.g., [36]).
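A draw from such a two-component mixture prior can be sketched in a few lines of Python; note that the component means and standard deviations below are placeholders of ours, not the values used by Götte et al. [26]:

```python
import random

def sample_prior(w, mu1=0.357, sd1=0.14, mu2=0.223, sd2=0.14, rng=random):
    """One draw of theta from w*N(mu1, sd1^2) + (1-w)*N(mu2, sd2^2).
    The component parameters here are illustrative placeholders only."""
    if rng.random() < w:
        return rng.gauss(mu1, sd1)      # strong-effect component
    return rng.gauss(mu2, sd2)          # moderate-to-low-effect component

random.seed(3)
draws = [sample_prior(w=0.6) for _ in range(100_000)]
mean_theta = sum(draws) / len(draws)
print(f"prior mean of theta: {mean_theta:.3f}")  # approx. 0.6*mu1 + 0.4*mu2
```

With w closer to 0.9, more prior mass sits on the strong-effect component, mirroring the optimistic scenarios described in the text.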
In our framework it is also possible to account for, e.g., different population structures in phase II and phase III (due to different countries, centers, in-/exclusion criteria, …) by assuming different distributions for the assumed true treatment effect in phase II and III (i.e., θ_{2} ≁ θ_{3}), so that \( {\hat{\theta}}_2\mid {\theta}_2\sim N\left({\theta}_2,4/{d}_2\right) \) and \( {T}_3\mid {\hat{\theta}}_2,{\theta}_2,{\theta}_3\sim N\left({\theta}_3/\sqrt{4/{D}_3},1\right) \). For ease of interpretation, all formulas and results presented in the main part are for the special case where the true treatment effect is modelled by the same distribution for phase II and III (i.e., θ~θ_{2}~θ_{3}); a brief investigation of this aspect can be found in Section A2 of Additional file 1.
In this example, we chose a wide range for κ (and for d_{2}, as well as for λ or α_{CI}, respectively) such that the optimization is not influenced by that choice. Therefore, the optimization set is D = {δ = (d_{2}, κ, s_{2}), d_{2} ∈ {50, 52, …, 350}, κ ∈ {− log(0.9), − log(0.89), …, − log(0.7)}, s_{2} = λ ∈ {0.2, 0.225, …, 1} or s_{2} = α_{CI} ∈ {0.025, 0.075, …, 0.5}}. However, the lower bound of the decision rule set for κ can also be seen as representing a predefined clinically relevant effect size: phase III trials are then only conducted if the treatment effect observed in phase II is at least of that size. In Section A3 of Additional file 1, we present results of the procedure where we chose min(κ) = − log(0.8).

Furthermore, it might be interesting to see how the optimal program design is influenced by the sponsor’s real-life budget constraint. Therefore, we also consider optimizing the drug development program with a constraint K on the expected costs of the program, i.e., E[c(d_{2}, κ, s_{2})] ≤ K (see Section A4 of Additional file 1 for more details).

In the pharmaceutical industry there are often discussions about skipping the phase II trial. For example, if competitors have already approved a drug with a similar mode of action, one might see no need for further learning about the drug and go directly to a confirmatory trial. Our framework allows this aspect to be assessed systematically by setting d_{2} = 0, c_{02} = c_{2} = 0 and p_{go} = 1 (see Section A5 of Additional file 1 for more details).

In addition, different definitions of the cost and benefit functions are possible. As mentioned above, the choice of three effect size categories (and therefore the benefit function) is based on a report of the German Institute for Quality and Efficiency in Health Care [29]. However, the presented framework could also be applied to an alternative setup such as, for example, the one proposed by Ding et al. [32]. Here, a proportional relationship between benefit and effect size is considered. In Section A6 of Additional file 1 we investigate this possibility in more detail.
Results
This section is organized as follows. It starts with general observations across all program setups \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right) \). Then, we compare multiplicative \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{\lambda}\right) \) vs. additive \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{\alpha_{CI}}\right) \) vs. no adjustment \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^u\right) \), where s_{1} = u or s_{1} = s_{2}. The impact of adjusting the go/no-go decision making, i.e., the differences between both multiplicative (\( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^{\lambda}\right) \) vs. \( S\left({\hat{\theta}}_2^{\lambda },{\hat{\theta}}_2^{\lambda}\right) \)) and both additive adjustment methods (\( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^{\alpha_{CI}}\right) \) vs. \( S\left({\hat{\theta}}_2^{\alpha_{CI}},{\hat{\theta}}_2^{\alpha_{CI}}\right) \)), is also presented. A discussion of the results is given in the next section.
The optimization results are presented in Table 2 (naïve setting, multiplicative adjustment), Table 3 (additive adjustment) and Figure 3, which show the optimal design parameters \( {\delta}^{\ast }=\left({d}_2^{\ast },{\kappa}^{\ast },{s}_2^{\ast}\right) \):

optimal total number of events for phase II \( {d}_2^{\ast } \) (given by the optimal value of d_{2} ∈ D),

optimal go/no-go decision rule threshold value \( {HR}_{go}^{\ast } \) (given by the optimal value of κ ∈ D on the “HR scale”, i.e., \( {HR}_{go}^{\ast }=\exp \left(-{\kappa}^{\ast}\right) \)) and

optimal adjustment parameter \( {s}_2^{\ast}\in \left\{{\lambda}^{\ast },{\alpha}_{CI}^{\ast}\right\} \) (given by the optimal value of s_{2} ∈ D) for the multiplicative and additive adjustment method, respectively,
with corresponding program characteristics for the optimal design:

maximal expected utility u^{∗} = E[u(δ^{∗})],

expected number of events for phase III \( {d}_3^{\ast }={d}_3\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2^{\ast }}\right) \), where we chose a desired power of 1 − β = 0.9 and a one-sided significance level α = 0.025,

total number of expected events in the program \( {d}^{\ast }={d}_3^{\ast }+{d}_2^{\ast } \),

expected probability to go to phase III \( {p}_{go}^{\ast }={p}_{go}\left({\hat{\theta}}_2^{s_1}\right) \),

expected probability of a successful program \( {sP}^{\ast }= PsP\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2^{\ast }}\right) \) and

expected estimate of phase II used for sample size calculation \( {\varepsilon}_2^{\ast }=\exp \left({e}_2\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2^{\ast }}\right)\right) \) on the “HR scale”,
for program setups \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right) \), where s_{1} = u or \( {s}_1={s}_2^{\ast}\in \left\{{\lambda}^{\ast },{\alpha}_{CI}^{\ast}\right\} \), benefit scenarios (bs 1–7) and weights for the prior distribution of the true underlying effect (w = 0.3, 0.6, 0.9), where \( E\left[{\hat{\theta}}_2\right]={\int}_{-\infty}^{\infty }{\int}_{-\infty}^{\infty }{\hat{\theta}}_2\cdot f\left({\hat{\theta}}_2\mid \theta \right)\cdot f\left(\theta \right)\ d{\hat{\theta}}_2\ d\theta \).
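For a given phase II estimate, the expected number of events for phase III follows from Schoenfeld’s approximation [28] with power 1 − β = 0.9 and one-sided level α = 0.025. A minimal Python sketch, assuming 1:1 allocation (the function name `events_phase3` is ours, not from the paper’s software):

```python
import math
from statistics import NormalDist

def events_phase3(hr_est, alpha=0.025, beta=0.10):
    # Schoenfeld approximation [28] for the required number of events,
    # assuming 1:1 allocation, a one-sided level-alpha test and power 1 - beta
    theta = -math.log(hr_est)          # effect on the theta = -log(HR) scale
    z = NormalDist().inv_cdf
    return math.ceil(4 * (z(1 - alpha) + z(1 - beta)) ** 2 / theta ** 2)

# an unadjusted phase II estimate of HR = 0.75 versus a multiplicatively
# adjusted one (retention factor lambda = 0.8 applied on the log scale)
d3_naive = events_phase3(0.75)
d3_adjusted = events_phase3(math.exp(0.8 * math.log(0.75)))
```

Because the adjusted estimate is closer to the null, the required number of events increases by the factor 1/λ² = 1.5625, illustrating why the adjusted setups invest more in phase III after a go decision.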
Overall, larger assumed benefits (i.e., larger values for (b_{1}, b_{2}, b_{3})) lead to more liberal optimal decision rules (i.e., larger values for \( {HR}_{go}^{\ast } \)) and higher investment in phase II (i.e., a larger number of events for phase II \( {d}_2^{\ast } \)). This in turn leads to a larger investment in phase III, i.e., a higher expected probability to go to phase III \( {p}_{go}^{\ast } \) and a larger expected number of events in phase III \( {d}_3^{\ast } \). The result is a larger expected probability of a successful program sP^{∗} and thus a larger maximal expected utility u^{∗}.
In the multiplicatively adjusted program setups \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{\lambda}\right) \), the maximal expected utility is always higher than in the additively adjusted program setups \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{\alpha_{CI}}\right) \), which in turn is always higher than in the unadjusted program setup \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^u\right) \). Notably, the investment in terms of numbers of events (i.e., \( {d}_2^{\ast },{d}_3^{\ast },{d}^{\ast } \)) tends to be higher in the adjusted program setups than in the unadjusted program setup, especially for scenarios with higher benefits and a more optimistic prior. The expected probability to go to phase III \( {p}_{go}^{\ast } \) is markedly lower in the adjusted program setups than in the unadjusted program setup, whereas the expected probability of a successful program sP^{∗} is higher.
Dividing the optimal number of events in phase II by the expected number of events in phase III (i.e., \( {d}_2^{\ast } \) / \( {d}_3^{\ast } \)) yields values of 0.55–0.64, 0.55–0.64, 0.58–0.67, 0.43–0.54 and 0.42–0.54 in the program setups \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^u\right) \), \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^{\alpha_{CI}}\right) \), \( S\left({\hat{\theta}}_2^{\alpha_{CI}},{\hat{\theta}}_2^{\alpha_{CI}}\right) \), \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^{\lambda}\right) \) and \( S\left({\hat{\theta}}_2^{\lambda },{\hat{\theta}}_2^{\lambda}\right) \), respectively. Furthermore, it can be observed that the treatment effect estimate of phase II used for sample size calculation in the optimal design is overestimated in the unadjusted setting (\( {\varepsilon}_2^{\ast }<\exp \left(\mathrm{E}\left[{\hat{\theta}}_2\right]\right) \), as indicated by the black circles and the yellow line in Figure 3). This overestimation is smaller in the adjusted settings and can even turn into an underestimation (compare the multiplicative settings for w = 0.9).
The operating characteristics for the optimal designs (e.g., u^{∗}, sP^{∗}) compared between the two multiplicatively and the two additively adjusted program setups do not vary (much) for each benefit scenario bs and choice of weight for the prior distribution w, respectively. However, there are differences in the optimal choice of the threshold value for the decision rule \( {HR}_{go}^{\ast } \): in the program setups with adjusted phase II treatment effect estimate used for decision making (\( S\left({\hat{\theta}}_2^{\lambda },{\hat{\theta}}_2^{\lambda}\right) \) and \( S\left({\hat{\theta}}_2^{\alpha_{CI}},{\hat{\theta}}_2^{\alpha_{CI}}\right) \)), \( {HR}_{go}^{\ast } \) is always larger (by 0.04 to 0.06 and by 0.01 to 0.07, respectively) than in the program setups with unadjusted treatment effect used for decision making (\( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^{\lambda}\right) \) and \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^{\alpha_{CI}}\right) \)).
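The interplay between the threshold \( {HR}_{go}^{\ast } \) and the precision of the phase II estimate drives the go probability. As a hedged illustration, conditional on a fixed true effect θ the go probability can be computed from the asymptotic normality of the log-HR estimate (the paper’s p_go additionally averages over the prior f(θ); the variance 4/d2 assumes 1:1 allocation):

```python
import math
from statistics import NormalDist

def prob_go(theta, d2, hr_go):
    # Conditional on a true effect theta, the phase II log-HR estimate is
    # asymptotically N(theta, 4/d2) under 1:1 allocation; "go" means the
    # observed effect exceeds the threshold kappa = -log(HR_go).
    kappa = -math.log(hr_go)
    se = math.sqrt(4 / d2)
    return 1 - NormalDist(mu=theta, sigma=se).cdf(kappa)

# e.g., a true HR of 0.75, 200 phase II events and a threshold HR_go = 0.85
p = prob_go(theta=-math.log(0.75), d2=200, hr_go=0.85)
```

A more liberal rule (larger HR_go) raises this probability, which is why the optimal adjusted setups, with their more conservative estimates, pair naturally with larger \( {HR}_{go}^{\ast } \) values.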
Discussion
To find optimal drug development designs, the costs of the program (fixed/variable costs for phases II/III), the assumed benefit, and the development risk (i.e., the expected probability of a successful program) are taken into account. By maximizing the expected utility with respect to the design parameters (adjustment parameter, number of events for phase II and threshold value for the go/no-go decision rule), optimal phase II/III drug development program designs can be found. The approach thus enables quantitative reasoning about the design (i.e., the optimal “amount of adjustment”, sample size and decision rule) for the specific drug development program at hand.
We investigated two adjustment methods (additive and multiplicative adjustment), several benefit scenarios (e.g., low, medium, large overall benefit), different distributions for the true treatment effect (with the same and different distributions in phase II and III), scenarios with a real-life budget constraint, scenarios with a predefined clinically relevant effect, and scenarios where phase II could be skipped, thus presenting a method that covers a variety of possible oncology drug development program scenarios and an opportunity to assess the associated changes of the optimal design parameters. Of course, the implementation of alternative (e.g., a proportional relationship between benefit and effect size) or more complex planning situations and broader application to other research areas are possible by choosing the relevant (e.g., cost and benefit) parameters appropriately [37,38,39].

As the framework has been shown to be very flexible, frequent scenarios in oncology drug development are adequately mapped by our approach. However, certain situations may be simplified. For example, in our framework the development program consists of exactly one phase II trial and one phase III trial, which is, however, not unusual in oncology. For situations in which two or more phase III trials are performed, a framework for the optimal planning of development programs was presented in a recent article by Preussler et al. [40]. Furthermore, we assumed the phase II trial to be two-armed. In the field of oncology, dose investigations are often performed before and not as part of phase II; in other indications, however, dose-finding is performed in phase II. Methods for optimizing phase II/III programs with multi-armed phase II/III studies are presented in Preussler et al. [41].
Futility investigations in the phase III trial and/or a “seamless design” for the final analysis may be worthwhile options, and it will be a topic of future research to investigate their impact on the optimal design. We assumed that the endpoint used in phase II and phase III is the same. We are currently exploring the situation in which a surrogate (like progression-free or disease-free survival) is captured in phase II and overall survival is the primary endpoint in phase III. Another important point is that time effects are not considered in this article: the framework does not account for the duration of development, which is, amongst others, discussed in Preussler et al. [41]. That work presents in detail how to incorporate the impact of trial duration into the framework (compare Supplementary Material A2 of [41]). However, when trying to incorporate “time” into the utility function, many aspects have to be considered. For example, one could take into account the “life cycle” of a drug as proposed by Patel & Ankolekar [42], who describe a typical life cycle by an early growth phase followed by a plateau, after which sales decline as the patent expires. Furthermore, if several competitors are investigating a similar drug, the company that is first to bring the drug to market usually obtains the higher market share, i.e., the higher gain. However, including these aspects requires competitor information and assumptions about their unknown future observed treatment effects, and any such assumptions are usually associated with very high uncertainty. Instead of trying to include too many (unknown) aspects in the utility function, a rather simplified approach, as presented here, is advisable.
If, after observing the phase II data, further information about the potential of the drug, the dose, the target population or (time-dependent) benefits becomes available, the probability of success (compare [43]) and the utility function could be updated to support go/no-go decisions as well as the design of the phase III trial.
In general, our results show that the adjusted program setups are superior to the unadjusted program setup with respect to the maximal expected utility. This is associated with higher investments in terms of number of events and lower expected probabilities to go to phase III in the adjusted program setups compared to the unadjusted approach. Thus, in the adjusted program setups it is less often decided to go to phase III, but in case of a go decision, the investment in terms of sample size is higher. These aspects apply in particular to the multiplicatively adjusted program setups, which also have higher expected probabilities of a successful program than the additively adjusted and unadjusted program setups. Simply put, the money is spent more wisely when adjustment methods are used.
Values of the adjustment parameters that do not lead to an adjustment (i.e., α_{CI} = 0.5 and λ = 1 in the additively and multiplicatively adjusted program setups, respectively) were included but never selected in the optimization. Thus, the results suggest that adjustment should always be considered, which is in line with Chuang-Stein and Kirby [14]. Furthermore, we see that in the unadjusted case there is an overestimation of the treatment effect after phase II, which is mitigated by the adjustments. In the multiplicative setting it even turns out that an overcorrection, and thus an even larger investment in terms of sample size, can be worthwhile with respect to the expected utility. Note that the focus is on the maximal expected utility and the expected estimate of phase II is only a supporting variable, i.e., obtaining a “perfectly” unbiased estimator is not the goal in this application. With regard to the optimal number of events in phase II relative to phase III (\( {d}_2^{\ast } \) / \( {d}_3^{\ast } \)), the framework ends up in the “desirable” (according to De Martini [4, 25]) range of 2/3 in the unadjusted and additive cases, and even in the multiplicative case, with its lower \( {d}_2^{\ast } \) / \( {d}_3^{\ast } \), the ratio still exceeds the often used 1/4. However, it should be noted that the total optimal sample size is highest in the multiplicative case.
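For illustration, the two discounting rules discussed here can be sketched on the θ = −log(HR) scale. The exact parameterization below (retention factor λ; lower bound of a one-sided (1 − α_CI) confidence interval with SE ≈ sqrt(4/d2) under 1:1 allocation) is our assumption in the spirit of Kirby et al. [15], not code from the paper; note that λ = 1 and α_CI = 0.5 indeed leave the estimate unchanged:

```python
import math
from statistics import NormalDist

def adjust_multiplicative(theta_hat, lam):
    # retention factor lambda in (0, 1]; lambda = 1 leaves the estimate unchanged
    return lam * theta_hat

def adjust_additive(theta_hat, d2, alpha_ci):
    # lower bound of a one-sided (1 - alpha_ci) CI on the theta = -log(HR) scale;
    # SE(theta_hat) is approximated by sqrt(4/d2) under 1:1 allocation [28];
    # alpha_ci = 0.5 gives z = 0, i.e., no adjustment
    se = math.sqrt(4 / d2)
    return theta_hat - NormalDist().inv_cdf(1 - alpha_ci) * se

theta_hat = -math.log(0.75)           # observed phase II effect (HR = 0.75)
theta_mult = adjust_multiplicative(theta_hat, 0.8)
theta_add = adjust_additive(theta_hat, d2=200, alpha_ci=0.25)
```

Both rules shrink the estimate toward the null; the multiplicative rule shrinks proportionally, while the additive rule subtracts an absolute amount that grows as the phase II trial gets smaller.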
The two multiplicatively adjusted (i.e., \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{\lambda}\right) \)) program setups do not differ in their maximal expected utility, and the same holds for the two additively adjusted (i.e., \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{\alpha_{CI}}\right) \)) setups. However, the program setups with the adjusted estimate used for decision making (i.e., \( S\left({\hat{\theta}}_2^{\lambda },{\hat{\theta}}_2^{\lambda}\right) \) and \( S\left({\hat{\theta}}_2^{\alpha_{CI}},{\hat{\theta}}_2^{\alpha_{CI}}\right) \)) have larger optimal threshold values for the decision rule than the program setups where only the estimate used for calculating the expected number of events for phase III is adjusted (i.e., \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^{\lambda}\right) \) and \( S\left({\hat{\theta}}_2^u,{\hat{\theta}}_2^{\alpha_{CI}}\right) \)). Considering only these two aspects, adjustment of the treatment effect estimate used for the decision rule may be omitted when the threshold value of the decision rule is also optimized: such adjustment only leads to larger values of \( {HR}_{go}^{\ast } \) (i.e., more liberal decision rules), which compensate for the adjusted (more conservative) treatment effect estimates. For the same reason, the program setups \( S\left({\hat{\theta}}_2^{\lambda },{\hat{\theta}}_2^u\right) \) and \( S\left({\hat{\theta}}_2^{\alpha_{CI}},{\hat{\theta}}_2^u\right) \) (i.e., multiplicative or additive adjustment used for the decision rule and no adjustment applied for the calculation of the number of events for phase III) are not considered.
Furthermore, as adjustment of the treatment effect estimate used for the decision rule may be omitted when also optimizing over the threshold value for the decision rule, we did not consider program setups where different adjustment parameters used for the decision rule and the calculation of the expected number of events are optimized (in our notation \( S\left({\hat{\theta}}_2^{\lambda_1},{\hat{\theta}}_2^{\lambda_2}\right) \) and \( S\left({\hat{\theta}}_2^{{\alpha_{CI}}_1},{\hat{\theta}}_2^{{\alpha_{CI}}_2}\right) \)).
Conclusions
Based on our results, we highly recommend using (multiplicatively) adjusted phase II treatment effect estimates for the calculation of the phase III number of events in a phase II/III drug development program with a go/no-go decision rule (compare Chuang-Stein & Kirby [14], Kirby et al. [15] and De Martini [4, 25]). However, as our results also show that the optimal design parameters of each method depend on the cost and benefit parameters as well as on the applied prior distribution, no general rule exists. Instead, the design parameters should be determined by applying our proposed optimization procedure with the specific parameter values of the respective drug development program. To this end, we provide a user-friendly R Shiny app (bias) and an open-source R package (drugdevelopR, including the R function optimal_bias), both accessible via [1].
Availability of data and materials
The datasets used can be generated with the help of the R package drugdevelopR and the code containing the respective function calls is provided in the additional files (see file Code.R).
Abbreviations
α_{CI}, λ: Adjustment parameters for the additive and multiplicative adjustment method, respectively
bs: Benefit scenario
CI: Confidence interval
d_{2}, d_{3}, d: Total number of events for phase II, III and the program, respectively
HR: True assumed hazard ratio
κ: Threshold value for the go/no-go decision rule, κ = − log(HR_{go})
s_{1}, s_{2}: Estimate used for go/no-go decision and calculation of number of events, respectively
θ: True assumed treatment effect, θ = − log(HR)
References
 1.
Erdmann, S. drugdevelopR: bias. https://web.imbi.uni-heidelberg.de/bias/. Accessed 02 Jul 2020.
 2.
DiMasi JA, Hansen RW, Grabowski HC, Lasagna L. Research and development costs for new drugs by therapeutic category. PharmacoEconomics. 1995;7:152–69.
 3.
DiMasi JA, Feldman L, Seckler A, Wilson A. Trends in risks associated with new drug development: success rates for investigational drugs. Clinical Pharmacology & Therapeutics. 2010;87:272–7.
 4.
De Martini D. Empowering phase II clinical trials to reduce phase III failures. Pharm Stat. 2020;19:178–86.
 5.
Antonijevic Z. Optimization of Pharmaceutical R&D Programs and portfolios: design and investment strategy. Heidelberg: Springer; 2015.
 6.
Hughes MD, Pocock SJ. Stopping rules and estimation problems in clinical trials. Stat Med. 1988;7:1231–42.
 7.
Fan X, DeMets DL, Lan KG. Conditional bias of point estimates following a group sequential test. J Biopharm Stat. 2004;14:505–30.
 8.
Zhang JJ, Blumenthal G, He K, Tang S, Cortazar P, Sridhara R. Overestimation of the effect size in group sequential trials. Clin Cancer Res. 2012;18:4872–6.
 9.
Ellenberg SS, DeMets DL, Fleming TR. Bias and trials stopped early for benefit. Jama. 2010;304:156–9.
 10.
Nardini C. Monitoring in clinical trials: benefit or bias? Theoretical Medicine and Bioethics. 2013;34:259–74.
 11.
US Food and Drug Administration. 22 case studies where phase 2 and phase 3 trials had divergent results. 2017. Available at http://go.nature.com/2mayug4. Accessed 02 Jul 2020.
 12.
Gan HK, You B, Pond GR, Chen EX. Assumptions of expected benefits in randomized phase III trials evaluating systemic treatments for cancer. J Natl Cancer Inst. 2012;104:590–8.
 13.
Arrowsmith J. Trial watch: phase II failures: 2008–2010. Nat Rev Drug Discov. 2011;10:328–9.
 14.
Chuang-Stein C, Kirby S. The shrinking or disappearing observed treatment effect. Pharm Stat. 2014;13:277–80.
 15.
Kirby S, Burke J, Chuang-Stein C, Sin C. Discounting phase 2 results when planning phase 3 clinical trials. Pharm Stat. 2012;11:373–85.
 16.
O'Hagan A, Stevens JW, Montmartin J. Bayesian cost-effectiveness analysis from clinical trial data. Stat Med. 2001;20:733–53.
 17.
O'Hagan A, Stevens JW, Campbell MJ. Assurance in clinical trial design. Pharm Stat. 2005;4:187–201.
 18.
Spiegelhalter DJ, Freedman LS, Blackburn PR. Monitoring clinical trials: conditional or predictive power? Control Clin Trials. 1986;7:8–17.
 19.
Spiegelhalter DJ, Freedman LS. A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Statistics Med. 1986;5:1–13.
 20.
Chuang-Stein C. Sample size and the probability of a successful trial. Pharm Stat J Appl Stat Pharm Ind. 2006;5:305–9.
 21.
Chuang-Stein C, Yang R. A revisit of sample size decisions in confirmatory trials. Statistics in Biopharmaceutical Research. 2010;2:239–48.
 22.
Gasparini M, Di Scala L, Bretz F, Racine-Poon A. Some uses of predictive probability of success in clinical drug development. Epidemiology, Biostatistics and Public Health. 2013;10:1.
 23.
Saint-Hilary G, Barboux V, Pannaux M, Gasparini M, Robert V, Mastrantonio G. Predictive probability of success using surrogate endpoints. Stat Med. 2019;38:1753–74. https://doi.org/10.1002/sim.8060.
 24.
Wang SJ, Hung HM, O'Neill RT. Adapting the sample size planning of a phase III trial based on phase II data. Pharm Stat. 2006;5:85–97.
 25.
De Martini D. Adapting by calibration the sample size of a phase III trial on the basis of phase II data. Pharm Stat. 2011;10:89–95.
 26.
Götte H, Schüler A, Kirchner M, Kieser M. Sample size planning for phase II trials based on success probabilities for phase III. Pharm Stat. 2015;14:515–24.
 27.
Kirchner M, Kieser M, Götte H, Schüler A. Utilitybased optimization of phase II/III programs. Stat Med. 2016;35:305–16.
 28.
Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika. 1981;68:316–9.
 29.
IQWiG. Allgemeine Methoden [General Methods]. Version 5.0, 10.07.2016. Technical report.
 30.
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2019. Available at https://www.R-project.org/. Accessed 02 Jul 2020.
 31.
Steensma DP, Kantarjian HM. Impact of cancer research bureaucracy on innovation, costs, and patient care. J Clin Oncol. 2014;32:376–8.
 32.
Ding M, Rosner GL, Müller P. Bayesian optimal design for phase II screening trials. Biometrics. 2008;3:886–94.
 33.
Erdmann, S. drugdevelopR: prior. https://web.imbi.uni-heidelberg.de/prior/. Accessed 02 Jul 2020.
 34.
Dallow N, Best N, Montague TH. Better decision making in drug development through adoption of formal prior elicitation. Pharm Stat. 2018;17:301–16.
 35.
O'Hagan A, Buck CE, Daneshkhah A, Eiser JR, Garthwaite PH, Jenkinson DJ, et al. Uncertain judgements: eliciting experts' probabilities. Chichester: Wiley; 2006.
 36.
Devilee JLA, Knol AB. Software to support expert elicitation: an exploratory study of existing software packages; 2011.
 37.
DiMasi JA, Grabowski HG, Vernon J. R&D costs and returns by therapeutic category. Drug Information J. 2004;38:211–23.
 38.
Adams CP, Brantner VV. Spending on new drug development. Health Econ. 2010;19:130–41.
 39.
Morgan S, Grootendorst P, Lexchin J, Cunningham C, Greyson D. The cost of drug development: a systematic review. Health Policy. 2011;100:4–17.
 40.
Preussler S, Kieser M, Kirchner M. Optimal sample size allocation and go/nogo decision rules for phase II/III programs where several phase III trials are performed. Biom J. 2019;61(2):357–78.
 41.
Preussler S, Kirchner M, Götte H, Kieser M. Optimal designs for multi-arm phase II/III drug development programs. Statistics in Biopharmaceutical Research. 2019. https://doi.org/10.1080/19466315.2019.1702092.
 42.
Patel NR, Ankolekar S. A Bayesian approach for incorporating economic factors in sample size design for clinical trials of individual drugs and portfolios of drugs. Stat Med. 2007;26:4976–88.
 43.
Götte H, Kirchner M, Sailer MO, Kieser M. Simulationbased adjustment after exploratory biomarker subgroup selection in phase II. Stat Med. 2017;36:2378–90.
Acknowledgements
We thank the reviewer for their valuable comments, which improved the manuscript considerably.
Funding
We would like to thank the Deutsche Forschungsgemeinschaft (DFG) for supporting this research through research grant KI 708/2-1 (financing a statistical position at the Institute of Medical Biometry of the University Hospital Heidelberg). Furthermore, we would like to acknowledge the financial support by the DFG within the funding program “Open Access Publishing”, by the Baden-Württemberg Ministry of Science, Research and the Arts and by Ruprecht-Karls-Universität Heidelberg (payment of the publishing costs). The role of the funding bodies was purely financial; they took no direct part in the analysis or in writing the manuscript. Open access funding provided by Projekt DEAL.
Author information
Contributions
SE, MKir, HG, MKie developed the proposed method. SE wrote the manuscript and associated software code. All authors read and approved the final manuscript.
Authors’ information
The first author of this manuscript (SE) changed her name from Stella Preussler to Stella Erdmann. Therefore, the (first) author of [1, 33, 40], [41] and this manuscript is the same.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests. HG is an employee of Merck Healthcare KGaA but has no conflict of interest with the subject matter of this manuscript.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1.
In Additional file 1, an overview of the formulas in program setups \( S\left({\hat{\theta}}_2^{s_1},{\hat{\theta}}_2^{s_2}\right) \), s_{1}, s_{2} ∈ {λ, α_{CI}, u} (A0), and an investigation of an alternative definition of program success (A1) are given. Furthermore, more details and results of the application example are presented for modelling different population structures in phase II and III (A2), using a predefined minimal clinically relevant effect for phase III planning (A3), using a budget constraint (A4), skipping phase II (A5) and using a linear function for modelling the gain (A6). The file Code.R includes the main function calls for generating the datasets and tables, using the R package drugdevelopR.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Erdmann, S., Kirchner, M., Götte, H. et al. Optimal designs for phase II/III drug development programs including methods for discounting of phase II results. BMC Med Res Methodol 20, 253 (2020). https://doi.org/10.1186/s12874-020-01093-w
Keywords
 Optimization
 Drug development program
 Bias adjustment
 Assurance
 Probability of success
 Sample size
 Software