An adaptive gBOIN design with shrinkage boundaries for phase I dose-finding trials

Background With the emergence of molecularly targeted agents and immunotherapies, the landscape of phase I oncology trials has changed. Although these new therapeutic agents are likely to induce multiple low- or moderate-grade toxicities rather than dose-limiting toxicities (DLTs), most existing phase I trial designs account only for binary toxicity outcomes. Motivated by a pediatric phase I trial in solid tumors with a continuous outcome, we propose an adaptive generalized Bayesian optimal interval design with shrinkage boundaries, gBOINS, which can accommodate continuous and toxicity-grade endpoints and regards the conventional binary endpoint as a special case. Result The proposed gBOINS design enjoys convergence properties: the induced interval shrinks to the toxicity target, and the recommended dose converges to the true maximum tolerated dose as the sample size increases. Conclusion The proposed gBOINS design is transparent and simple to implement. We show that the gBOINS design has the desirable finite-sample property of coherence and the large-sample property of consistency. Numerical studies show that the proposed gBOINS design performs well and is comparable or superior to the competing design.


Introduction
In oncology phase I trials, one main objective is to determine the maximum tolerated dose (MTD) or the recommended phase II dose (RP2D). Numerous methods have been proposed for phase I trials, and they can generally be classified into three classes: algorithm-based designs such as the 3+3 design [1], the accelerated titration design [2], and the biased coin design [3]; model-based designs such as the continual reassessment method (CRM) [4,5] and its various extensions [6][7][8]; and the recently developed model-assisted designs such as the Bayesian optimal interval (BOIN) design [9] and the Keyboard design [10]. Note that all these methods account only for binary toxicity outcomes, i.e., whether or not a patient experienced a dose-limiting toxicity (DLT). Few designs handle various types of toxicity scores, e.g., binary, continuous, and count, in a unified framework; exceptions are a design by [20] and the gBOIN design by [14]. The design by [20] is algorithm-based, whereas the gBOIN is a model-assisted design that generalizes the BOIN design of [9] to account for various toxicity grades. This paper takes further steps to extend the gBOIN. The gBOIN design assumes two fixed boundaries $\phi_1$ and $\phi_2$, which govern dose transitions. Although the gBOIN with fixed boundaries performs well with finite sample sizes [14], its performance may be further improved by exploring its behavior with non-fixed boundaries. The rationale for studying non-fixed boundaries is straightforward: different boundaries are associated with different risks of making wrong dose allocations. In this article, we propose the gBOINS design, which generalizes the gBOIN with two shrinkage boundaries $\phi_1^*$ and $\phi_2^*$. These two boundaries are obtained from the theory of the uniformly most powerful Bayesian test [21]. The trial is guided by replacing the two fixed boundaries $\phi_1$ and $\phi_2$ in the gBOIN with $\phi_1^*$ and $\phi_2^*$, respectively.
We show that, in contrast to the gBOIN design, which oscillates among the doses within the equivalence interval, the newly proposed gBOINS design has the ideal large-sample behavior of converging to one of the dose levels within that interval, because its decision boundaries shrink to a point mass at the target. This distinctive feature of the gBOINS provides a theoretical foundation for, and guarantees, convergence to the MTD. Numerical studies also show that, for small sample sizes, the gBOINS performs comparably to or better than the original gBOIN, and for large sample sizes, the gBOINS improves substantially on the gBOIN.
The remainder of the paper is organized as follows. In the "Method" section, after briefly introducing the gBOIN design, we present the gBOINS design and its theoretical foundation and derive its properties. In the "Simulation" section, we compare the gBOINS design with the gBOIN design for various types of toxicity endpoints. In the "Conclusion" section, we conclude the paper with a discussion.

Introduction of gBOIN design
Assume there are $J$ prespecified doses $d_1 < \cdots < d_J$ under investigation. Let $y$ denote the toxicity outcome, which is binary (e.g., DLT), quasi-binary (e.g., ETS), or continuous (e.g., TTB, TBS or TTP). For the motivating trial, after an appropriate transformation, we take the AUC as a continuous endpoint and model it by a normal distribution. [14] adopted the binomial and normal distributions for binary (or quasi-binary) and continuous endpoints, respectively. Define $\mu = E(y)$ and $\mu_j = E(y \mid d_j)$.
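Given the dose $d_j$, the distribution of $y$ belongs to the exponential family,
$$f(y \mid \theta_j) = \exp\left\{\frac{y\theta_j - b(\theta_j)}{a(\psi)} + c(y, \psi)\right\}, \tag{1}$$
where $\theta_j = \log\{\mu_j/(1-\mu_j)\}$, $b(\theta_j) = \log\{1 + \exp(\theta_j)\}$ and $a(\psi) = 1$ if $y$ follows a Bernoulli (or quasi-Bernoulli) distribution, and $\theta_j = \mu_j$, $b(\theta_j) = \theta_j^2/2$ and $a(\psi) = \sigma^2$ if $y$ follows a normal distribution.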
Let $\phi_0$ denote the target value of $\mu$ for dose finding. Specifically, for binary or quasi-binary toxicity endpoints, $\phi_0$ is the target DLT probability; for continuous endpoints, $\phi_0$ is the target value of the TTB, TBS or TTP. Assume there are $n_j$ patients treated at dose level $d_j$, and let $D_j = (y_1, \ldots, y_{n_j})$ denote the observed toxicity data. Based on $D_j$, the sample mean is $\hat\mu_j = \sum_{i=1}^{n_j} y_i / n_j$. For an interval-based design, dose transition decisions are made by comparing $\hat\mu_j$ with the decision boundaries $\lambda_e(d_j, n_j, \phi_0)$ and $\lambda_d(d_j, n_j, \phi_0)$. Specifically, if $\hat\mu_j < \lambda_e(d_j, n_j, \phi_0)$, escalate to the next higher dose level $j+1$; if $\hat\mu_j > \lambda_d(d_j, n_j, \phi_0)$, de-escalate to the next lower dose level $j-1$; otherwise, retain the current dose level $j$. The choice of the decision boundaries $\lambda_e(d_j, n_j, \phi_0)$ and $\lambda_d(d_j, n_j, \phi_0)$ is critical because these two parameters essentially determine the operating characteristics of a design. Let $R$, $E$ and $D$ denote the decisions of retaining, escalating and de-escalating (each relative to the current dose level), respectively; let $\bar R$ denote the decisions complementary to $R$ (i.e., $\bar R$ includes $E$ and $D$), and let $\bar E$ and $\bar D$ denote the decisions complementary to $E$ and $D$, respectively. Following [9], to obtain optimal decision boundaries, the gBOIN [14] considers three point hypotheses $H_0: \mu_j = \phi_0$, $H_1: \mu_j = \phi_1$ and $H_2: \mu_j = \phi_2$, and minimizes the incorrect decision probability
$$\alpha = P(H_0)P(\bar R \mid H_0) + P(H_1)P(\bar E \mid H_1) + P(H_2)P(\bar D \mid H_2), \tag{2}$$
where $\phi_1$ is a value deemed subtherapeutic such that dose escalation is warranted, and $\phi_2$ is a value deemed overly toxic such that dose de-escalation is required.
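To make the incorrect decision probability concrete, the sketch below evaluates $\alpha = \{P(\bar R \mid H_0) + P(\bar E \mid H_1) + P(\bar D \mid H_2)\}/3$ exactly for a Bernoulli endpoint with $n$ patients at the current dose, and searches all interval rules of the form "escalate if at most $e$ DLTs, de-escalate if at least $d$ DLTs". The inputs $\phi_0 = 0.3$, $\phi_1 = 0.18$, $\phi_2 = 0.42$ and $n = 6$ are illustrative assumptions, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def incorrect_decision_prob(e, d, n, phi0, phi1, phi2):
    """alpha with P(H0) = P(H1) = P(H2) = 1/3 for a Bernoulli endpoint.

    With x DLTs among n patients, the rule escalates if x <= e,
    de-escalates if x >= d (e < d), and retains the dose otherwise.
    """
    def p_escalate(p):
        return sum(binom_pmf(x, n, p) for x in range(0, e + 1))

    def p_deescalate(p):
        return sum(binom_pmf(x, n, p) for x in range(d, n + 1))

    wrong_h0 = p_escalate(phi0) + p_deescalate(phi0)  # P(R-bar | H0)
    wrong_h1 = 1 - p_escalate(phi1)                   # P(E-bar | H1)
    wrong_h2 = 1 - p_deescalate(phi2)                 # P(D-bar | H2)
    return (wrong_h0 + wrong_h1 + wrong_h2) / 3


def best_cutoffs(n, phi0, phi1, phi2):
    """Brute-force search over all interval rules.

    e = -1 means "never escalate" and d = n + 1 means "never de-escalate".
    """
    rules = [(e, d) for e in range(-1, n + 1) for d in range(e + 1, n + 2)]
    return min(rules, key=lambda r: incorrect_decision_prob(r[0], r[1], n, phi0, phi1, phi2))


print(best_cutoffs(6, 0.3, 0.18, 0.42))  # (1, 3)
```

With these inputs the minimizing rule escalates after 0 or 1 DLT out of 6 and de-escalates after 3 or more, which is what comparing $\hat\mu_j$ against the BOIN boundaries 0.236 and 0.359 for a target of 0.3 would also give.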
Note that $H_0$ indicates that the current dose is the MTD and we should retain it for the next cohort of patients; $H_1$ indicates that the current dose is below the MTD and we should escalate the dose; and $H_2$ indicates that the current dose is overly toxic and we should de-escalate the dose. Thus, the correct decisions under $H_0$, $H_1$ and $H_2$ are retention, escalation and de-escalation, respectively, and the corresponding incorrect decisions are $\bar R$, $\bar E$ and $\bar D$. For example, under $H_0$ (i.e., the current dose is the target), the correct decision is to retain the current dose ($R$), and the incorrect decisions are dose escalation and de-escalation ($E$ and $D$). Taking a noninformative prior, i.e., $P(H_0) = P(H_1) = P(H_2) = 1/3$, and minimizing the incorrect decision probability $\alpha$ in Eq. (2), the decision boundaries can be obtained as (details can be found in [14])
$$\lambda_e^* = \frac{b(\theta_1) - b(\theta_0)}{\theta_1 - \theta_0}, \qquad \lambda_d^* = \frac{b(\theta_2) - b(\theta_0)}{\theta_2 - \theta_0},$$
where $b(\cdot)$ is the cumulant function of the exponential family and $\theta_k$ is the natural parameter corresponding to $\mu_j = \phi_k$, $k = 0, 1, 2$. Specifically, when $y$ follows a Bernoulli or quasi-Bernoulli distribution, we have
$$\lambda_e^* = \frac{\log\{(1-\phi_1)/(1-\phi_0)\}}{\log\{\phi_0(1-\phi_1)/[\phi_1(1-\phi_0)]\}}, \qquad \lambda_d^* = \frac{\log\{(1-\phi_0)/(1-\phi_2)\}}{\log\{\phi_2(1-\phi_0)/[\phi_0(1-\phi_2)]\}},$$
which are exactly the boundaries of the original BOIN design [9]. When $y$ follows a normal distribution, we have
$$\lambda_e^* = \frac{\phi_0 + \phi_1}{2}, \qquad \lambda_d^* = \frac{\phi_0 + \phi_2}{2}.$$
Based on the above decision boundaries, the gBOIN design is summarized as follows:
(a) Patients in the first cohort are treated at the lowest dose level or at a prespecified dose level.
(b) At the current dose level $j$, to assign a dose to the next cohort of patients:
• if $\hat\mu_j \le \lambda_e^*$, escalate the dose level to $j+1$;
• if $\hat\mu_j \ge \lambda_d^*$, de-escalate the dose level to $j-1$; and
• otherwise, i.e., $\lambda_e^* < \hat\mu_j < \lambda_d^*$, retain the same dose level $j$.
(c) This process is continued until the maximum sample size is reached or the trial is terminated because of excessive toxicities.
Notably, the optimal decision boundaries $\lambda_e^*$ and $\lambda_d^*$ are free of $d_j$ and $n_j$, which means that the same pair of boundaries is used throughout the trial, regardless of which dose is the current dose or how many patients have been treated at it.
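As a numerical check of the boundary formulas, the following sketch computes $\{b(\theta_k) - b(\theta_0)\}/(\theta_k - \theta_0)$ for the Bernoulli family (natural parameter $\mathrm{logit}(\mu)$, $b(\theta) = \log(1 + e^\theta)$) and the normal family (natural parameter $\mu$, $b(\theta) = \theta^2/2$). The choice $\phi_0 = 0.3$, $\phi_1 = 0.6\phi_0$, $\phi_2 = 1.4\phi_0$ is an illustrative assumption, and the helper names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def b_bernoulli(theta):
    # cumulant function of the Bernoulli family: log(1 + e^theta)
    return math.log1p(math.exp(theta))


def b_normal(theta):
    # cumulant function of the normal family with natural parameter theta = mu
    return theta**2 / 2


def boundary(theta0, thetak, b):
    """(b(theta_k) - b(theta_0)) / (theta_k - theta_0): lambda_e* for k = 1, lambda_d* for k = 2."""
    return (b(thetak) - b(theta0)) / (thetak - theta0)


phi0, phi1, phi2 = 0.3, 0.18, 0.42  # illustrative: phi1 = 0.6 * phi0, phi2 = 1.4 * phi0

# Bernoulli endpoint: natural parameter is the logit of the mean
lam_e_bin = boundary(logit(phi0), logit(phi1), b_bernoulli)
lam_d_bin = boundary(logit(phi0), logit(phi2), b_bernoulli)

# Normal endpoint: natural parameter is the mean itself (a(psi) = sigma^2 cancels)
lam_e_norm = boundary(phi0, phi1, b_normal)
lam_d_norm = boundary(phi0, phi2, b_normal)

print(round(lam_e_bin, 3), round(lam_d_bin, 3))    # 0.236 0.359
print(round(lam_e_norm, 3), round(lam_d_norm, 3))  # 0.24 0.36
```

Because these boundaries depend only on $\phi_0$, $\phi_1$ and $\phi_2$, they can be computed once before the trial starts.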

Adaptive gBOIN design
Extensive simulation studies have shown that the gBOIN is transparent and simple to implement, and that it performs comparably to or better than more complicated model-based designs. As described in the "Introduction" section, non-fixed boundaries may provide the flexibility to penalize the mis-allocation of patients to overly toxic doses. To accommodate non-fixed boundaries, we first reformulate the above three hypotheses as
$$H_0: \mu_j = \phi_0, \qquad H_1: \mu_j < \phi_0, \qquad H_2: \mu_j > \phi_0.$$
In the Bayesian paradigm, the Bayes factor in favor of the alternative hypothesis $H_1$ against the fixed null hypothesis $H_0$ is defined as
$$BF_{10}(D_j) = \frac{f(D_j \mid H_1)}{f(D_j \mid H_0)},$$
where $f(D_j \mid H_k)$ denotes the (marginal) likelihood of $D_j$ under $H_k$, and $H_0$ is rejected if $BF_{10}(D_j)$ exceeds a prespecified threshold $\gamma_1$. Similarly, the Bayes factor in favor of the alternative hypothesis $H_2$ against $H_0$ is defined as
$$BF_{20}(D_j) = \frac{f(D_j \mid H_2)}{f(D_j \mid H_0)},$$
and $H_0$ is rejected if $BF_{20}(D_j)$ exceeds a prespecified threshold $\gamma_2$. Note that, if we want to penalize over-toxic allocation more heavily, $\gamma_1$ and $\gamma_2$ should differ, and presumably $\gamma_1$ should be greater than $\gamma_2$, because a smaller $\gamma_2$ makes de-escalation easier when excessive toxicities occur. Given the prior odds $P(H_k)/P(H_0) = 1$ and the threshold $\gamma_k$ $(k = 1, 2)$, we can determine the alternative hypothesis that maximizes the probability that the Bayes factor exceeds $\gamma_k$. In other words, we choose the value $\phi_k^*$ $(k = 1, 2)$ (the notation introduced in the "Introduction" section) to maximize $P\{BF_{k0}(D_j) > \gamma_k\}$.
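A short derivation may help connect the threshold $\gamma_k$ to a cut-off on $\hat\mu_j$. Assuming the exponential-family density $f(y \mid \theta) = \exp\{[y\theta - b(\theta)]/a(\psi) + c(y, \psi)\}$ and a point-mass alternative at $\mu_j = \phi$ with natural parameter $\theta$ ($\theta_0$ corresponds to $\phi_0$), the log Bayes factor depends on the data only through $\hat\mu_j$:

```latex
\log BF_{k0}(D_j)
  = \frac{n_j}{a(\psi)}\Big[\hat\mu_j(\theta - \theta_0) - \{b(\theta) - b(\theta_0)\}\Big] > \log\gamma_k
\;\Longleftrightarrow\;
\begin{cases}
\hat\mu_j < \dfrac{a(\psi)\log(\gamma_1)/n_j + b(\theta) - b(\theta_0)}{\theta - \theta_0}, & k = 1\ (\theta < \theta_0),\\[2ex]
\hat\mu_j > \dfrac{a(\psi)\log(\gamma_2)/n_j + b(\theta) - b(\theta_0)}{\theta - \theta_0}, & k = 2\ (\theta > \theta_0).
\end{cases}
```

Maximizing $P\{BF_{k0}(D_j) > \gamma_k\}$ therefore amounts to choosing $\phi$ so that this rejection region is as large as possible: the largest cut-off over $\phi < \phi_0$ for $k = 1$ and the smallest cut-off over $\phi > \phi_0$ for $k = 2$, consistent with the UMPBT construction of [21].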
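By Lemma 1 of [21], $\phi_1^*$ and $\phi_2^*$ can be obtained as
$$\phi_1^* = \arg\max_{\phi < \phi_0} \frac{a(\psi)\log(\gamma_1)/n_j + b(\theta) - b(\theta_0)}{\theta - \theta_0}, \qquad \phi_2^* = \arg\min_{\phi > \phi_0} \frac{a(\psi)\log(\gamma_2)/n_j + b(\theta) - b(\theta_0)}{\theta - \theta_0},$$
where $\theta$ and $\theta_0$ are the natural parameters corresponding to $\mu_j = \phi$ and $\mu_j = \phi_0$. Specifically, for the binomial distribution, $\phi_1^*$ and $\phi_2^*$ are given by
$$\phi_1^* = \arg\max_{\phi < \phi_0} \frac{\log\{(1-\phi)/(1-\phi_0)\} - \log(\gamma_1)/n_j}{\log\{\phi_0(1-\phi)/[\phi(1-\phi_0)]\}}, \qquad \phi_2^* = \arg\min_{\phi > \phi_0} \frac{\log\{(1-\phi_0)/(1-\phi)\} + \log(\gamma_2)/n_j}{\log\{\phi(1-\phi_0)/[\phi_0(1-\phi)]\}}.$$
Clearly, the values of $\phi_1^*$ and $\phi_2^*$ depend on the target $\phi_0$, the sample size $n_j$ and the threshold $\gamma_k$, $k = 1, 2$. Although their closed forms cannot be obtained, they can be computed by numerical optimization. For the normal distribution, $\phi_1^*$ and $\phi_2^*$ are given by
$$\phi_1^* = \phi_0 - \sigma\sqrt{2\log(\gamma_1)/n_j}, \qquad \phi_2^* = \phi_0 + \sigma\sqrt{2\log(\gamma_2)/n_j}.$$
Note that, for the normal distribution, $\phi_1^*$ and $\phi_2^*$ depend on $\sigma$. If $\sigma$ is unknown, we can replace it with its sample estimate; alternatively, $\sigma^2$ can be given an inverse gamma prior with shape parameter $\alpha_0$ and rate parameter $\beta_0$, and $\sigma$ can then be replaced by its posterior mean. Replacing $\phi_1$ and $\phi_2$ in the gBOIN boundaries with $\phi_1^*$ and $\phi_2^*$, we obtain the adaptive shrinkage decision boundaries $\lambda_e^*(n_j)$ and $\lambda_d^*(n_j)$. Note that, for a standard binary toxicity endpoint, if we take $\gamma_1 = \gamma_2$, the boundaries are the same as those of the UMPBI design [22]. Based on Lemma 2 of [21], we have the following double-shrinkage property of the shrinkage boundaries $\lambda_e^*(n_j)$ and $\lambda_d^*(n_j)$.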
Theorem 1 As $n_j \to \infty$, the decision boundaries $\lambda_e^*(n_j)$ and $\lambda_d^*(n_j)$ converge to the target $\phi_0$ at the rates $O\big(\sqrt{\log(\gamma_1)/n_j}\big)$ and $O\big(\sqrt{\log(\gamma_2)/n_j}\big)$, respectively.
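For a normal endpoint with known $\sigma$, Theorem 1 can be checked in closed form. Assuming the UMPBT alternatives for a normal mean, $\phi_1^* = \phi_0 - \sigma\sqrt{2\log(\gamma_1)/n_j}$ and $\phi_2^* = \phi_0 + \sigma\sqrt{2\log(\gamma_2)/n_j}$, and plugging them into the gBOIN normal boundaries $(\phi_0 + \phi_k)/2$:

```latex
\lambda_e^*(n_j) = \frac{\phi_0 + \phi_1^*}{2} = \phi_0 - \sigma\sqrt{\frac{\log\gamma_1}{2n_j}},
\qquad
\lambda_d^*(n_j) = \frac{\phi_0 + \phi_2^*}{2} = \phi_0 + \sigma\sqrt{\frac{\log\gamma_2}{2n_j}},

\bigl|\lambda^*(n_j) - \phi_0\bigr| = \sigma\sqrt{\frac{c_k}{2}}\; n_j^{(\epsilon_k - 1)/2}
\quad\text{when } \gamma_k = \exp(c_k n_j^{\epsilon_k}).
```

So the distance to the target is exactly $\sigma\sqrt{\log(\gamma_k)/(2n_j)} = O\big(\sqrt{\log(\gamma_k)/n_j}\big)$, and with $\epsilon_k = 0.5$ it decays like $n_j^{-1/4}$.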
Theorem 1 establishes a double-shrinkage property of the proposed adaptive gBOIN design: the optimal values $\phi_k^*$ shrink toward the target $\phi_0$, and the boundaries $\lambda_e^*(n_j)$ and $\lambda_d^*(n_j)$, computed from $\phi_1^*$ and $\phi_2^*$, also shrink toward $\phi_0$. We now give the procedure of the proposed gBOINS design as follows.
(a) Patients in the first cohort are treated at the lowest dose level or at a prespecified dose level.
(b) At the current dose level $j$, to assign a dose to the next cohort of patients:
• if $\hat\mu_j \le \lambda_e^*(n_j)$, escalate the dose level to $j+1$;
• if $\hat\mu_j \ge \lambda_d^*(n_j)$, de-escalate the dose level to $j-1$; and
• otherwise, i.e., $\lambda_e^*(n_j) < \hat\mu_j < \lambda_d^*(n_j)$, retain the same dose level $j$.
(c) This process is continued until the maximum sample size is reached or the trial is terminated because of excessive toxicities.
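The numerical optimization for the binomial case and one dose-assignment step of the procedure can be sketched as follows. A plain grid search (step $10^{-4}$) stands in for a proper optimizer; the function names, the illustrative values $\phi_0 = 0.3$, $c_1 = c_2 = \log(1.1)/3$, $\epsilon_k = 0.5$, and the choice to stay at dose 1 or $J$ when the rule points outside the dose range are assumptions of this sketch.

```python
import math


def rejection_threshold(phi, phi0, log_gamma_over_n):
    """Cut-off on the sample mean implied by BF_k0(D_j) > gamma_k for a Bernoulli
    endpoint with point alternative phi (exponential-family form with a(psi) = 1)."""
    theta, theta0 = math.log(phi / (1 - phi)), math.log(phi0 / (1 - phi0))
    b_diff = math.log((1 - phi0) / (1 - phi))  # b(theta) - b(theta0)
    return (log_gamma_over_n + b_diff) / (theta - theta0)


def gboin_boundary(phi0, phik):
    """Bernoulli gBOIN/BOIN boundary: lambda_e for phik < phi0, lambda_d for phik > phi0."""
    return math.log((1 - phik) / (1 - phi0)) / math.log(phi0 * (1 - phik) / (phik * (1 - phi0)))


def shrinkage_boundaries(phi0, n, log_gamma1, log_gamma2, step=1e-4):
    """(lambda_e*(n), lambda_d*(n)): grid search for phi1* (largest cut-off below phi0)
    and phi2* (smallest cut-off above phi0), then plug them into the gBOIN boundaries."""
    below = [k * step for k in range(1, round(phi0 / step))]
    above = [phi0 + k * step for k in range(1, round((1 - phi0) / step))]
    phi1 = max(below, key=lambda p: rejection_threshold(p, phi0, log_gamma1 / n))
    phi2 = min(above, key=lambda p: rejection_threshold(p, phi0, log_gamma2 / n))
    return gboin_boundary(phi0, phi1), gboin_boundary(phi0, phi2)


def next_dose(j, y, lam_e, lam_d, J):
    """One gBOINS assignment step at dose level j (1..J) given the outcomes y observed there."""
    mu_hat = sum(y) / len(y)
    if mu_hat <= lam_e:
        return min(j + 1, J)  # staying at the top dose is an assumption of this sketch
    if mu_hat >= lam_d:
        return max(j - 1, 1)  # likewise at the lowest dose
    return j


phi0, c = 0.3, math.log(1.1) / 3
for n in (10, 30, 100):
    log_gamma = c * n**0.5  # log(gamma_k) = c_k * n_j ** eps_k with eps_k = 0.5
    lam_e, lam_d = shrinkage_boundaries(phi0, n, log_gamma, log_gamma)
    print(n, round(lam_e, 3), round(lam_d, 3))

y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # hypothetical outcomes at dose level 3 (mu_hat = 0.2)
lg = c * len(y) ** 0.5
print(next_dose(3, y, *shrinkage_boundaries(phi0, len(y), lg, lg), J=5))  # 4
```

The printed boundaries tighten around 0.3 as $n$ grows, which is the double-shrinkage behavior described by Theorem 1.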
After the trial is completed, we use the pooled adjacent violators algorithm [23] to select the MTD. Denote the isotonically transformed values of the observed $\{\hat\mu_j\}$ by $\{\tilde\mu_j\}$. We select as the MTD the dose $j^*$ whose isotonic estimate $\tilde\mu_{j^*}$ is closest to $\phi_0$; if there are ties for $\tilde\mu_{j^*}$, we select from the ties the highest dose level when $\tilde\mu_{j^*} < \phi_0$ or the lowest dose level when $\tilde\mu_{j^*} > \phi_0$.
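The MTD selection step can be sketched with a weighted pool-adjacent-violators fit, using the numbers of treated patients $n_j$ as weights. This is a minimal sketch under stated assumptions: only doses with $n_j > 0$ are passed in, the names are hypothetical, and when two different isotonic values are equally close to $\phi_0$ (one below and one above) it keeps the lower dose, a case the text does not address.

```python
def pava(values, weights):
    """Weighted pool-adjacent-violators: nondecreasing fit to values."""
    blocks = []  # each block: [pooled mean, total weight, number of doses]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def select_mtd(mu_hat, n, phi0):
    """0-based index of the selected MTD among the treated doses."""
    iso = pava(mu_hat, n)
    # min() keeps the lowest index when two distinct values are equally close
    best = min(range(len(iso)), key=lambda j: abs(iso[j] - phi0))
    tied = [j for j, v in enumerate(iso) if v == iso[best]]
    # ties in the isotonic estimate: highest dose below the target, lowest dose otherwise
    return max(tied) if iso[best] < phi0 else min(tied)


mu_hat = [0.1, 0.3, 0.2, 0.5]  # hypothetical observed DLT rates at doses 1-4
n = [3, 6, 6, 3]               # patients treated at each dose
print(pava(mu_hat, n))         # doses 2 and 3 pooled to (approximately) 0.25
print(select_mtd(mu_hat, n, 0.3))  # 2, i.e., dose level 3
```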
For patient safety, we impose the following overdose control rule when using the gBOIN design.
If $P(\mu_j > \phi_0 \mid D_j) > 0.95$ and $n_j \ge 3$, dose level $j$ and all higher dose levels are eliminated from the trial, and the trial is terminated if the first dose level is eliminated.
The posterior probability $P(\mu_j > \phi_0 \mid D_j)$ can be evaluated using a beta-binomial model for the binary or quasi-binary endpoint, assuming a vague beta prior for $\mu_j$, e.g., $\mu_j \sim \text{Beta}(1, 1)$. For a normal endpoint $y$ with mean $\mu_j$ and variance $\sigma_j^2$, under the noninformative prior $p(\mu_j, \sigma_j^2) \propto \sigma_j^{-2}$, the posterior distribution of $\mu_j$ is a $t$ distribution with $n_j - 1$ degrees of freedom,
$$\mu_j \mid D_j \sim t_{n_j - 1}\left(\hat\mu_j,\; s_j^2/n_j\right), \qquad s_j^2 = \frac{1}{n_j - 1}\sum_{i=1}^{n_j}(y_i - \hat\mu_j)^2.$$
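For a binary endpoint with $x$ DLTs among $n_j$ patients, the posterior under a Beta(1, 1) prior is Beta$(1 + x, 1 + n_j - x)$, and for integer parameters its upper tail reduces to a binomial CDF, so the overdose rule needs no special functions. The sketch below relies on that identity and is limited to integer DLT counts (hypothetical names); fractional quasi-binary totals or the $t$ posterior for normal endpoints would need a library tail function such as SciPy's `scipy.stats.beta.sf` or `scipy.stats.t.sf`.

```python
from math import comb


def prob_overdose(x, n, phi0):
    """P(mu_j > phi0 | D_j) for x DLTs among n patients under a Beta(1, 1) prior.

    The posterior is Beta(1 + x, 1 + n - x). For integer parameters the beta
    tail equals a binomial CDF: P(Beta(1 + x, 1 + n - x) > phi0) = P(Bin(n + 1, phi0) <= x).
    """
    return sum(comb(n + 1, k) * phi0**k * (1 - phi0) ** (n + 1 - k) for k in range(x + 1))


def eliminate(x, n, phi0, cutoff=0.95, min_n=3):
    """Overdose control: True if dose j (and all higher doses) should be eliminated."""
    return n >= min_n and prob_overdose(x, n, phi0) > cutoff


print(round(prob_overdose(3, 3, 0.3), 4))  # 0.9919, above 0.95, so the dose is eliminated
print(round(prob_overdose(2, 3, 0.3), 4))  # 0.9163, below 0.95, so the dose is kept
```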

Design properties
From a practical viewpoint, a natural requirement for dose-finding trials is that dose escalation should not be allowed if the observed toxicity rate or mean toxicity score at the current dose is higher than the target, and dose de-escalation should not be allowed if it is lower than the target. [9] referred to this finite-sample property as "long-term memory coherence", an extension of a similar concept originally proposed by [24]. The original definition of coherence prohibits dose escalation (or de-escalation) when the observed toxicity rate in the most recently treated cohort is higher (or lower) than the target toxicity rate. Because that definition is based on the responses of only the most recently treated cohort, without considering patients who were previously enrolled and treated, [9] refers to it as "short-term memory coherence". Clearly, short-term memory coherence is a stronger requirement than long-term memory coherence. As shown in the Appendix, the gBOINS design has the following desirable finite-sample property.

Theorem 2
The gBOINS design is long-term memory coherent in the sense that it never escalates the dose when $\hat\mu_j > \phi_0$ and never de-escalates the dose when $\hat\mu_j < \phi_0$.
To further enhance the safety of the design, we let the upper boundary $\phi_2^*$ shrink slightly faster than the lower boundary $\phi_1^*$, since a stricter (i.e., smaller) $\phi_2^*$ carries less risk of exposing enrolled patients to overly toxic doses. We propose to take $\gamma_k = \exp(c_k n_j^{\epsilon_k})$, $c_k > 0$, $k = 1, 2$. It can be shown that the proposed adaptive gBOIN design has the following desirable large-sample property. According to Theorem 1, the condition $\gamma_k = \exp(c_k n_j^{\epsilon_k})$, $c_k > 0$, $k = 1, 2$, is imposed here to control the convergence rates of $\lambda_e^*(n_j)$ and $\lambda_d^*(n_j)$, which then shrink toward $\phi_0$ at the rate $O\big(n_j^{(\epsilon_k - 1)/2}\big)$. For $\epsilon_k > 0$, this is slower than the $\sqrt{n_j}$ rate at which $\hat\mu_j$ converges in probability to $\mu_j$, which yields $P\{\hat\mu_j \in (\lambda_e^*(n_j), \lambda_d^*(n_j))\} \to 1$ when $\mu_j = \phi_0$. Following the proof of Theorem 1 of [25], the result can be obtained directly, and the proof is omitted here.

Practical implementation
To implement the proposed gBOINS design in practice, we need to specify the values of $\epsilon_k$ and $c_k$, $k = 1, 2$. We recommend $\epsilon_k = 0.5$, $k = 1, 2$. The values of $c_k$ need to be calibrated by extensive simulation studies, and there are no uniform values even for different types of endpoints with the same target. For normal endpoints, the shrinkage boundaries depend on the estimate of $\sigma$, which complicates pre-tabulation and reduces the simplicity of the gBOINS; for practical applications, we suggest replacing $\sigma$ with $1.1\phi_0$. Note that a large value of $\lambda_e^*(n_j)$ (or a small value of $\lambda_d^*(n_j)$) makes dose escalation (or de-escalation) happen rapidly, which may lead to serious safety problems and reduce the efficiency of the design when the sample size is small, since the smaller the sample size, the larger the variance of $\hat\mu_j$. To avoid this problem and improve the design's efficiency, in practice we introduce a lead-in phase in which the trial follows the original gBOIN design for a prespecified number of patients, denoted $N_0$. Once $n_j > N_0$, the trial switches to the gBOINS design. For our simulations, $N_0 = 6$ is recommended. Table 1 shows examples of $(\lambda_e^*(n_j), \lambda_d^*(n_j))$ for the targets $\phi_0 = 0.2$ and $\phi_0 = 0.3$.
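For a normal endpoint, the lead-in and the shrinkage phase can be combined into one boundary schedule that is pre-tabulated before the trial. The helper below is hypothetical and rests on stated assumptions: the closed-form normal shrinkage boundaries $\lambda_e^*(n_j) = \phi_0 - \sigma\sqrt{\log(\gamma_1)/(2n_j)}$ and $\lambda_d^*(n_j) = \phi_0 + \sigma\sqrt{\log(\gamma_2)/(2n_j)}$ (the gBOIN midpoints with the normal UMPBT alternatives plugged in), $\gamma_k = \exp(c_k n_j^{\epsilon_k})$, $N_0 = 6$, $\sigma$ replaced by $1.1\phi_0$, and lead-in boundaries $(\phi_0 + \phi_k)/2$ with $\phi_1 = 0.6\phi_0$ and $\phi_2 = 1.4\phi_0$, which are our illustrative choice.

```python
import math


def gboins_normal_boundaries(n, phi0, phi1, phi2, c1, c2, eps1=0.5, eps2=0.5, n0=6, sigma=None):
    """Hypothetical helper returning (lambda_e, lambda_d) for a normal endpoint.

    n is the number of patients treated at the current dose. During the
    lead-in (n <= n0) the fixed gBOIN boundaries (phi0 + phi_k) / 2 are used;
    afterwards the shrinkage boundaries phi0 -/+ sigma * sqrt(log(gamma_k) / (2 n))
    with log(gamma_k) = c_k * n ** eps_k.
    """
    if n <= n0:
        return (phi0 + phi1) / 2, (phi0 + phi2) / 2
    if sigma is None:
        sigma = 1.1 * phi0  # plug-in for sigma suggested in the text
    lam_e = phi0 - sigma * math.sqrt(c1 * n**eps1 / (2 * n))
    lam_d = phi0 + sigma * math.sqrt(c2 * n**eps2 / (2 * n))
    return lam_e, lam_d


# c_1 and c_2 as in the continuous-outcome simulations; phi1 = 0.6*phi0, phi2 = 1.4*phi0 are assumptions
c1, c2 = math.log(1.1) / 3, math.log(1.1)
for n in (3, 6, 9, 15, 30, 60):
    lam_e, lam_d = gboins_normal_boundaries(n, 0.2, 0.12, 0.28, c1, c2)
    print(n, round(lam_e, 3), round(lam_d, 3))
```

Fixing $\sigma$ in advance is what makes such a table possible; with a data-driven estimate the boundaries would have to be recomputed during the trial.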

Toxicity as a binary endpoint
We evaluate the gBOINS design against the gBOIN design using four metrics: the percentage of correct selection (PCS) of the MTD; the average number of patients allocated to the MTD; the risk of overdosing, defined as the percentage of simulated trials in which a large percentage (e.g., more than 60% or 80%) of patients are treated at doses above the MTD; and the risk of underdosing, defined as the percentage of simulated trials in which more than 80% of patients are treated at doses below the MTD. We investigated two target toxicity rates, $\phi_0 = 0.2$ and $\phi_0 = 0.3$, and for each target we examined 16 representative toxicity scenarios, with $c_1 = c_2 = \log(1.05)/3$ when $\phi_0 = 0.2$ and $c_1 = c_2 = \log(1.1)/3$ when $\phi_0 = 0.3$. The scenarios, listed in Table 2, were reproduced from [27]. The examined scenarios vary in the location of the MTD and in the toxicity gaps around the MTD. For each scenario, 30 patients in 10 cohorts were assumed. Figures 1 and 2 present the results based on 4000 simulated trials. As shown in Fig. 1, when the target is 0.2, the gBOINS and gBOIN perform comparably across all 16 scenarios in terms of PCS and the average number of patients allocated to the MTD, while the gBOIN has a higher risk of overdosing in most of scenarios 1 to 10. In addition, compared with the gBOIN design, the proposed gBOINS allocated fewer patients to subtherapeutic doses in most scenarios, which is consistent with the higher risk of underdosing (80% criterion) for the gBOIN. Figure 2 shows that, when the target is 0.3, the gBOINS and gBOIN perform comparably, and the proposed gBOINS has a lower risk of overdosing.
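The four metrics can be computed from per-trial summaries as sketched below. The input layout (selected dose and patient counts per dose, 0-based indices) and the function name are assumptions for illustration; the function returns proportions, which are multiplied by 100 to obtain percentages.

```python
def operating_characteristics(trials, mtd, over_frac=0.6, under_frac=0.8):
    """Summarize simulated trials for one scenario.

    trials: list of (selected_dose, allocation) pairs, where allocation[j] is the
    number of patients treated at dose j (0-based) and mtd is the true MTD index.
    Returns proportions (multiply by 100 for percentages) and a mean count.
    """
    n_trials = len(trials)
    pcs = sum(sel == mtd for sel, _ in trials) / n_trials
    avg_at_mtd = sum(alloc[mtd] for _, alloc in trials) / n_trials
    # share of each trial's patients treated above / below the true MTD
    above = [sum(alloc[mtd + 1:]) / sum(alloc) for _, alloc in trials]
    below = [sum(alloc[:mtd]) / sum(alloc) for _, alloc in trials]
    return {
        "PCS": pcs,
        "avg_patients_at_MTD": avg_at_mtd,
        "risk_overdosing": sum(f > over_frac for f in above) / n_trials,
        "risk_underdosing": sum(f > under_frac for f in below) / n_trials,
    }


# three hypothetical trials of 30 patients with the true MTD at dose index 1
toy = [(1, [3, 21, 6]), (2, [3, 6, 21]), (0, [27, 3, 0])]
print(operating_characteristics(toy, mtd=1))
```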
We also conducted simulation studies to investigate the performance of the gBOINS with different sample sizes. We considered the four scenarios in Table 3; the simulation results based on 4000 replications are presented in Fig. 3. As shown in the first two plots in the left panel of Fig. 3, for scenarios 1 and 3, there was

Toxicity as a quasi-binary endpoint
We evaluated the performance of the gBOINS design with a quasi-binary toxicity endpoint (e.g., ETS) by comparing it with the gBOIN design under the ten scenarios considered by [14] (see Table 4). Following [18], we adopted the following ETS definition: grades 0 and 1 were of no concern (no DLT); a grade 2 toxicity was equivalent to 0.5 DLT; a grade 3 toxicity counted as one DLT; and a grade 4 toxicity was equivalent to 1.5 DLTs. The target ETS was 0.47, derived from the target profile of 49% grade 0 or 1, 18% grade 2, 23% grade 3, and 10% grade 4; that is, the target ETS = 0.49 × 0 + 0.18 × 0.5 + 0.23 × 1.0 + 0.10 × 1.5 = 0.47. A total of 30 patients in 10 cohorts was used. Table 5 shows the results based on 4,000 simulated trials. In general, for scenarios 1, 2, 3, 4, 5 and 8, the gBOINS performs comparably to the gBOIN in terms of PCS and the number of patients allocated to the MTD, while the gBOIN assigns more patients to overly toxic doses above the MTD. For example, in scenario 3, in which dose level 2 was the MTD, the gBOINS yielded a PCS of 48% and allocated 14.5 patients to the MTD, whereas the gBOIN yielded a PCS of 47% and allocated 12.1 patients to the MTD; in addition, the gBOINS assigned 2.7 fewer patients than the gBOIN to overly toxic doses. In scenario 6, in which the MTD was dose level 1, the PCS of the gBOINS was 77%, 8 percentage points higher than that of the gBOIN. In scenario 7, in which the MTD was dose level 2, the gBOIN yielded a PCS of 55%, 4 percentage points higher than that of the gBOINS, whereas the gBOINS allocated more patients to the MTD and fewer patients to overly toxic doses. In scenario 9, in which dose level 2 was the MTD, the gBOINS yielded a PCS of 96%, 5 percentage points higher than that of the gBOIN.
The number of patients allocated to the MTD was similar, while the gBOINS assigned 3.4 fewer patients than the gBOIN to overly toxic doses. In scenario 10, in which dose level 4 was the MTD, the gBOINS yielded a PCS of 76%, 5 percentage points higher than that of the gBOIN. The gBOIN assigned 13.3 patients to the MTD, more than the gBOINS, whereas the gBOINS assigned fewer patients to overly toxic doses. In addition, the gBOINS selected an overly toxic dose as the MTD in 6% of trials, a substantial improvement over the gBOIN, which did so in 26% of trials.
Table 4 True probability of each toxicity grade (0/1, 2, 3, 4) at each dose level (1-6) for ten simulation scenarios (1-10) [18]
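The ETS scoring and the target calculation described above can be sketched in a few lines; the weights follow the stated ETS definition, while the function names and the example cohort are hypothetical.

```python
# ETS weights by toxicity grade, following the definition above
ETS_WEIGHT = {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0, 4: 1.5}


def expected_ets(grade_probs):
    """Expected ETS for a dict mapping grade -> probability."""
    return sum(p * ETS_WEIGHT[g] for g, p in grade_probs.items())


def mean_ets(grades):
    """Observed mean ETS (the quasi-binary mu_hat_j) for a list of grades at one dose."""
    return sum(ETS_WEIGHT[g] for g in grades) / len(grades)


# target profile: 49% grade 0/1 (entered as grade 0), 18% grade 2, 23% grade 3, 10% grade 4
target = expected_ets({0: 0.49, 2: 0.18, 3: 0.23, 4: 0.10})
print(round(target, 2))     # 0.47
print(mean_ets([0, 2, 3]))  # 0.5 for a hypothetical cohort of three patients
```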

Continuous outcomes
In this section, we consider ten scenarios with continuous outcomes (Table 6); all responses follow the normal distributions adopted by [20]. For the first six scenarios, the response $y$ at dose level $x \in \{1, 2, 3, 4, 5, 6\}$ follows the normal distribution $N(0.05 + 0.05x, (0.05x)^2)$; a sample size of 15 was used when the target is at $x = 1$, and a sample size of 60 when the target is at other dose levels. A cohort size of 1 was used for all scenarios. For the remaining scenarios 7 to 10, the response $y$ at dose $x \in \{1, 2, 3, 4, 5, 6\}$ also follows $N(0.05 + 0.05x, (0.05x)^2)$, and a moderately large sample size of 100 was used. The constants are set to $c_1 = \log(1.1)/3$ and $c_2 = \log(1.1)$ throughout this subsection. Table 6 shows that, when the sample size is 15 for scenario 1 and 60 for scenarios 2 to 6, the gBOINS performs comparably to the gBOIN in terms of the percentage of correct selection and the number of patients treated at the target dose. For the last four scenarios, with a moderately large sample size, the gBOINS outperformed the gBOIN in the percentage of correct selection and was comparable in the number of patients allocated to the target dose. Specifically, in scenario 7, in which dose level 3 was the target dose, the gBOINS yielded a PCS of 89% and allocated 61.7 patients to the MTD, whereas the gBOIN yielded a PCS of 84% and allocated 64 patients to the MTD. In scenario 8, in which the MTD was dose level 4, the PCS of the gBOINS was 75%, 5 percentage points higher than that of the gBOIN. In scenarios 9 and 10, the gBOINS was also superior to the gBOIN in PCS and comparable in the number of patients allocated to the MTD.

Conclusion
We proposed a new phase I trial design that incorporates toxicity grades into dose-finding trials. The proposed gBOINS design unifies continuous and quasi-binary toxicity endpoints as well as the standard binary endpoint. Unlike those of the gBOIN design, the decision boundaries of the gBOINS design, $\lambda_e^*(n_j)$ and $\lambda_d^*(n_j)$, are adaptive, which gives the statistician a flexible tool to control over-toxicity. The design also converges to the target as the sample size goes to infinity. This convergence property of the gBOINS was demonstrated both theoretically and numerically. Compared with the gBOIN design, when more than one dose lay inside the decision boundaries $\lambda_e^*$ and $\lambda_d^*$ of the gBOIN design, the gBOINS substantially improved the PCS when the sample size was moderately large. Our simulations also showed that, when the sample size was small, the gBOINS performed comparably to the gBOIN in terms of PCS and could allocate more patients to safe doses.
Although the proposed gBOINS design focuses on phase I trials, it can be directly extended to phase I/II designs, similarly to the BOIN-ET proposed by [28]. One limitation of the gBOINS design is that it assumes toxicity outcomes can be observed quickly enough to make dose assignment decisions for each enrolled cohort. One approach to extending the gBOINS design to accommodate late-onset or delayed outcomes would be to use the Bayesian data augmentation approach [29,30] or the approximated likelihood approach [31]. This is a topic of our future research.