 Research
 Open Access
IBIS: identify biomarker-based subgroups with a Bayesian enrichment design for targeted combination therapy
BMC Medical Research Methodology volume 23, Article number: 66 (2023)
Abstract
Background
Combination therapies directed at multiple targets can potentially improve treatment effects for cancer patients. Compared with monotherapy, targeted combination therapy leads to a larger number of subgroups and more complicated biomarker-based efficacy profiles, which makes efficacy evaluation in clinical trials more difficult. It is therefore necessary to develop innovative clinical trial designs to explore the efficacy of targeted combination therapy in different subgroups and to identify patients who are more likely to benefit from the investigational combination therapy.
Methods
We propose a statistical tool called ‘IBIS’ to Identify BIomarker-based Subgroups and apply it within the enrichment design framework. IBIS contains three main elements: subgroup division, efficacy evaluation and subgroup identification. We first enumerate all possible subgroup divisions based on biomarker levels. Then, the Jensen–Shannon divergence is used to distinguish high-efficacy and low-efficacy subgroups, and a Bayesian hierarchical model (BHM) is employed to borrow information within these two subsets for efficacy evaluation. For subgroup identification, a hypothesis testing framework based on Bayes factors is constructed. This framework also plays a key role in go/no-go decisions and in enriching specific populations. Simulation studies are conducted to evaluate the proposed method.
Results
IBIS reached the desired level of accuracy and precision in terms of estimation performance. For subgroup identification and population enrichment, the proposed IBIS showed superior and robust characteristics compared with traditional methods. An example of how to obtain design parameters for an adaptive enrichment design under the IBIS framework is also provided.
Conclusions
IBIS has the potential to be a useful tool for biomarker-based subgroup identification and population enrichment in clinical trials of targeted combination therapy.
Background
In recent years, the rapid development of targeted combination therapy has brought novel treatment options for cancer patients. For example, atezolizumab (PD-L1 inhibitor) plus bevacizumab (VEGF inhibitor) maintained clinically meaningful survival benefits compared with sorafenib in patients with unresectable hepatocellular carcinoma [1]. First-line treatment with nivolumab (PD-1 inhibitor) plus ipilimumab (CTLA-4 inhibitor) resulted in longer overall survival than chemotherapy in patients with advanced non-small-cell lung cancer [2]. A single-arm, phase Ib/II trial of pembrolizumab (PD-1 inhibitor) plus trastuzumab (HER2 inhibitor) also demonstrated activity and durable clinical benefit in patients with PD-L1-positive, trastuzumab-resistant, advanced, HER2-positive breast cancer [3]. Such combination therapies are directed at multiple therapeutic targets and may improve treatment response, prevent the development of resistance, or reduce adverse events. However, the efficacy of targeted combination therapy can be heterogeneous across subgroups and is generally related to the levels of certain predictive biomarkers [4]. A traditional treatment strategy without population selection is therefore no longer desirable. Compared with monotherapy, subgroup identification is usually more complicated for targeted combination therapy because of the larger number of subgroups. For example, patients treated with a PD-1 inhibitor are often divided into three subgroups by PD-L1 expression (less than 1%, 1–49%, and 50% or greater), while patients treated with a HER2 inhibitor can be divided into HER2-positive and HER2-negative subgroups. If these two kinds of therapies are combined, there are consequently six subgroups in total. This may result in insufficient sample sizes and slow recruitment for some subgroups, leading to challenges in efficacy evaluation in clinical trials.
One motivating example for this paper is an ongoing phase 1b, open-label, 2-part, multicenter, nonrandomized, multiple-dose study evaluating DS-8201a in combination with pembrolizumab in participants with advanced/metastatic breast cancer or non-small-cell lung cancer (ClinicalTrials.gov Identifier: NCT04042701) [5]. DS-8201a is an anti-HER2 antibody-drug conjugate (ADC) with a novel topoisomerase I inhibitor, and pembrolizumab is a PD-1 inhibitor. It is therefore highly likely that the efficacy of the drug combination is related to patients’ expression levels of HER2 and PD-L1. The dose expansion part of the study includes breast cancer patients with both HER2-positive and HER2-low-positive disease, and the inclusion criteria do not restrict the expression level of PD-L1. Therefore, although the primary objective of this study is not exactly to identify subgroups based on these two predictive biomarkers, we regard it as a scenario in which data-driven subgroup identification is possible.
Moreover, even if the target populations of both single drugs have been identified through historical studies, we cannot claim that the target population of the combination therapy is simply the intersection of those two populations because combination therapy can potentially enhance efficacy and reduce drug resistance by targeting multiple key pathways in a synergistic or an additive manner [6]. Unlike monotherapy, where there is usually a monotonic relationship between efficacy and biomarker level, the efficacy profiles of combination therapy across subgroups could become more complicated due to the existence of interaction effects. Therefore, it is necessary to develop new innovative clinical trial designs to explore the efficacy of targeted combination therapy in different subgroups and identify patients who are more likely to benefit from the investigational combination therapy.
Another recent change in medical practice is the increasing refinement of biomarker-based subgroup classification, representing a shift from dichotomous to multilevel classification. For example, patients with breast cancer are usually divided into HER2-positive and HER2-negative subgroups in clinical practice. For HER2-positive patients, trastuzumab-based or other HER2-targeted drug regimens are now standards of care [7]. However, recent studies have shown that HER2-low-positive and HER2-zero breast cancers, although generally classified as HER2-negative, are distinct in terms of prognosis and response to treatment [8]. Preclinical studies of DS-8201, an anti-HER2 ADC, indicate that the antitumor activity of the drug depends on HER2 expression level rather than on HER2 amplification [9]. An early-phase clinical trial showed that the drug had some effect in breast cancer patients with IHC 2+ and IHC 1+ tumors, although IHC 1+ is generally classified as HER2-negative [10]. This evolving paradigm of subgroup classification is more consistent with the concept of precision medicine, but it may further increase the number of subgroups, creating challenges for subgroup identification and efficacy evaluation. As treatment effects may be similar in adjacent biomarker-based subgroups (e.g., IHC 2+ and IHC 1+), one possible strategy to address this issue is to borrow information across similar subgroups. We therefore propose a tool called ‘IBIS’ in this article to detect potential similarities across subgroups and Identify BIomarker-based Subgroups with higher efficacy.
We also extend IBIS to an adaptive enrichment design framework to increase its applicability. An adaptive enrichment design can adjust the inclusion/exclusion criteria according to a prespecified plan based on the results of interim analyses, allowing flexibility to explore the efficacy of investigational drugs in different subgroups. Several adaptive enrichment designs have been proposed, some of which consider the case of a single dichotomous biomarker [11,12,13,14,15,16,17,18,19,20,21]. In these designs, patients are enrolled regardless of marker status at stage I. An interim analysis is then performed to decide whether to continue enrolling the entire population or to enroll only biomarker-positive patients. Other studies considered the more general case of nested subgroups and focused on subgroup selection, assuming multiple prespecified subgroups with or without an a priori ordering [22,23,24,25,26]. However, when two or more predictive biomarkers are involved, the efficacy of the drug combination may be only partially ordered. For example, we can assume that the efficacy of PD-1 inhibitors in patients with PD-L1 ≥ 50% will not be lower than in patients with PD-L1 < 50%, and that the efficacy of HER2 inhibitors in HER2-positive patients will not be lower than in HER2-negative patients. However, we cannot judge whether a combination of a PD-1 inhibitor and a HER2 inhibitor is better for patients who are PD-L1 ≥ 50% and HER2-negative than for patients who are PD-L1 < 50% and HER2-positive. Therefore, existing designs, with or without an a priori ordering of subgroups, cannot fully meet the requirements of subgroup identification with multiple predictive biomarkers. To the best of our knowledge, few systematic studies have addressed biomarker-based subgroup identification and population enrichment for targeted combination therapy, and few have considered similarities across adjacent subgroups.
Considering the increasing number of subgroups and complicated biomarkerbased efficacy profiles, it is of great significance to propose new design methods to identify subgroups and enrich populations for targeted combination therapy.
Methods
IBIS design
Subgroup division
Consider a clinical trial whose primary objective is to determine whether a two-agent targeted drug combination (e.g., pembrolizumab plus trastuzumab) is effective for some specific subgroups. We first divide the subgroups into high-efficacy and low-efficacy subsets; this division becomes more challenging as biomarker-based subgroup classification is refined. It is assumed that two corresponding biomarkers are incorporated, denoted as Biomarker1 and Biomarker2 (e.g., PD-L1 and HER2). The entire population can be divided into K ordered subgroups based on Biomarker1 or J ordered subgroups based on Biomarker2; thus, the total number of subgroups is K × J. Assuming that the efficacy of both targeted agents increases monotonically with biomarker levels (i.e., marginal monotonicity), a total of G high-efficacy subsets Π_{g} (g = 1, …, G) can be listed, where the G-th subset Π_{G} represents the entire population. Each high-efficacy subset contains at least one patient subgroup, i.e., Π_{g} ≠ ∅. Taking the simplest case (K = 2, J = 2) as an example, there are five possible subgroup divisions (G = 5, Fig. 1A).
As the number of subgroups increases, the number of possible subgroup divisions grows rapidly. For example, when K = 3 and J = 4, there are 34 possible divisions altogether, four of which are shown in Fig. 1B. A computer algorithm can be used to enumerate all divisions satisfying the marginal monotonicity assumption. The algorithm is given below.

(1)
Let \(r\left(k,j\right)\) denote the variable indicating the magnitude of efficacy for subgroup \(\left(k,j\right)\), where k = 1,…, K and j = 1,…, J. To describe the algorithm more conveniently, let \(r\left(k,0\right)=r\left(0,j\right)=0\).

(2)
Use the following loop to sequentially assign values to \(r\left(k,j\right)\):

FOR k = 1,…, K

FOR j = 1,…, J

DO \(r\left(k,j\right)=\text{runif}\left(\text{max}\left(r\left(k-1,j\right),\;r\left(k,j-1\right)\right),1\right)\)

where \(\text{runif}\left(a,b\right)\) denotes a uniformly distributed random number between a and b, and \(\text{max}\left(a,b\right)\) denotes the larger of a and b.


(3)
Sort \(r\left(k,j\right)\) to get an ordering.

(4)
Repeat steps 2 and 3 to get N orderings (N is a large number, such as 10^{6}). Eliminate duplicates among these orderings.

(5)
Cut each ordering at every position, so that the m subgroups with the largest r(k, j) (m = 1, …, K × J) form a candidate high-efficacy subset.

(6)
Eliminate duplicate divisions, ignoring the order of subgroups within each subset.
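The enumeration algorithm above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: `random_divisions` follows steps (1)–(6) with a smaller default number of repetitions, and `exact_divisions` is a cross-check we add under the same marginal monotonicity assumption. Each admissible high-efficacy subset is an upper set of the K × J grid and can be indexed by a non-increasing vector of per-row cut points; there are C(K + J, K) such sets including the empty one, so G = C(K + J, K) − 1, which gives 5 for K = J = 2 and 34 for K = 3, J = 4. Both function names are hypothetical.

```python
import itertools
import math
import random


def random_divisions(K, J, n_rep=10_000, seed=2023):
    """Randomized enumeration following steps (1)-(6) of the algorithm."""
    rng = random.Random(seed)
    cells = [(k, j) for k in range(1, K + 1) for j in range(1, J + 1)]
    orderings = set()
    for _ in range(n_rep):  # step (4): repeat and keep unique orderings
        # step (1): zero border so that r(k, 0) = r(0, j) = 0
        r = [[0.0] * (J + 1) for _ in range(K + 1)]
        # step (2): r(k, j) is drawn at or above both lower neighbours
        for k in range(1, K + 1):
            for j in range(1, J + 1):
                r[k][j] = rng.uniform(max(r[k - 1][j], r[k][j - 1]), 1.0)
        # step (3): order subgroups from highest to lowest efficacy
        order = sorted(cells, key=lambda c: r[c[0]][c[1]], reverse=True)
        orderings.add(tuple(order))
    # steps (5)-(6): the top m subgroups form a candidate high-efficacy subset;
    # frozensets ignore the order within a subset and drop duplicates
    return {frozenset(order[:m]) for order in orderings for m in range(1, K * J + 1)}


def exact_divisions(K, J):
    """All non-empty upper sets of the K x J grid (a cross-check, not from the paper).

    t[i] is the first Biomarker2 level counted as high efficacy in row i + 1
    (J + 1 means none); marginal monotonicity requires t to be non-increasing.
    """
    divisions = set()
    for t in itertools.product(range(1, J + 2), repeat=K):
        if all(t[i] >= t[i + 1] for i in range(K - 1)):
            subset = frozenset((i + 1, j) for i in range(K) for j in range(t[i], J + 1))
            if subset:
                divisions.add(subset)
    return divisions


print(len(exact_divisions(2, 2)), len(exact_divisions(3, 4)))  # 5 34
print(math.comb(3 + 4, 3) - 1)  # 34
```

The randomized version only ever produces valid upper sets, but with too few repetitions it can miss rare ones; the direct enumeration avoids that and is cheap for the grid sizes considered here.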
Efficacy evaluation
For ease of exposition, we first consider a single-arm trial. Let Y_{(k, j)} denote the efficacy outcome for patients in subgroup (k, j), which follows a one-parameter exponential family distribution, i.e., Y_{(k, j)}~f(ψ_{(k, j)}). For example, if the response rate is the efficacy endpoint, whether a patient in subgroup (k, j) responds to the investigational treatment follows a Bernoulli distribution with probability ψ_{(k, j)}. As the treatment effects are relatively similar within the high-efficacy and low-efficacy subsets, we transform the original parameter ψ_{(k, j)} into an exchangeable parameter θ_{(k, j)} = h(ψ_{(k, j)}) in preparation for borrowing information via a hierarchical model. A typical example of the transform function h(∙) is the logit function for a binary endpoint. Table 1 shows the transformations for some other commonly used endpoints.
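One simple way to model the efficacy outcome is to apply a Bayesian hierarchical model that borrows information across all subgroups. However, when the heterogeneity across subgroups is large, this strategy may lead to substantial bias. It is therefore preferable to classify subgroups into two subsets based on the accumulated data and then borrow information only within each subset. Let Π_{∁g} (low-efficacy subset) denote the complement of Π_{g} (high-efficacy subset). A Bayesian hierarchical model is constructed as follows to borrow information within the high-efficacy and low-efficacy subsets:
\[
\begin{aligned}
{Y}_{\left(k,j\right)}&\sim f\left({h}^{-1}\left({\theta}_{\left(k,j\right)}\right)\right),\\
{\theta}_{\left(k,j\right)}&\sim N\left({\theta}_g,{\sigma}_g^2\right),\kern1em \left(k,j\right)\in {\Pi}_g,\\
{\theta}_{\left(k,j\right)}&\sim N\left({\theta}_{\complement g},{\sigma}_{\complement g}^2\right),\kern1em \left(k,j\right)\in {\Pi}_{\complement g},
\end{aligned}
\]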
where θ_{g} and θ_{∁g} denote the average treatment effects for the high-efficacy and low-efficacy subsets, respectively. The shrinkage parameters \({\sigma}_g^2\) and \({\sigma}_{\complement g}^2\) are the inter-subgroup variances of treatment effects within these two subsets and control the degree of information borrowing. They do not need to be specified in advance and can be data-driven. If treatment effect estimates across subgroups within a subset are similar, the posterior of the inter-subgroup variance will be concentrated on smaller values, inducing strong borrowing; if they are very different, less borrowing will occur. Normal distributions with large variances are usually taken as the priors for θ_{g} and θ_{∁g}. For \({\sigma}_g^2\) and \({\sigma}_{\complement g}^2\), an inverse-gamma prior IG(a, b) can be adopted, with small values of a and b so that the priors are vague. We constrain θ_{g} > θ_{∁g} to avoid the potential computational issue of label switching when using the Gibbs sampler to sample from the posteriors.
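The two-subset hierarchical model above can be fitted with a Gibbs sampler. The sketch below is a minimal illustration under assumptions that go beyond this paragraph: each subgroup is summarized by its sample mean with a known sampling variance (a normal endpoint, as in the simulation study), the hyperpriors follow the simulation settings (N(1, 10^{3}) and N(0, 10^{3}) for the subset means, IG(10^{−3}, 10^{−3}) for the inter-subgroup variances), and the constraint θ_{g} > θ_{∁g} is imposed by drawing each subset mean from a truncated normal full conditional. The function and argument names are hypothetical, and no convergence diagnostics are included.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()
INF = float("inf")


def trunc_normal(rng, mean, sd, lower=-INF, upper=INF):
    """Inverse-CDF draw from N(mean, sd^2) restricted to [lower, upper]."""
    lo = STD_NORMAL.cdf((lower - mean) / sd) if lower > -INF else 0.0
    hi = STD_NORMAL.cdf((upper - mean) / sd) if upper < INF else 1.0
    p = min(max(lo + (hi - lo) * rng.random(), 1e-12), 1 - 1e-12)  # keep inv_cdf in (0, 1)
    x = mean + sd * STD_NORMAL.inv_cdf(p)
    return min(max(x, lower), upper)  # guard against round-off in extreme tails


def gibbs_two_subset_bhm(ybar, se2, high, n_iter=3000, burn=500, seed=1,
                         prior_means=(1.0, 0.0), prior_var=1e3, a=1e-3, b=1e-3):
    """Posterior draws of subgroup effects theta_(k,j) under the two-subset BHM.

    ybar, se2: dicts mapping subgroup (k, j) -> sample mean and its sampling variance.
    high: subgroups in the high-efficacy subset Pi_g; all others form Pi_cg.
    """
    rng = random.Random(seed)
    members = {"H": [c for c in ybar if c in high], "L": [c for c in ybar if c not in high]}
    m0 = {"H": prior_means[0], "L": prior_means[1]}
    theta, mu, tau2 = dict(ybar), dict(m0), {"H": 1.0, "L": 1.0}
    draws, mu_draws = {c: [] for c in ybar}, []
    for it in range(n_iter):
        for s, cells in members.items():
            for c in cells:  # theta_(k,j) | rest: conjugate normal update
                prec = 1 / se2[c] + 1 / tau2[s]
                mean = (ybar[c] / se2[c] + mu[s] / tau2[s]) / prec
                theta[c] = rng.gauss(mean, prec ** -0.5)
        for s, cells in members.items():
            if not cells:  # the low-efficacy subset is empty when Pi_g is everyone
                continue
            prec = 1 / prior_var + len(cells) / tau2[s]
            mean = (m0[s] / prior_var + sum(theta[c] for c in cells) / tau2[s]) / prec
            # order constraint theta_g > theta_cg, applied only if both subsets exist
            lower = mu["L"] if s == "H" and members["L"] else -INF
            upper = mu["H"] if s == "L" and members["H"] else INF
            mu[s] = trunc_normal(rng, mean, prec ** -0.5, lower, upper)
            ss = sum((theta[c] - mu[s]) ** 2 for c in cells)
            # IG(shape, rate) draw as 1 / Gamma(shape, scale = 1 / rate)
            tau2[s] = 1 / rng.gammavariate(a + len(cells) / 2, 1 / (b + ss / 2))
        if it >= burn:
            for c in theta:
                draws[c].append(theta[c])
            mu_draws.append((mu["H"], mu["L"]))
    return draws, mu_draws


ybar = {(1, 1): 0.1, (1, 2): -0.1, (2, 1): 0.0, (2, 2): 1.1, (3, 1): 0.9, (3, 2): 1.0}
se2 = {c: 0.1 for c in ybar}  # sigma^2 / n = 1 / 10
draws, _ = gibbs_two_subset_bhm(ybar, se2, high={(2, 2), (3, 1), (3, 2)})
print({c: round(fmean(v), 2) for c, v in draws.items()})
```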
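To distinguish high-efficacy and low-efficacy subgroups, we use the Jensen–Shannon divergence [27] to measure the distance between the posterior distributions of the average treatment effects θ_{g} and θ_{∁g}, which also quantifies how dissimilar the high-efficacy and low-efficacy subsets are:
\[
JSD\left({\theta}_g\parallel {\theta}_{\complement g}\right)=\frac{1}{2}{D}_{KL}\left({\theta}_g\parallel \overset{\sim }{\theta}\right)+\frac{1}{2}{D}_{KL}\left({\theta}_{\complement g}\parallel \overset{\sim }{\theta}\right),
\]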
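where \(\overset{\sim }{\theta }=\frac{1}{2}\left({\theta}_g+{\theta}_{\complement g}\right)\) denotes the equal-weight mixture of the two posterior distributions (not the distribution of the averaged parameter). D_{KL}(A‖B) denotes the Kullback–Leibler divergence between A and B, which is defined as follows when A and B are both continuous variables:
\[
{D}_{KL}\left(A\parallel B\right)=\int a(x)\log \frac{a(x)}{b(x)}\,dx,
\]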
where a(x) and b(x) are the probability densities of A and B, respectively. After the Jensen–Shannon divergences are calculated for all subgroup divisions, the optimal division is defined as the one that maximizes the Jensen–Shannon divergence JSD(θ_{g}‖θ_{∁g}), because Π_{g} and Π_{∁g} are then most dissimilar. Let C_{H} denote the high-efficacy subset in the optimal division:
\[
{C}_H={\Pi}_{g^{\ast }},\kern1em {g}^{\ast }=\underset{g}{\arg \max}\; JSD\left({\theta}_g\parallel {\theta}_{\complement g}\right).
\]
Based on this optimal division, the posterior distribution of the treatment effect θ_{(k, j)} for each subgroup can be obtained by applying the aforementioned Bayesian hierarchical model. We use the Jensen–Shannon divergence here because it is based on the well-known Kullback–Leibler divergence and, unlike the latter, is symmetric. Other measures of distance between distributions, such as the Hellinger distance [28], may also be applicable.
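The Jensen–Shannon criterion can be illustrated numerically. In IBIS the posteriors of θ_{g} and θ_{∁g} come from MCMC; purely for illustration, the sketch below assumes each posterior is summarized by a normal approximation (`statistics.NormalDist`) and evaluates the two Kullback–Leibler terms by trapezoidal integration on a grid, with the mixture density ½(p_{g} + p_{∁g}) as the reference. `optimal_division` then returns the candidate division with the largest divergence. The helper names, the normal approximation and the grid settings are assumptions.

```python
import math
from statistics import NormalDist


def kl_on_grid(p, q, xs):
    """Trapezoidal approximation of D_KL(A || B) = integral of a(x) log(a(x) / b(x)) dx."""
    # 0 log 0 = 0; q is a mixture containing p, so q > 0 wherever p > 0
    vals = [pi * math.log(pi / qi) if pi > 0 and qi > 0 else 0.0 for pi, qi in zip(p, q)]
    h = xs[1] - xs[0]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def jsd_normal(post_high, post_low, n_grid=4001):
    """JSD between two posteriors, each approximated by a NormalDist."""
    lo = min(d.mean - 10 * d.stdev for d in (post_high, post_low))
    hi = max(d.mean + 10 * d.stdev for d in (post_high, post_low))
    xs = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    p = [post_high.pdf(x) for x in xs]
    q = [post_low.pdf(x) for x in xs]
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]  # mixture density of the two posteriors
    return 0.5 * kl_on_grid(p, m, xs) + 0.5 * kl_on_grid(q, m, xs)


def optimal_division(posteriors):
    """posteriors: division label -> (posterior of theta_g, posterior of theta_cg)."""
    return max(posteriors, key=lambda g: jsd_normal(*posteriors[g]))


# hypothetical normal approximations for two candidate divisions
candidates = {
    "A": (NormalDist(1.0, 0.2), NormalDist(0.1, 0.2)),
    "B": (NormalDist(0.7, 0.3), NormalDist(0.4, 0.3)),
}
print(optimal_division(candidates))  # A
```

The divergence is bounded by log 2, which is reached only when the two posteriors do not overlap at all.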
The model introduced above can easily be extended to randomized controlled trials (RCTs). The treatment effects of the investigational drug and the control intervention for subgroup (k, j) are \({\theta}_{\left(k,j\right)}^T\) and \({\theta}_{\left(k,j\right)}^C\), respectively. The effect size of interest is then \({\theta}_{\left(k,j\right)}={\theta}_{\left(k,j\right)}^T-{\theta}_{\left(k,j\right)}^C\), and the statistical model above still applies.
Subgroup identification
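The objective of subgroup identification is to find a collection of subgroups with clinically meaningful treatment effects. For each subgroup (k, j), consider the hypotheses
\[
{H}_{0\left(k,j\right)}:{\theta}_{\left(k,j\right)}\le {\theta}_0\kern1em \text{versus}\kern1em {H}_{1\left(k,j\right)}:{\theta}_{\left(k,j\right)}>{\theta}_0.
\]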
In single-arm trials, θ_{0} represents the minimum acceptable treatment effect, which is usually equal to the efficacy of the existing standard of care. In randomized controlled trials, θ_{0} is the superiority margin and is usually taken as 0.
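Bayes factors are used to test these hypotheses in IBIS. If the Bayes factor BF_{(k, j)} corresponding to subgroup (k, j) is sufficiently large, i.e.,
\[
{BF}_{\left(k,j\right)}=\frac{\Pr \left({H}_{1\left(k,j\right)}\mid D\right)/\Pr \left({H}_{0\left(k,j\right)}\mid D\right)}{\Pr \left({H}_{1\left(k,j\right)}\right)/\Pr \left({H}_{0\left(k,j\right)}\right)}>{BF}_{E\left(k,j\right)},
\]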
the investigational combination therapy is considered to be effective for subgroup (k, j), where D denotes the accumulated trial data and BF_{E(k, j)} is a prespecified threshold for (k, j). Under the Bayesian paradigm, we assign each hypothesis a prior probability of being true, denoted by Pr(H_{0(k, j)}) and Pr(H_{1(k, j)}). Correspondingly, the posterior probabilities are denoted by Pr(H_{0(k, j)} | D) and Pr(H_{1(k, j)} | D). To respect the assumption that efficacy increases monotonically with biomarker levels, once the investigational therapy is considered effective for subgroup \(\left(\overset{\sim }{k},\overset{\sim }{j}\right)\), it is also deemed effective for all subgroups in the subset \(C:\left\{k\ge \overset{\sim }{k},j\ge \overset{\sim }{j}\right\}\).
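The decision rule for a one-sided hypothesis can be sketched from posterior draws. Under the usual definition, the Bayes factor is the posterior odds of H_{1(k, j)} divided by its prior odds; with Pr(H_{0(k, j)}) = Pr(H_{1(k, j)}) = 0.5 it reduces to the posterior odds. `bayes_factor` estimates these odds from MCMC draws of θ_{(k, j)}, and `effective_subgroups` applies the threshold and then the monotone extension to all subgroups with higher levels of both biomarkers. The function names and the single common threshold are simplifying assumptions; in general BF_{E(k, j)} may differ across subgroups.

```python
import math


def bayes_factor(draws, theta0=0.0, prior_h1=0.5):
    """BF for H1: theta > theta0 versus H0: theta <= theta0 from posterior draws,
    computed as posterior odds divided by prior odds."""
    p1 = sum(d > theta0 for d in draws) / len(draws)
    if p1 == 1.0:
        return math.inf  # no posterior draw falls under H0
    return (p1 / (1 - p1)) / (prior_h1 / (1 - prior_h1))


def effective_subgroups(bf, threshold, K, J):
    """Subgroups with BF > threshold, extended to every subgroup (k, j) with
    k >= k~ and j >= j~ (marginal monotonicity)."""
    hits = [c for c, value in bf.items() if value > threshold]
    return {(k, j) for (kt, jt) in hits for k in range(kt, K + 1) for j in range(jt, J + 1)}


bf = {(1, 1): 0.4, (1, 2): 3.0, (2, 1): 12.0, (2, 2): 8.0}
print(sorted(effective_subgroups(bf, threshold=9.0, K=2, J=2)))  # [(2, 1), (2, 2)]
```

Here subgroup (2, 2) misses the threshold on its own but is declared effective because (2, 1) passes and (2, 2) has the same Biomarker1 level and a higher Biomarker2 level.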
We do not directly judge whether the investigational therapy is effective for C_{H} (i.e., make inference on θ_{g} at the subset level), because this is a ‘statistical’ subgroup division, not a ‘clinical’ one. In addition, every candidate division has a non-empty high-efficacy subset, so some C_{H} is always identified regardless of the true efficacy profile. If the therapy is ineffective for all subgroups, making such judgments will inevitably lead to type I errors; if the therapy is effective for all subgroups, any subgroups left outside C_{H} would be excluded, resulting in type II errors. The main purpose of subgroup division in IBIS is to reduce the estimation bias generated by BHM when there is heterogeneity across subgroups, rather than to make inferences and decisions directly from the division results.
We use Bayes factors rather than the posterior probability Pr(H_{1(k, j)} | D) or the posterior odds Pr(H_{1(k, j)} | D)/Pr(H_{0(k, j)} | D) for decision-making mainly because, when the parameter regions corresponding to the null and alternative hypotheses differ greatly in size, as with a point null hypothesis, the posterior probability or posterior odds may not reflect what we actually want to quantify. Suppose that there is a null hypothesis H_{0(k, j)} : θ_{(k, j)} = θ_{0} and an alternative hypothesis H_{1(k, j)} : θ_{(k, j)} ≠ θ_{0}. Under a continuous prior, the point null has zero posterior probability, so the posterior odds Pr(H_{1(k, j)} | D)/Pr(H_{0(k, j)} | D) are always infinite and cannot be used for decision-making. For the Bayes factor, however, Pr(H_{1} | D)/Pr(H_{1}) is approximately equal to 1, so the Bayes factor can be obtained by calculating Pr(H_{0})/Pr(H_{0} | D), i.e., the ratio of the prior density to the posterior density at θ_{0}. Bayes factors therefore increase the flexibility in formulating relevant hypotheses. Moreover, although the posterior probability or the posterior odds can also represent the strength of evidence in favour of the alternative hypothesis, the Bayes factor is the more commonly used Bayesian solution to hypothesis testing problems.
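The point-null argument can be checked in a simple conjugate case. Assuming a normal sample mean with known variance `se2` and a normal prior on θ under H_{1}, the sketch computes the Bayes factor as the ratio of the prior density to the posterior density at θ_{0} (the Savage–Dickey density ratio) and compares it with the ratio of marginal likelihoods m(ȳ | H_{1})/m(ȳ | H_{0}). The two agree exactly in this model, even though the posterior odds themselves are not usable. The model and function names are illustrative assumptions.

```python
from statistics import NormalDist


def savage_dickey_bf10(ybar, se2, theta0=0.0, prior_mean=0.0, prior_var=1.0):
    """BF10 for H0: theta = theta0 vs H1: theta != theta0, computed as the prior
    density over the posterior density at theta0 (normal-normal model)."""
    post_var = 1 / (1 / prior_var + 1 / se2)
    post_mean = post_var * (prior_mean / prior_var + ybar / se2)
    prior_density = NormalDist(prior_mean, prior_var ** 0.5).pdf(theta0)
    post_density = NormalDist(post_mean, post_var ** 0.5).pdf(theta0)
    return prior_density / post_density


def marginal_likelihood_bf10(ybar, se2, theta0=0.0, prior_mean=0.0, prior_var=1.0):
    """The same Bayes factor as a ratio of marginal likelihoods m(ybar|H1) / m(ybar|H0)."""
    m1 = NormalDist(prior_mean, (prior_var + se2) ** 0.5).pdf(ybar)
    m0 = NormalDist(theta0, se2 ** 0.5).pdf(ybar)
    return m1 / m0


# sample mean 0.8 from n = 10 observations with sigma^2 = 1, so se2 = 0.1
print(savage_dickey_bf10(0.8, 0.1), marginal_likelihood_bf10(0.8, 0.1))
```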
Extension to adaptive enrichment design
Building on subgroup identification, IBIS can be extended to the following multistage adaptive enrichment design. Suppose there are I analyses in total, including I − 1 interim analyses and one final analysis. Patients with any biomarker status can be enrolled in the initial stage of the trial. At the i-th (i = 1, …, I − 1) interim analysis, a go/no-go decision is made based on the Bayes factors, as follows:

(1)
If \({BF}_{\left(k,j\right)}>{BF}_{E\left(k,j\right)}^{(i)}\), the investigational therapy is considered to be effective for subgroup (k, j);

(2)
If \({BF}_{\left(k,j\right)}\le {BF}_{P\left(k,j\right)}^{(i)}\), the investigational therapy is considered to be ineffective for subgroup (k, j);

(3)
If BF_{(k, j)} is between these two thresholds, the investigational therapy is considered promising for subgroup (k, j).
Thresholds \({BF}_{E\left(k,j\right)}^{(i)}\) and \({BF}_{P\left(k,j\right)}^{(i)}\) are important design parameters that need careful calibration; the calibration strategy is described later. After the interim analysis, only promising subgroups are enrolled in the next stage. Enrollment of the other two kinds of subgroups is stopped early for efficacy or futility, and the whole trial is stopped early if there are no promising subgroups. Note that we can set \({BF}_{E\left(k,j\right)}^{(i)}=\infty \left(i=1,\dots, I-1\right)\) to prevent early stopping for efficacy. At the final analysis, if \({BF}_{\left(k,j\right)}>{BF}_{E\left(k,j\right)}^{(I)}\), the investigational therapy is considered effective for subgroup (k, j); otherwise, it is considered ineffective for subgroup (k, j). Because changing the enrollment criteria too often within one clinical trial is logistically and operationally impractical, a two-stage design is often recommended for adaptive enrichment. Figure 2 shows the schema of a two-stage adaptive enrichment design based on IBIS.
We also considered the monotonic relationship between drug efficacy and biomarker levels here. Specifically, once the investigational therapy is considered promising for subgroup \(\left(\overset{\sim }{k},\overset{\sim }{j}\right)\), it is also deemed promising for subgroups which were initially judged ineffective in subset \(C:\left\{k\ge \overset{\sim }{k},j\ge \overset{\sim }{j}\right\}\); once the investigational therapy is considered effective for subgroup \(\left(\overset{\sim }{k},\overset{\sim }{j}\right)\), it is also deemed effective for subgroups which were initially judged ineffective or promising in subset \(C:\left\{k\ge \overset{\sim }{k},j\ge \overset{\sim }{j}\right\}\).
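The interim classification and the monotone upgrades can be combined into one function. The sketch below assumes a single pair of thresholds shared by all subgroups (as in the calibration example later) and takes the Bayes factors as already computed; the function name and the string labels are hypothetical. Upgrades are applied as a maximum over dominated subgroups, which makes the result independent of the order in which subgroups are processed.

```python
def interim_decisions(bf, bf_e, bf_p, K, J):
    """Interim go/no-go classification with the monotone upgrades described above.

    bf must contain a Bayes factor for every subgroup (k, j), k <= K, j <= J.
    Passing bf_e = float("inf") disables early stopping for efficacy.
    """
    status = {}
    for c, value in bf.items():
        if value > bf_e:
            status[c] = "effective"
        elif value <= bf_p:
            status[c] = "ineffective"
        else:
            status[c] = "promising"
    rank = {"ineffective": 0, "promising": 1, "effective": 2}
    initial = dict(status)
    for (kt, jt), s in initial.items():
        # every subgroup with higher levels of both biomarkers is upgraded to at least s
        for k in range(kt, K + 1):
            for j in range(jt, J + 1):
                if rank[status[(k, j)]] < rank[s]:
                    status[(k, j)] = s
    return status


bf = {(1, 1): 0.2, (1, 2): 15.0, (2, 1): 3.0, (2, 2): 0.9}
status = interim_decisions(bf, bf_e=10.0, bf_p=1.0, K=2, J=2)
next_stage = sorted(c for c, s in status.items() if s == "promising")
print(status)      # (2, 2) is upgraded to 'effective' because (1, 2) is effective
print(next_stage)  # [(2, 1)]; an empty list would stop the whole trial
```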
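Since the proposed adaptive enrichment design incorporates early stopping for futility and efficacy, it may face challenges including type I error inflation and reduced statistical power. It is therefore critical to determine reasonable design parameters at the planning stage. Two key metrics are usually of main concern, the familywise error rate (FWER) and conjunctive power [29], defined as follows:
\[
\begin{aligned}
\text{FWER}&=\Pr \left(\text{reject at least one } {H}_{0\left(k,j\right)} \text{ with } {\theta}_{\left(k,j\right)}\le {\theta}_0\right),\\
\text{Conjunctive power}&=\Pr \left(\text{reject all } {H}_{0\left(k,j\right)} \text{ with } {\theta}_{\left(k,j\right)}>{\theta}_0\right).
\end{aligned}
\]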
The general strategy of calibrating design parameters is to make the design achieve satisfactory FWER, conjunctive power and expected sample size in several typical scenarios by simulation.
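Both metrics are Monte Carlo averages over simulated trials: the FWER is the proportion of trials in which at least one subgroup with θ_{(k, j)} ≤ θ_{0} is declared effective, and conjunctive power is the proportion in which every subgroup with θ_{(k, j)} > θ_{0} is declared effective. A minimal sketch, assuming each simulated trial is summarized by the set of subgroups it declared effective (the function name is hypothetical):

```python
import math


def fwer_and_conjunctive_power(decisions, truly_effective):
    """Monte Carlo FWER and conjunctive power.

    decisions: one set per simulated trial with the subgroups declared effective.
    truly_effective: subgroups with theta_(k,j) > theta_0 in the scenario; every
    other subgroup is a true null.
    """
    n = len(decisions)
    # a familywise error: at least one true-null subgroup declared effective
    fwer = sum(bool(d - truly_effective) for d in decisions) / n
    # conjunctive power: every truly effective subgroup declared effective
    power = sum(truly_effective <= d for d in decisions) / n if truly_effective else math.nan
    return fwer, power


truth = {(2, 1), (2, 2)}
sims = [{(2, 1), (2, 2)}, {(2, 2)}, {(1, 2), (2, 1), (2, 2)}, set()]
print(fwer_and_conjunctive_power(sims, truth))  # (0.25, 0.5)
```

Under the global null there are no truly effective subgroups, so conjunctive power is undefined and the sketch returns NaN.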
Simulation study
Evaluation of the estimation performance
We conduct computer simulations to evaluate the estimation performance of IBIS. Consider a one-stage trial in which the primary efficacy endpoint is the ratio of tumor size 1 month after treatment to that at baseline. After log transformation, this ratio is assumed to be a normally distributed continuous variable; the smaller the log ratio, the greater the benefit to patients. To be consistent with the hypothesis testing framework described before, we add a minus sign to the transformed endpoint. In statistical notation, if Y_{(k, j)} denotes the tumor size ratio for subgroup (k, j), then \(-\log \left({Y}_{\left(k,j\right)}\right)\sim N\left({\theta}_{\left(k,j\right)},{\sigma}_{\left(k,j\right)}^2\right)\). There are three and four levels for Biomarker1 and Biomarker2, respectively, so the entire population is divided into 12 subgroups. The minimum acceptable treatment effect is θ_{0} = 0, and a clinically meaningful treatment effect equals 1. A total of eight scenarios are included in the simulation: global null (scenario 1), global alternative (scenario 2), good nugget (scenario 3), bad nugget (scenario 4), mostly null (scenario 5), mostly alternative (scenario 6), half alternative (scenario 7) and linear (scenario 8). The treatment effect θ_{(k, j)} for each subgroup is given in Table 2. It is assumed that all \({\sigma}_{\left(k,j\right)}^2\) equal 1 and that each subgroup has a sample size of 10. The priors are set as follows: θ_{g}~N(1, 10^{3}), θ_{∁g}~N(0, 10^{3}), \({\sigma}_g^2\sim IG\left({10}^{-3},{10}^{-3}\right)\) and \({\sigma}_{\complement g}^2\sim IG\left({10}^{-3},{10}^{-3}\right)\).
The metrics for evaluating estimation performance include the mean squared error (MSE), the bias and the average width of the 95% equal-tailed credible interval of the posterior distribution of θ_{(k, j)}. The MSE is defined as the average squared difference between the estimated values and the actual value of θ_{(k, j)}. The bias is defined as the expected difference between the estimated values and the actual value of θ_{(k, j)}. For simplicity, we omit the subscript (k, j) when this causes no ambiguity:
\[
\text{MSE}\left(\hat{\theta}\right)=\frac{1}{n_{sim}}\sum_{i=1}^{n_{sim}}{\left({\hat{\theta}}_i-{\theta}^{\ast}\right)}^2,\kern2em \text{Bias}\left(\hat{\theta}\right)=\frac{1}{n_{sim}}\sum_{i=1}^{n_{sim}}\left({\hat{\theta}}_i-{\theta}^{\ast}\right),
\]
where \(\hat{\theta}\) and θ^{∗} denote the estimated value and the actual value of the treatment effect, respectively, and \({\hat{\theta}}_i\) denotes the estimated treatment effect in the i-th simulated trial, for which we use the posterior mean. The total number of simulated trials, denoted as n_{sim}, is 10,000 here. The average width of the 95% equal-tailed credible interval of the posterior distribution is defined as follows:
\[
\text{Width}=\frac{1}{n_{sim}}\sum_{i=1}^{n_{sim}}\left[l\left(0.975,{\theta}_i\right)-l\left(0.025,{\theta}_i\right)\right],
\]
where l(q, θ) = {l : Pr(θ ≤ l) = q} denotes the quantile function, and θ_{i} represents the posterior distribution of the treatment effect in the i-th simulated trial.
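The three estimation metrics can be computed from the posterior draws of each simulated trial. The sketch assumes the MSE and bias are averages of (θ̂_{i} − θ^{∗})² and (θ̂_{i} − θ^{∗}) over the n_{sim} trials, with the posterior mean as θ̂_{i}, and takes the 2.5% and 97.5% empirical quantiles of the draws (linear interpolation via `statistics.quantiles`) as l(0.025, θ_{i}) and l(0.975, θ_{i}). The function name is hypothetical.

```python
from statistics import fmean, quantiles


def estimation_metrics(draws_per_trial, true_theta):
    """MSE, bias and average 95% equal-tailed interval width over simulated trials.

    draws_per_trial: one list of posterior draws of theta per simulated trial;
    the posterior mean is the point estimate, as in the text.
    """
    estimates = [fmean(draws) for draws in draws_per_trial]
    mse = fmean((est - true_theta) ** 2 for est in estimates)
    bias = fmean(est - true_theta for est in estimates)
    widths = []
    for draws in draws_per_trial:
        # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two
        cuts = quantiles(draws, n=40, method="inclusive")
        widths.append(cuts[-1] - cuts[0])
    return mse, bias, fmean(widths)


print(estimation_metrics([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]], true_theta=1.5))
```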
Estimation methods used for comparison include independent analysis and BHM. In the independent analysis, the parameter estimate is the sample mean \(\overline{Y}\). Analogous to the 95% credible interval described above, the precision of the estimate is measured by the width of the 95% confidence interval in the frequentist framework, i.e., \(\overline{Y}\pm {t}_{1-\alpha /2}\left(n-1\right)\times \frac{s}{\sqrt{n}}\), where t_{1 − α/2}(n − 1) is the 1 − α/2 quantile of the t distribution with n − 1 degrees of freedom (α = 0.05 here). The sample standard deviation is \(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n{\left({Y}_i-\overline{Y}\right)}^2}\), and n is the sample size of the subgroup under evaluation. In BHM, all θ_{(k, j)} are treated as exchangeable, i.e., information is borrowed across all subgroups. We set \({\theta}_{\left(k,j\right)}\sim N\left(\overset{\sim }{\theta },{\overset{\sim }{\sigma}}^2\right)\) with priors N(0, 10^{3}) for \(\overset{\sim }{\theta }\) and IG(10^{−3}, 10^{−3}) for \({\overset{\sim }{\sigma}}^2\).
Evaluation of the operating characteristics
The operating characteristics of IBIS for subgroup identification are also evaluated by computer simulation. The simulation settings are the same as those used above to evaluate estimation performance. The evaluation metrics are FWER and conjunctive power. The methods used for comparison include BHM, independent analysis and another frequentist subgroup identification method, denoted as ‘Freq’ here. If there were only one subgroup, with a null hypothesis H_{0} : θ ≤ 0, an expected treatment effect of 1 and a standard deviation of 1, a study with 10 participants would have approximately 90% power to reject the null hypothesis using a one-sided t-test at the 5% significance level. It can therefore be expected that if the FWER is controlled across 12 subgroups (each with 10 participants), the conjunctive power will decrease substantially, resulting in poor subgroup identification. In this case, we presume that using IBIS to borrow information across subgroups may improve the accuracy of subgroup identification.
The decision-making process of BHM is the same as that of IBIS. Given the vague prior distributions set for the parameters in IBIS and BHM, there is no preference for the null or the alternative hypothesis for any subgroup, i.e., Pr(H_{0(k, j)}) = Pr(H_{1(k, j)}) = 0.5, in the simulations of BHM and IBIS. The decision-making process of independent analysis is nearly the same as that of IBIS; the only change is that the Bayes factor is replaced by a t-test. If the following inequality
\[
\frac{\sqrt{n_{\left(k,j\right)}}\left({\overline{Y}}_{\left(k,j\right)}-{\theta}_0\right)}{s_{\left(k,j\right)}}>{c}_{\left(k,j\right)}
\]
is satisfied, where c_{(k, j)} is a prespecified threshold, then the investigational combination therapy is considered effective for subgroup (k, j). Here n_{(k, j)}, \({\overline{Y}}_{\left(k,j\right)}\) and s_{(k, j)} denote the sample size, sample mean and sample standard deviation of subgroup (k, j), respectively.
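For the independent analysis, each subgroup is tested alone with a one-sample t statistic √n(Ȳ − θ_{0})/s compared against a calibrated threshold, and its precision is summarized by the t-based confidence interval described earlier. The Python standard library has no t quantile function, so the sketch takes t_{1−α/2}(n − 1) as an argument (2.262 for 9 degrees of freedom, i.e., n = 10); the function and argument names are hypothetical.

```python
import math
from statistics import fmean, stdev


def independent_analysis(y, threshold, theta0=0.0, t_quantile=2.262):
    """t statistic, decision and two-sided CI for one subgroup analysed alone.

    threshold: calibrated cut-off for the t statistic (a design parameter).
    t_quantile: t_{1-alpha/2}(n-1); the default 2.262 is the 0.975 quantile
    for n - 1 = 9 degrees of freedom, i.e. n = 10 as in the simulation.
    """
    n = len(y)
    ybar, s = fmean(y), stdev(y)  # stdev uses the n - 1 denominator
    t_stat = math.sqrt(n) * (ybar - theta0) / s
    half_width = t_quantile * s / math.sqrt(n)
    return t_stat, t_stat > threshold, (ybar - half_width, ybar + half_width)


# five made-up observations, so n - 1 = 4 and t_{0.975}(4) = 2.776
t_stat, effective, ci = independent_analysis([0.5, 1.5, 1.0, 2.0, 0.0], threshold=2.5, t_quantile=2.776)
print(round(t_stat, 3), effective, ci)
```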
In the Freq method, decision-making follows Lai et al. [24], who first divide the subgroups into two subsets and then make inferences for the subsets separately. Instead of the Jensen–Shannon divergence, the high-efficacy subset is determined by the largest t-test statistic:
\[
{C}_H={\Pi}_{g^{\ast }},\kern1em {g}^{\ast }=\underset{g}{\arg \max}\;{t}_{\Pi_g},
\]
where \({t}_{\Pi_g}\) is the t-test statistic for subset Π_{g}. Then, t-tests are performed for C_{H} and its complement separately. If a test statistic is greater than a prespecified threshold, the investigational combination therapy is judged to be effective for that subset; otherwise, it is considered ineffective.
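The subset-selection step of the Freq method can be sketched as follows. As an illustration only, we assume that t_{Π_g} is the one-sample t statistic computed on all observations pooled over the subgroups in Π_{g}; the exact statistic in Lai et al. [24] may differ. The candidate subsets would come from the enumeration of divisions; here the five subsets for K = J = 2 are listed by hand, and the data are made up. The function names are hypothetical.

```python
import math
from statistics import fmean, stdev


def pooled_t(data, subset, theta0=0.0):
    """One-sample t statistic on all observations pooled over a subset."""
    y = [v for c in subset for v in data[c]]
    return math.sqrt(len(y)) * (fmean(y) - theta0) / stdev(y)


def freq_high_efficacy_subset(data, candidates, theta0=0.0):
    """C_H: the candidate high-efficacy subset with the largest pooled t statistic."""
    return max(candidates, key=lambda subset: pooled_t(data, subset, theta0))


data = {(1, 1): [-0.5, 0.5, 0.0], (1, 2): [0.1, -0.1, 0.0],
        (2, 1): [0.2, -0.2, 0.0], (2, 2): [1.0, 1.5, 2.0]}
candidates = [{(2, 2)}, {(2, 2), (1, 2)}, {(2, 2), (2, 1)},
              {(2, 2), (1, 2), (2, 1)}, set(data)]
c_h = freq_high_efficacy_subset(data, candidates)
print(c_h)  # {(2, 2)}
```

The same `pooled_t` statistic can then be compared with the prespecified threshold for C_{H} and, when it is not empty, for its complement.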
To facilitate evaluation, the test thresholds are set equal across subgroups within each method, although in practice they can be adjusted for the anticipated efficacy and prevalence of each subgroup. In the Freq method, the test thresholds for the high-efficacy and low-efficacy subsets are equal. To make the four methods comparable, we calibrate the thresholds so that their FWER in the global null scenario (scenario 1) is controlled at 0.1 or 0.05. Specifically, we performed a series of simulation studies for each method under the null scenario over a grid of thresholds (i.e., Bayes factor thresholds for IBIS and BHM, and t-test statistic thresholds for independent analysis and the Freq method). Then, for each method, the minimum threshold with simulated FWER less than or equal to 0.1 (or 0.05) was selected as the design parameter.
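The grid search for the thresholds amounts to picking the smallest grid value whose global-null FWER does not exceed the target. In the paper the FWER at each grid point is estimated by simulation; to keep the sketch deterministic, it substitutes the closed-form FWER of 12 independent one-sided z-tests at θ = θ_{0}, 1 − Φ(c)^{12}, which is an assumption and not one of the four compared methods. The function names are hypothetical.

```python
from statistics import NormalDist


def calibrate_threshold(fwer_at, grid, target=0.05):
    """Smallest grid threshold whose global-null FWER is at most target."""
    for c in sorted(grid):
        if fwer_at(c) <= target:
            return c
    raise ValueError("no threshold on the grid controls the FWER at the target")


def fwer_independent_z(c, n_subgroups=12):
    """Stand-in for a simulated FWER: independent one-sided z-tests at theta = theta_0."""
    return 1 - NormalDist().cdf(c) ** n_subgroups


grid = [i / 100 for i in range(501)]
c05 = calibrate_threshold(fwer_independent_z, grid, target=0.05)
c10 = calibrate_threshold(fwer_independent_z, grid, target=0.10)
print(c05, c10)
```

With a simulated FWER, `fwer_at` would instead run the global-null scenario for the method under evaluation at threshold `c` and return the proportion of trials with at least one false claim of efficacy.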
For the adaptive enrichment design, parameter calibration is much more complicated. A general strategy is to make the design achieve acceptable FWER, conjunctive power and expected sample size in some typical scenarios by simulation, with a limited maximum sample size. This may be computationally expensive, which is the price paid for adaptability and flexibility.
Consider a two-stage trial with a sample size of five per subgroup in the first stage. If the investigational therapy is promising for a subgroup, five more patients from that subgroup are enrolled in the second stage. For a preliminary parameter calibration, let the decision thresholds be the same for every subgroup, i.e., \({BF}_{E\left(k,j\right)}^{(1)}={BF}_{E\left(k,j\right)}^{(2)}={BF}_E\) and \({BF}_{P\left(k,j\right)}^{(1)}={BF}_P\). Specifying loose decision thresholds may increase power for high-efficacy subgroups but may also inflate type I errors for low-efficacy subgroups. We therefore define a decision score function that jointly measures FWER, conjunctive power and expected sample size:
\[
\text{Score}=\text{Conjunctive power}-{\beta}_1\times \text{FWER}-{\beta}_2\times \frac{EN}{N_{max}},
\]
where EN denotes the expected sample size and \(N_{max}\) denotes the maximum sample size, which equals 120 here. The function implies that the loss from a one-unit increase in FWER offsets the benefit of a \(\beta_1\)-unit increase in conjunctive power, and the loss from a one-unit increase in EN offsets the benefit of a \(\beta_2/N_{max}\)-unit increase in conjunctive power. For example, with \(\beta_1 = 1\) and \(\beta_2 = 0.5\), a 1% increase in FWER offsets a 1% increase in power. Likewise, an increase of one patient in the expected sample size offsets either a 0.4% (i.e., \(\beta_2/N_{max} = 0.5/120\)) increase in power or a 0.4% decrease in FWER. The decision score thus expresses a tradeoff among FWER, conjunctive power and expected sample size. A large \(\beta_1\) favors stricter FWER control, and a larger \(\beta_2\) places more weight on reducing the cost of the current trial through a smaller expected sample size. Choosing \(\beta_1\) and \(\beta_2\) is a key and difficult problem that depends mainly on the potential losses caused by type I and type II errors. A type I error wastes future clinical research on an ineffective investigational therapy, whereas a type II error leaves some patients without an effective treatment and forfeits marketing revenue. How sponsors weigh these potential losses against the cost of the current trial plays a decisive role in determining \(\beta_1\) and \(\beta_2\). Taking \(\beta_1 = 1\) and \(\beta_2 = 0.5\), we use scenario 8 to illustrate how to obtain the design parameters of such an adaptive enrichment design.
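The decision score itself is not reproduced in this section. The tradeoffs described above (one unit of FWER worth \(\beta_1\) units of power, one patient worth \(\beta_2/N_{max}\) units of power) are consistent with a linear score of the form Power − β1·FWER − β2·EN/N_max. The sketch below assumes that form, which is our reading rather than a confirmed copy of the paper's formula. It also computes EN for the two-stage design, in which each subgroup enrolls 5 patients in stage 1 and 5 more only if judged promising. The function names and the continuation probabilities are hypothetical. Twelve subgroups are assumed because \(N_{max} = 120\) implies this when each subgroup can enroll at most 5 + 5 patients.

```python
def expected_sample_size(p_promising, n1=5, n2=5):
    """EN of the two-stage design: each subgroup enrolls n1 patients in stage 1
    and n2 more only if judged promising (probability p_promising[k])."""
    return sum(n1 + n2 * p for p in p_promising)


def decision_score(power, fwer, en, beta1=1.0, beta2=0.5, n_max=120):
    """Assumed linear decision score: Power - beta1 * FWER - beta2 * EN / N_max."""
    return power - beta1 * fwer - beta2 * en / n_max


# 12 subgroups, each with a hypothetical 50% chance of continuing to stage 2
en = expected_sample_size([0.5] * 12)  # 12 * (5 + 2.5) = 90
print(en, decision_score(power=0.80, fwer=0.10, en=en))

# Marginal tradeoff implied by beta2 = 0.5 and N_max = 120
print("one extra expected patient costs", 0.5 / 120, "units of power")
```

Under this form, raising FWER and power by the same amount leaves the score unchanged when \(\beta_1 = 1\), which matches the 1%-for-1% example in the text.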
Results
Accuracy and precision
The simulation results for estimation performance, shown in Figs. 3, 4 and 5, reflect the accuracy and precision of IBIS. Overall, IBIS has the lowest MSE. In only a few scenarios are the MSEs of a few subgroups slightly higher than those of independent analysis (Fig. 3). Specifically, the MSE of IBIS is relatively high for subgroup (3,4) in scenario 3 (the good nugget). The treatment effect in this subgroup differs so much from those in the other subgroups that IBIS tends to analyze it alone or pool it with only a few adjacent subgroups. This results in a relatively large variance of the estimate, as also reflected by its wide 95% CI (Fig. 5). In this nugget scenario, BHM inevitably produces a large estimation bias (Fig. 4), which in turn leads to a much higher MSE. Similar results arise for subgroup (1,1) in scenario 4. The independent-analysis estimate for each subgroup is unbiased (Fig. 4), so its MSE equals the variance of the sample mean (i.e., the squared standard error), which is a constant 0.1 (Fig. 3). BHM has the smallest MSE in the homogeneous scenarios (scenarios 1 and 2). However, it is not robust in the heterogeneous scenarios, especially scenarios 3 and 4, where the MSEs for the nugget subgroups exceed 0.4, more than four times those of the other two methods. The bias of BHM is also large in the heterogeneous scenarios, with absolute values above 0.2 for some subgroups (Fig. 4), whereas IBIS is much more robust. Notably, although BHM borrows information across all subgroups, the 95% CIs of its estimates in heterogeneous scenarios are generally wider than those of IBIS. This is because the between-subgroup variance \(\tilde{\sigma}^2\) estimated by BHM is large when subgroups are heterogeneous, which limits information borrowing.
In contrast, the subgroup division in IBIS increases the similarity of subgroups within each subset and thus allows more information borrowing.
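The effect of between-subgroup heterogeneity on borrowing can be seen in the simplest normal-normal hierarchical model with known variances. In this model, each subgroup estimate is pulled toward the common mean with weight \(B_i = (\sigma^2/n_i)/(\sigma^2/n_i + \tau^2)\). The sketch conditions on the common mean \(\mu\) and the between-subgroup variance \(\tau^2\), which plays the role of \(\tilde{\sigma}^2\). A full BHM would instead place priors on these quantities. The values \(\sigma^2 = 1\) and \(n = 10\) are assumptions chosen so that \(\sigma^2/n = 0.1\), matching the constant MSE of independent analysis reported above.

```python
import numpy as np


def shrinkage_weight(sigma2, n, tau2):
    """Weight B_i on the common mean in a normal-normal model with known variances:
    B_i = (sigma2 / n_i) / (sigma2 / n_i + tau2)."""
    v = sigma2 / np.asarray(n, dtype=float)  # sampling variance of each subgroup mean
    return v / (v + tau2)


def conditional_posterior_mean(ybar, n, sigma2, mu, tau2):
    """E[theta_i | ybar_i, mu, tau2] = (1 - B_i) * ybar_i + B_i * mu."""
    b = shrinkage_weight(sigma2, n, tau2)
    return (1 - b) * np.asarray(ybar, dtype=float) + b * mu


# sigma2 / n = 0.1: a larger between-subgroup variance means less borrowing
for tau2 in (0.01, 0.1, 1.0):
    print(f"tau2={tau2}: weight on common mean = {shrinkage_weight(1.0, 10, tau2):.3f}")

# For the same tau2, smaller subgroups are shrunk more strongly
print(shrinkage_weight(1.0, np.array([2, 5, 10]), 0.1))
```

Splitting heterogeneous subgroups into more homogeneous subsets lowers the effective \(\tau^2\) within each subset, which increases \(B_i\) and therefore the amount of borrowing.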
In summary, in terms of the three metrics (MSE, bias and 95% CI width), the estimation performance of IBIS is superior and robust under the prespecified scenarios. The proposed method is especially suitable for scenarios in which the subgroups naturally fall into two subsets.
Operating characteristics of subgroup identification
The simulation results for the operating characteristics are presented in Table 3. In the global alternative scenario (scenario 2), BHM identifies all subgroups correctly with probability 100%, indicating strong borrowing. The Freq method achieves nearly 90% conjunctive power (hereinafter referred to as power), which reflects the advantage of partial combination. The power of IBIS is slightly lower than that of the Freq method but much higher than that of independent analysis.
When the subgroups are heterogeneous (scenarios 3–8), IBIS outperforms the other three methods, achieving a better balance between FWER and power. Its power is much higher than that of independent analysis: when the investigational therapy is effective for several subgroups (scenarios 4, 6, 7 and 8), it is difficult for independent analysis to identify all alternative subgroups correctly at the same time. The FWER of IBIS is slightly higher than that of independent analysis in most scenarios. The exception is scenario 3, where information borrowing across low-efficacy subgroups reduces the type I error rates. However, this inflation is relatively limited, for example, FWERs of approximately 10–20% in scenarios 5–7. In these scenarios, the number of null subgroups is small, so borrowing among them is weaker than in scenarios 1–4, and some null subgroups are misjudged. There is a consensus that controlling type I error rates is critical in clinical trials. However, if the investigational drug is effective for a large portion of the population (e.g., scenario 5), it may be inappropriate to borrow no information at all. In scenarios 5–7, although the FWER is inflated, IBIS misjudges only one or two null subgroups in most cases (see section A of Additional file 1 for the detailed simulated probabilities of misjudging null and alternative subgroups in each scenario). Taking scenario 5 in simulation 1 as an example, there are 8 subgroups with low treatment efficacy, and the FWER is 15.86%, exceeding the 10% target. However, most error cases involve misjudging one (9.52%) or two (4.54%) subgroups. In these cases, the investigational drug combination is effective in 2/3 of the identified subgroups and ineffective in 1/3 of them.
Therefore, given that some marketed drugs are not effective for all patients, this kind of type I error inflation is acceptable to some extent. We believe that, for subgroup identification in an exploratory trial, strictly controlling the type I error for one or two subgroups at the expense of power gains for most subgroups is not desirable.
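The distinction between the FWER and the number of misjudged null subgroups can be made concrete with a small summary over simulated decisions. If 9.52% and 4.54% are read as the probabilities of misjudging exactly one and exactly two null subgroups, the remaining 15.86 − 9.52 − 4.54 = 1.80% of simulated trials in scenario 5 misjudge three or more. The function names and the toy decision matrix below are hypothetical. The share function shows, for example, that identifying four truly effective subgroups together with two null subgroups gives 4/6 = 2/3. These counts are illustrative only.

```python
import numpy as np
from collections import Counter


def false_rejection_summary(decisions, is_null):
    """FWER and the distribution of the number of misjudged null subgroups.

    decisions: (n_sims, n_subgroups) booleans, True = declared effective
    is_null: (n_subgroups,) booleans, True = truly low-efficacy subgroup
    """
    decisions = np.asarray(decisions, dtype=bool)
    is_null = np.asarray(is_null, dtype=bool)
    n_false = decisions[:, is_null].sum(axis=1)  # false rejections in each trial
    fwer = float(np.mean(n_false > 0))
    counts = Counter(n_false.tolist())
    dist = {k: counts[k] / len(n_false) for k in sorted(counts)}
    return fwer, dist


def share_truly_effective(n_identified_alt, n_identified_null):
    """Fraction of the identified subgroups for which the therapy is truly effective."""
    return n_identified_alt / (n_identified_alt + n_identified_null)


# Toy decisions for 4 simulated trials and 3 subgroups (the last two are null)
toy = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
print(false_rejection_summary(toy, [False, True, True]))

# Scenario 5 arithmetic: FWER not accounted for by one or two misjudgments
print(round(15.86 - 9.52 - 4.54, 2))
```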
The power of BHM is high in most scenarios, except in scenario 3, where there is only one good nugget subgroup. However, BHM also tends to overestimate the efficacy of the investigational therapy for low-efficacy subgroups, resulting in unacceptable FWERs.
The operating characteristics of the Freq method are relatively robust compared with those of independent analysis and BHM but are generally inferior to those of IBIS. This is due to its strategy of partial combination. For example, if Freq mistakenly classifies an alternative subgroup into the low-efficacy subset, it is likely to accept the null hypothesis in subsequent inferences and conclude that the investigational therapy is ineffective for all subgroups in that subset. In contrast, IBIS also divides the subgroups but only borrows information within each subset, and its inferences are still made for each subgroup. Therefore, the probabilities of such incorrect decisions are smaller in most scenarios.
Additional sensitivity analyses of the prior settings for the shrinkage parameters were performed, and the results are shown in section B of Additional file 1. They show that the inverse-gamma priors used here are robust in our simulation study. However, there may be no optimal choice of prior distributions, and the priors used here may not be applicable to other studies. The appropriate prior distributions for a particular trial must be determined jointly by clinicians and statisticians at the design stage. See section B of Additional file 1 for more details and remarks.
An example of how to obtain design parameters for a multi-stage adaptive enrichment design
Regarding the adaptive enrichment design, we evaluate the performance of IBIS under different values of \(BF_E\) and \(BF_P\) for scenario 8 and draw heatmaps of the FWER, conjunctive power, expected sample size and decision score (Fig. 6). The corresponding heatmaps for the other scenarios are shown in section C of Additional file 1. With such heatmaps, decision-makers can see directly how these metrics change with the decision thresholds and make tradeoffs among them.
Note that the restriction \({BF}_{E\left(k,j\right)}^{(1)}={BF}_{E\left(k,j\right)}^{(2)}={BF}_E\) and \({BF}_{P\left(k,j\right)}^{(1)}={BF}_P\) is imposed mainly to obtain a general overview of the operating characteristics under different combinations of decision thresholds. Once the range of decision thresholds that meets the requirements has been determined, these thresholds can be further adjusted according to the anticipated efficacy and prevalence of each subgroup. In the example above, the decision score is relatively high when \(BF_P\) ranges from 5.0 to 12.5 and \(BF_E\) from 75 to 150.
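Reading the heatmaps amounts to scoring each threshold pair and looking for the region where the score is near its maximum. The sketch below does this for a dictionary of simulated operating characteristics. It again assumes the linear score Power − β1·FWER − β2·EN/N_max. The helper name, the tolerance `tol` and all numbers in `results` are hypothetical and are not results from the paper.

```python
def rank_thresholds(results, beta1=1.0, beta2=0.5, n_max=120, tol=0.01):
    """results maps (BF_E, BF_P) -> (power, fwer, en) estimated by simulation.

    Returns the best pair, its score, and every pair scoring within tol of it,
    using the assumed score Power - beta1 * FWER - beta2 * EN / N_max.
    """
    scores = {
        key: power - beta1 * fwer - beta2 * en / n_max
        for key, (power, fwer, en) in results.items()
    }
    best = max(scores, key=scores.get)
    near = sorted(k for k, s in scores.items() if s >= scores[best] - tol)
    return best, scores[best], near


# Hypothetical operating characteristics, NOT results from the paper
results = {
    (50, 2.5): (0.90, 0.30, 100),
    (75, 5.0): (0.80, 0.10, 90),
    (100, 7.5): (0.78, 0.06, 88),
    (150, 12.5): (0.70, 0.04, 80),
}
print(rank_thresholds(results, tol=0.03))
```

Returning a near-optimal set rather than a single maximizer mirrors the heatmap reading: a band of acceptable thresholds, within which per-subgroup adjustments can then be made.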
In addition to FWER, conjunctive power and expected sample size, some other metrics can also be considered, such as the disjunctive power, which is defined as follows:
This metric is especially informative when the sample size is limited. The operating characteristics of the design in other scenarios should also be taken into account (see section C of Additional file 1). In conclusion, the composition of the decision score function and the weights of the different metrics need to be discussed thoroughly among researchers, sponsors and biostatisticians. When calibrating the parameters of multi-stage adaptive enrichment designs, reasonable scenarios should be set up case by case on the basis of existing medical research data, taking into account the requirements of the various stakeholders.
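For comparison with conjunctive power, the sketch below computes both conjunctive and disjunctive power from a matrix of simulated decisions. It uses the conventional definitions: all alternative subgroups identified versus at least one identified. We assume these match the paper's formula. The function name and toy data are hypothetical.

```python
import numpy as np


def conjunctive_disjunctive_power(decisions, is_alt):
    """Conventional definitions over simulated trials:
    conjunctive = P(all alternative subgroups declared effective)
    disjunctive = P(at least one alternative subgroup declared effective)
    """
    is_alt = np.asarray(is_alt, dtype=bool)
    if not is_alt.any():
        raise ValueError("power is undefined when every subgroup is null")
    d = np.asarray(decisions, dtype=bool)[:, is_alt]
    return float(np.mean(d.all(axis=1))), float(np.mean(d.any(axis=1)))


toy = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0]]  # 4 trials x 3 subgroups
print(conjunctive_disjunctive_power(toy, [True, True, False]))
```

Disjunctive power is never smaller than conjunctive power, and the gap widens as the number of alternative subgroups grows. This is why it can be a more forgiving target when the sample size is limited.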
Discussion
Compared with monotherapy, targeted combination therapy leads to a larger number of subgroups and more complicated biomarker-based efficacy profiles. We therefore propose a statistical tool called IBIS and incorporate it into the adaptive enrichment design framework. IBIS contains three main elements: subgroup division, efficacy evaluation and subgroup identification. We first enumerate all possible subgroup divisions based on biomarker levels. Then, Jensen–Shannon divergence is used to distinguish high-efficacy and low-efficacy subgroups, and BHM is employed to borrow information within these two subsets for efficacy evaluation. For subgroup identification, we construct a hypothesis testing framework based on Bayes factors, which also plays a key role in go/no-go decisions and in enriching specific subgroups in subsequent stages of the trial. Simulation studies show that, compared with some traditional methods, IBIS has superior and robust operating characteristics. It has the potential to be a useful tool for subgroup identification and population enrichment in clinical trials of targeted combination therapy.
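For reference, the Jensen–Shannon divergence between two discrete distributions p and q is \(\tfrac12 \mathrm{KL}(p\|m) + \tfrac12 \mathrm{KL}(q\|m)\) with \(m = (p+q)/2\). Unlike the KL divergence, it is symmetric and bounded above by \(\log 2\) (in nats). The sketch below only implements this definition. How IBIS applies the divergence to candidate subgroup divisions is specified in the Methods and is not reproduced here. SciPy's `scipy.spatial.distance.jensenshannon` returns the square root of this quantity (the Jensen–Shannon distance).

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.

    Inputs are normalized to sum to 1. Wherever p_i > 0, m_i >= p_i / 2 > 0,
    so the KL terms are always finite.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


print(jensen_shannon_divergence([0.7, 0.2, 0.1], [0.1, 0.3, 0.6]))
```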
Because combining two targeted agents is currently a promising therapeutic approach, we use trial designs with two ordinal biomarkers as examples in this paper. The proposed design extends naturally to subgroup identification with multiple biomarkers: we can still enumerate all possible subgroup divisions, find the optimal division, and then use the BHM to borrow information within each subset. For continuous biomarkers, if subgroups cannot be predefined and the study objective is to find arbitrary biomarker cutoffs, the proposed design cannot be applied. If the subgroups can be predefined, e.g., by quartiles, the proposed design can still be applied.
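The enumeration step can be illustrated for a K × J grid of ordinal biomarker levels. Purely for illustration, the sketch below assumes that efficacy is non-decreasing in both biomarker levels. Under that assumption, a candidate high-efficacy set must be upward-closed, and there are \(\binom{K+J}{K}\) candidate sets: 35 for a 3 × 4 grid, including the empty and full sets. Whether IBIS restricts divisions in this way is defined in the Methods, not here. The function name is hypothetical.

```python
from itertools import combinations_with_replacement


def monotone_divisions(K, J):
    """Enumerate candidate high-efficacy sets on a K x J grid of subgroups (k, j).

    Assumes efficacy is non-decreasing in both biomarker levels, so a high-efficacy
    set must be upward-closed. Row k keeps columns j >= c_k, with
    c_1 >= c_2 >= ... >= c_K and c_k in 1..J+1 (J+1 = no high subgroup in row k).
    The empty and full sets (no split) are included.
    """
    divisions = []
    for cuts in combinations_with_replacement(range(1, J + 2), K):
        c = cuts[::-1]  # non-decreasing tuple reversed, so c_1 >= ... >= c_K
        high = frozenset((k + 1, j) for k in range(K) for j in range(c[k], J + 1))
        divisions.append(high)
    return divisions


divs = monotone_divisions(3, 4)
print(len(divs))  # C(3 + 4, 3) = 35; 33 of them split the grid into two nonempty subsets
```

The same idea extends to more than two biomarkers. However, the number of upward-closed sets grows quickly with the number of dimensions, which is worth bearing in mind when enumerating divisions exhaustively.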
Another feasible monitoring method for interim analysis is to calculate the predictive probability that \(BF_{(k,j)}\) exceeds a success threshold at the end of the trial. At each interim analysis, this predictive probability is compared with prespecified cutoffs to decide the target population for the subsequent stage. Some studies, e.g., Lee and Liu (2008) [30], found that the predictive probability and posterior probability approaches yield comparable operating characteristics. This conclusion is expected to carry over to our proposed design, because the predictive probability approach does not use more patient data than the posterior probability approach. One advantage of the predictive probability approach is that, at an interim analysis, it gives the probability of eventually obtaining a positive result. However, it is more computationally demanding because future data must be predicted.
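A predictive probability can be approximated by Monte Carlo. The procedure draws the not-yet-observed stage-2 data from the posterior predictive distribution, recomputes the final-analysis criterion and counts how often it succeeds. The sketch below assumes a normal outcome with known σ and a normal prior on the subgroup mean. As a stand-in for the Bayes factor criterion, it defines success as \(P(\theta > \theta_0 \mid \text{all data}) > \eta\). These choices and the parameter values are illustrative assumptions rather than the IBIS decision rule. Conjugacy makes the final posterior available in closed form, so only the stage-2 sample mean needs to be simulated.

```python
import numpy as np
from statistics import NormalDist


def predictive_probability(ybar1, n1, n2, sigma=1.0, m0=0.0, v0=100.0,
                           theta0=0.0, eta=0.95, n_draws=20000, seed=0):
    """Monte Carlo predictive probability that the final analysis succeeds.

    Assumed model: y ~ N(theta, sigma^2), prior theta ~ N(m0, v0).
    Success (stand-in for the Bayes factor rule): P(theta > theta0 | all data) > eta.
    """
    rng = np.random.default_rng(seed)
    # Interim posterior of theta after n1 patients with sample mean ybar1
    v1 = 1.0 / (1.0 / v0 + n1 / sigma**2)
    m1 = v1 * (m0 / v0 + n1 * ybar1 / sigma**2)
    # Posterior predictive draws of the stage-2 sample mean for n2 future patients
    ybar2 = rng.normal(m1, np.sqrt(v1 + sigma**2 / n2), size=n_draws)
    # Final posterior of theta for each simulated completion of the trial
    vf = 1.0 / (1.0 / v0 + (n1 + n2) / sigma**2)
    mf = vf * (m0 / v0 + (n1 * ybar1 + n2 * ybar2) / sigma**2)
    # P(theta > theta0) > eta  is equivalent to  (mf - theta0) / sqrt(vf) > z_eta
    z_eta = NormalDist().inv_cdf(eta)
    return float(np.mean((mf - theta0) / np.sqrt(vf) > z_eta))


# Interim after 5 patients with 5 more planned, as in the two-stage example
for ybar1 in (0.0, 0.5, 1.0):
    print(ybar1, predictive_probability(ybar1, n1=5, n2=5))
```

The extra cost noted in the text is visible here: every interim decision requires simulating many possible completions of the trial, whereas the posterior probability approach evaluates a single quantity.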
Several points deserve attention when applying IBIS. For example, when the number of subgroups is small, it may be inappropriate to use an inverse-gamma distribution with small shape and scale parameters directly as the prior for the variance across subgroups [31]; weakly informative priors can be used instead. The prevalence of each subgroup also influences trial design and conduct. When the prevalence of certain subgroups is low, patient enrollment may be difficult, and enough evidence may not accumulate to show how well the investigational therapy works in those subgroups. Moreover, the treatment parameters of subgroups with smaller sample sizes tend to be shrunk more strongly in BHM [32], potentially leading to misleading interpretations of trial results. In such cases, we suggest first considering, on the basis of clinical experience, whether the anticipated efficacy for the rare subgroups is close to that of adjacent subgroups. If it is close enough, a pooled analysis is suggested; if this is uncertain, an independent exploratory analysis of the rare subgroups is recommended rather than a BHM-based analysis.
Regarding the choice between a single-arm trial and an RCT, we acknowledge that an RCT is the better choice when cost, ethics and other considerations allow, owing to its advantages in eliminating selection bias and minimizing confounding. However, RCTs also have limitations, such as high cost, long duration and difficult implementation, which may make them unsuitable when a subgroup identification or enrichment trial is positioned as exploratory. One strategy that deserves further exploration is a seamless transition design from an open-label single-arm trial to a randomized double-arm trial.
This research can also be extended in other directions. For example, sample size re-estimation based on the Bayes factors can be considered at interim analyses. Historical information on monotherapy can be incorporated to determine the priors in the Bayes factors. There is also a class of adaptive enrichment designs that handles situations in which subgroups are not predefined [33,34,35,36,37,38,39,40,41,42]. Therefore, in addition to multi-level biomarkers, we can further consider how to determine the cutoffs of continuous biomarkers, or include other important covariates in the model, such as age, Eastern Cooperative Oncology Group Performance Status (ECOG PS) and key laboratory test results, which is more in line with the concept of precision medicine.
Conclusions
IBIS has superior and robust operating characteristics in terms of subgroup identification and population enrichment. It has the potential to be a useful tool for biomarkerbased subgroup identification in clinical trials of targeted antitumor combination therapy.
Availability of data and materials
The R code used for the simulations in the current study is available in the GitHub repository, https://github.com/cccc633/IBIS.
Abbreviations
BHM: Bayesian hierarchical model
ADC: antibody-drug conjugate
RCT: randomized controlled trial
FWER: familywise error rate
MSE: mean squared error
ECOG PS: Eastern Cooperative Oncology Group Performance Status
References
Cheng AL, Qin S, Ikeda M, Galle PR, Ducreux M, Kim TY, et al. Updated efficacy and safety data from IMbrave150: Atezolizumab plus bevacizumab vs. sorafenib for unresectable hepatocellular carcinoma. J Hepatol. 2022;76(4):862–73.
Hellmann MD, Paz-Ares L, Bernabe Caro R, Zurawski B, Kim SW, Carcereny Costa E, et al. Nivolumab plus Ipilimumab in advanced non-small-cell lung Cancer. N Engl J Med. 2019;381(21):2020–31.
Loi S, Giobbie-Hurder A, Gombos A, Bachelot T, Hui R, Curigliano G, et al. Pembrolizumab plus trastuzumab in trastuzumab-resistant, advanced, HER2-positive breast cancer (PANACEA): a single-arm, multicentre, phase 1b–2 trial. Lancet Oncol. 2019;20(3):371–82.
Oldenhuis CN, Oosting SF, Gietema JA, de Vries EG. Prognostic versus predictive value of biomarkers in oncology. Eur J Cancer. 2008;44(7):946–53.
Borghaei H, Besse B, Bardia A, Mazieres J, Popat S, Augustine B, et al. Trastuzumab deruxtecan (T-DXd; DS-8201) in combination with pembrolizumab in patients with advanced/metastatic breast or non-small cell lung cancer (NSCLC): a phase Ib, multicenter, study. J Clin Oncol. 2020;38(15 suppl):TPS1100.
Bayat Mokhtari R, Homayouni TS, Baluch N, Morgatskaya E, Kumar S, Das B, et al. Combination therapy in combating cancer. Oncotarget. 2017;8(23):38022–43.
Choong GM, Cullen GD, O'Sullivan CC. Evolving standards of care and new challenges in the management of HER2-positive breast cancer. CA Cancer J Clin. 2020;70(5):355–74.
Denkert C, Seither F, Schneeweiss A, Link T, Blohmer JU, Just M, et al. Clinical and molecular characteristics of HER2-low-positive breast cancer: pooled analysis of individual patient data from four prospective, neoadjuvant clinical trials. Lancet Oncol. 2021;22(8):1151–61.
Takegawa N, Tsurutani J, Kawakami H, Yonesaka K, Kato R, Haratani K, et al. [fam-] trastuzumab deruxtecan, antitumor activity is dependent on HER2 expression level rather than on HER2 amplification. Int J Cancer. 2019;145(12):3414–24.
Modi S, Park H, Murthy RK, Iwata H, Tamura K, Tsurutani J, et al. Antitumor activity and safety of Trastuzumab Deruxtecan in patients with HER2-low-expressing advanced breast Cancer: results from a phase Ib study. J Clin Oncol. 2020;38(17):1887–96.
Brannath W, Zuber E, Branson M, Bretz F, Gallo P, Posch M, et al. Confirmatory adaptive designs with Bayesian decision tools for a targeted therapy in oncology. Stat Med. 2009;28(10):1445–63.
Gotte H, Donica M, Mordenti G. Improving probabilities of correct interim decision in population enrichment designs. J Biopharm Stat. 2015;25(5):1020–38.
Jenkins M, Stone A, Jennison C. An adaptive seamless phase II/III design for oncology trials with subpopulation selection using correlated survival endpoints. Pharm Stat. 2011;10(4):347–56.
Liu A, Liu C, Li Q, Yu KF, Yuan VW. A threshold sample-enrichment approach in a clinical trial with heterogeneous subpopulations. Clin Trials. 2010;7(5):537–45.
Mehta C, Schafer H, Daniel H, Irle S. Biomarker driven population enrichment for adaptive oncology trials with time to event endpoints. Stat Med. 2014;33(26):4515–31.
Rosenblum M, Luber B, Thompson RE, Hanley D. Group sequential designs with prospectively planned rules for subpopulation enrichment. Stat Med. 2016;35(21):3776–91.
Rosenblum M, van der Laan MJ. Optimizing randomized trial designs to distinguish which subpopulations benefit from treatment. Biometrika. 2011;98(4):845–60.
Sinha AK, Moye L III, Piller LB, Yamal JM, Barcenas CH, Lin J, et al. Adaptive group-sequential design with population enrichment in phase 3 randomized controlled trials with two binary co-primary endpoints. Stat Med. 2019;38(21):3985–96.
Uozumi R, Hamada C. Interim decision-making strategies in adaptive designs for population selection using time-to-event endpoints. J Biopharm Stat. 2017;27(1):84–100.
Wang SJ, O'Neill RT, Hung HM. Approaches to evaluation of treatment effect in randomized clinical trials with genomic subset. Pharm Stat. 2007;6(3):227–44.
Wu LW, Li Q, Liu MY, Lin JC. Incorporating surrogate information for adaptive subgroup enrichment design with sample size re-estimation. Stat Biopharm Res. 2022;14(4):493–504.
Chiu YD, Koenig F, Posch M, Jaki T. Design and estimation in clinical trials with subpopulation selection. Stat Med. 2018;37(29):4335–52.
Lai TL, Lavori PW, Liao OY. Adaptive choice of patient subgroup for comparing two treatments. Contemp Clin Trials. 2014;39(2):191–200.
Lai TL, Lavori PW, Tsang KW. Adaptive enrichment designs for confirmatory trials. Stat Med. 2019;38(4):613–24.
Magnusson BP, Turnbull BW. Group sequential enrichment design incorporating subgroup selection. Stat Med. 2013;32(16):2695–714.
Wang SJ, Hung HM, O'Neill RT. Adaptive patient enrichment designs in therapeutic trials. Biom J. 2009;51(2):358–74.
Fuglede B, Topsoe F. Jensen-Shannon divergence and Hilbert space embedding. In: International Symposium on Information Theory, 2004. ISIT 2004. Proceedings. 2004. p. 31.
Kailath T. The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans Commun Technol. 1967;15(1):52–60.
Vickerstaff V, Omar RZ, Ambler G. Methods to adjust for multiple comparisons in the analysis and sample size calculation of randomised controlled trials with multiple primary outcomes. BMC Med Res Methodol. 2019;19(1):129.
Lee JJ, Liu DD. A predictive probability design for phase II cancer clinical trials. Clin Trials. 2008;5(2):93–106.
Gelman A. Prior distributions for variance parameters in hierarchical models (comment on an article by Browne and Draper). Bayesian Anal. 2006;1(3):515–33.
Berry SM, Broglio KR, Groshen S, Berry DA. Bayesian hierarchical modeling of patient subpopulations: efficient designs of phase II oncology clinical trials. Clin Trials. 2013;10(5):720–34.
Diao G, Dong J, Zeng D, Ke C, Rong A, Ibrahim JG. Biomarker threshold adaptive designs for survival endpoints. J Biopharm Stat. 2018;28(6):1038–54.
Hui J, Guo W. Optimal biomarker cutoff identification and validation. Stat Biosci. 2022;14:352–62.
Johnston SE, Lipkovich I, Dmitrienko A, Zhao YD. A two-stage adaptive clinical trial design with data-driven subgroup identification at interim analysis. Pharm Stat. 2022;21(5):1090–108.
Li J, Zhao L, Tian L, Cai T, Claggett B, Callegaro A, et al. A predictive enrichment procedure to identify potential responders to a new therapy for randomized, comparative controlled clinical studies. Biometrics. 2016;72(3):877–87.
Park Y, Liu S. A randomized group sequential enrichment design for immunotherapy and targeted therapy. Contemp Clin Trials. 2022;116:106742.
Simon N, Simon R. Adaptive enrichment designs for clinical trials. Biostatistics. 2013;14(4):613–25.
Simon N, Simon R. Using Bayesian modeling in frequentist adaptive enrichment designs. Biostatistics. 2018;19(1):27–41.
Spencer AV, Harbron C, Mander A, Wason J, Peers I. An adaptive design for updating the threshold value of a continuous biomarker. Stat Med. 2016;35(27):4909–23.
Xu Y, Constantine F, Yuan Y, Pritchett YL. ASIED: a Bayesian adaptive subgroup-identification enrichment design. J Biopharm Stat. 2020;30(4):623–38.
Zhang Z, Chen R, Soon G, Zhang H. Treatment evaluation for a data-driven subgroup in adaptive enrichment designs of clinical trials. Stat Med. 2018;37(1):1–11.
Acknowledgements
We would like to thank AJE (www.aje.cn) for English language editing.
Funding
This work was supported by the National Natural Science Foundation of China [No. 81973145, No. 82273735] and Key R&D Program of Jiangsu Province (Social Development) (BE2020694).
Author information
Contributions
XC, JZ and FY contributed to the conception and design of the work. XC finished the code. XC, LJ and FY interpreted the results. XC and FY drafted the manuscript, and all authors read, edited, and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1.
Supplemental simulation results.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Chen, X., Zhang, J., Jiang, L. et al. IBIS: identify biomarker-based subgroups with a Bayesian enrichment design for targeted combination therapy. BMC Med Res Methodol 23, 66 (2023). https://doi.org/10.1186/s12874-023-01877-w
DOI: https://doi.org/10.1186/s12874-023-01877-w
Keywords
 Biomarker
 Subgroup identification
 Adaptive enrichment design
 Combination therapy
 Bayesian hierarchical model (BHM)
 Two-stage design