Overall evaluation
A total of 29 articles were included in the review of Phase 2 two-stage trials in glioblastoma. Among the 29 reviewed articles, most studied glioblastoma (n = 20, 69%) rather than high-grade glioma (n = 9, 31%), recurrent disease (n = 23, 79%) rather than newly diagnosed disease (n = 6, 21%), and adult patients (n = 22, 76%) rather than pediatric patients (n = 7, 24%). Table 1 summarizes the included studies [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46]. More than half of the studies used a single therapeutic drug (n = 17, 59%) rather than a combination treatment (n = 12, 41%). Eighteen studies used PFS6 as the primary endpoint, while the others used ORR (n = 8) or another endpoint (n = 3). Apart from three clinical trials that did not report the method used, almost all trials used Simon’s two-stage design (n = 23, 79% of all 29 trials). The other three trials used other two-stage designs: the admissible, Fleming, and Gehan designs. We examined the design inputs and the sample size calculation outputs related to the implementation of the two-stage design. More than three quarters of the articles (n = 22, 76%) provided all the input information, namely the type I and II error rates (\(\alpha , \beta\)) and the unacceptable and acceptable response rates (\({p}_{0}, {p}_{1}\)). However, almost 60% of the studies (17/29, 59%) failed to report at least one key output of the sample size calculation, namely the numbers of patients in the first stage and in both stages (\({n}_{1}, n\)) and the treatment rejection numbers for the first stage and for both stages (\({r}_{1}, r\)). Furthermore, only nine studies (31%) provided references for the historical control rates together with an explanation of how the rates were chosen, while most studies (n = 20) provided neither a reference for the historical control rates nor an explanation of how the historical and expected response rates were chosen for their study drugs.
Only three trials (10%) provided the key input parameters, appropriately reported the output of the two-stage sample size calculation, and also provided a reference for and an explanation of the historical control rates. Among the 29 trials, only three were completed through both stages, and two of them showed efficacy. Figure 2 summarizes the frequencies and proportions for eleven topics related to Phase 2 single-arm two-stage designs: (1) disease (Yes: GBM, No: glioma), (2) setting (Yes: recurrent, No: newly diagnosed), (3) patients (Yes: adults, No: children), (4) therapeutic drug (Yes: single, No: combination), (5) primary endpoint (Yes: PFS6, No: ORR and others), (6) two-stage design method (Yes: Simon, No: others), (7) all four key inputs of the two-stage design provided? (Yes, No), (8) all four outputs of the sample size calculation appropriately reported? (Yes, No), (9) reference for historical control data provided? (Yes, No), (10) all key inputs and outputs as well as the reference for historical control rates provided? (Yes, No), and (11) was the trial stopped? (Yes, No).
Summary for general study design
The most frequently studied population was adult patients with recurrent glioblastoma. Studies were categorized by disease (glioblastoma, n = 20; high-grade glioma, n = 8; brain metastasis from glioblastoma, n = 1), setting (recurrent, n = 23; newly diagnosed, n = 6), patient type (adult, n = 23; pediatric, n = 6), and therapeutic drug type (single, n = 17; combination, n = 12). Temozolomide (TMZ) was the drug most often used in combination treatments (n = 7, combined with pegylated liposomal doxorubicin (PLD), O6-benzylguanine (O6B), irinotecan (IRI), decitabine (DAC), Dendritic (DEN), Nintedanib (NIN), and Atorvastatin (ATO)), followed by Bevacizumab (BEV) (n = 3, combined with temsirolimus (TEM), Ponatinib (PON), and Evofosfamide (EVO)). A recent paper combined Nivolumab (NIV) and Cyclophosphamide (CYC) [44]. A total of 17 studies used a single therapeutic drug: Sunitinib (SNT) and Nintedanib (NIN) in two studies each, and Temozolomide (TMZ), Bendamustine (BEN), Temsirolimus (TMS), Gimatecan (GMT), Bosutinib (BOS), Dasatinib (DAS), Tivozanib (TIV), Imipridone (IMI), Ortataxel (ORT), Dovitinib (DOV), Perifosine (PRF), Thrombopoietin receptor (THR), and Pomalidomide (POM) in one study each. The most widely used endpoints in these Phase 2 single-arm trials were PFS6 (n = 18) and ORR (n = 8).
Key input information for two-stage design implementation
Among the 29 Phase 2 single-arm trials, 23 trials (79%) used Simon’s two-stage designs, three trials used other two-stage designs (one each of the Gehan, Fleming, and admissible designs), and three trials mentioned a two-stage design without specifying which one. Among the 23 trials with Simon’s two-stage designs, 12 used Simon’s optimal design, 4 used Simon’s minimax design, and 7 mentioned Simon’s two-stage design without stating whether it was the optimal or the minimax design. Notably, most trials that did not name a specific design type (such as Simon’s optimal or minimax, Gehan, Fleming, or admissible designs) also failed to report one or more key results of the two-stage sample size calculation. The two types of error (\(\alpha , \beta\)) and the unacceptable and acceptable response rates (\({p}_{0}, {p}_{1}\)) are the key inputs for the sample size calculation of a two-stage design. Most trials (n = 22, 76%) provided all key inputs, while 7 trials (24%) failed to provide at least one of them (six trials did not provide the two error rates, four trials did not provide the two response rates, and three trials failed to provide two or more key results from the sample size calculation).
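As a sketch of how these four inputs constrain the design, the standard formulation of Simon’s two-stage design can be written as follows. The symbols \(b\), \(B\), \({n}_{2}\), \(R\), PET, and EN are notation introduced here for illustration and are not taken from the reviewed trials. Let \(b(x; m, p)\) and \(B(x; m, p)\) denote the binomial probability mass and cumulative distribution functions, and let \({n}_{2} = n - {n}_{1}\). The treatment is rejected if at most \({r}_{1}\) responses are seen among the first \({n}_{1}\) patients, or at most \(r\) responses among all \(n\) patients, so the probability of rejecting the treatment at a true response rate \(p\) is

\[ R(p) = B({r}_{1}; {n}_{1}, p) + \sum_{x={r}_{1}+1}^{\min({n}_{1}, r)} b(x; {n}_{1}, p)\, B(r - x; {n}_{2}, p) \]

A design \(({r}_{1}, {n}_{1}, r, n)\) is admissible for the stated inputs when

\[ 1 - R({p}_{0}) \le \alpha , \qquad R({p}_{1}) \le \beta \]

and its probability of early termination and expected sample size under \({p}_{0}\) are

\[ \mathrm{PET}({p}_{0}) = B({r}_{1}; {n}_{1}, {p}_{0}), \qquad \mathrm{EN}({p}_{0}) = {n}_{1} + \left(1 - \mathrm{PET}({p}_{0})\right) {n}_{2} \]

Among the designs that satisfy both constraints, the optimal design minimizes \(\mathrm{EN}({p}_{0})\) and the minimax design minimizes the maximum sample size \(n\). This is why a report that omits any of \(\alpha , \beta , {p}_{0}, {p}_{1}\) cannot be reproduced by a reader.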
Key output results from two-stage design sample size calculation
Only 12 trials (41%) reported all four key outputs of the sample size calculation, while 17 trials failed to report at least one of them (both response numbers in 8 trials and the response number for both stages (\(r\)) in 17 trials). Most studies (n = 27, 93%) provided the numbers of patients in stage 1 and in both stages, but many trials (n = 17) failed to report one or both of the response numbers for stage 1 and for both stages, which are the key information for deciding whether to continue to the second stage (\({r}_{1}\)) at the end of the first stage and for testing efficacy (\(r\)) at the end of the second stage. Unfortunately, most trials (n = 20, 69%) failed to provide references for the historical control rates. Furthermore, all trials but one failed to explain how they chose the acceptable response rate. Even though 12 trials implemented and reported the key input and output parameters of the two-stage sample size calculation, only 3 trials (10%) also provided references for the historical control rates. Although more than 75% of the trials reported all key input parameters, many studies (17/29, 59%) failed to report at least one key output of the sample size calculation, namely the numbers of patients in the first stage and in both stages (\({n}_{1}, n\)) and the treatment rejection numbers for the first stage and for both stages (\({r}_{1}, r\)). In addition, several trials reported incorrect results of the sample size calculation even though they reported all related key information for the two-stage design implementation (not shown in the table). Furthermore, a couple of trials did not explain or describe the results of the sample size calculation (not shown here).
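One way a reader could check whether reported outputs (\({r}_{1}, {n}_{1}, r, n\)) are consistent with the stated inputs (\(\alpha , \beta , {p}_{0}, {p}_{1}\)) is sketched below. It is a minimal illustration assuming the standard decision rule of Simon’s two-stage design (reject the treatment if at most \({r}_{1}\) responses occur in stage 1, or at most \(r\) responses in both stages). All function names and the `max_n` search bound are hypothetical choices made for this example and do not come from the reviewed trials or from any particular software package.

```python
from functools import lru_cache
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


@lru_cache(maxsize=None)
def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); evaluates to 0 when k < 0."""
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1))


def prob_reject_treatment(p, r1, n1, r, n):
    """Probability that the two-stage rule rejects the treatment at true rate p."""
    n2 = n - n1
    early = binom_cdf(r1, n1, p)  # stop after stage 1 with <= r1 responses
    late = sum(                   # continue, but end with <= r responses overall
        binom_pmf(x, n1, p) * binom_cdf(r - x, n2, p)
        for x in range(r1 + 1, min(n1, r) + 1)
    )
    return early + late


def operating_characteristics(p0, p1, r1, n1, r, n):
    """Actual type I error, power, early-termination probability and expected N."""
    pet0 = binom_cdf(r1, n1, p0)
    return {
        "alpha": 1 - prob_reject_treatment(p0, r1, n1, r, n),
        "power": 1 - prob_reject_treatment(p1, r1, n1, r, n),
        "pet_p0": pet0,
        "en_p0": n1 + (1 - pet0) * (n - n1),
    }


def search_designs(p0, p1, alpha, beta, max_n=40):
    """Exhaustive search over n <= max_n (an assumed bound).

    Returns (optimal, minimax) as (r1, n1, r, n) tuples, or None if no
    design with n <= max_n meets both error constraints.
    """
    optimal = minimax = None  # each stored as (sort_key, design)
    for n in range(2, max_n + 1):
        for n1 in range(1, n):
            for r1 in range(n1):
                if binom_cdf(r1, n1, p1) > beta:
                    break  # early stopping alone already exceeds beta
                for r in range(r1, n):
                    if 1 - prob_reject_treatment(p0, r1, n1, r, n) <= alpha:
                        # The smallest r meeting alpha has the highest power,
                        # so larger r cannot rescue a failed power constraint.
                        if prob_reject_treatment(p1, r1, n1, r, n) <= beta:
                            en = n1 + (1 - binom_cdf(r1, n1, p0)) * (n - n1)
                            design = (r1, n1, r, n)
                            if optimal is None or (en, n) < optimal[0]:
                                optimal = ((en, n), design)
                            if minimax is None or (n, en) < minimax[0]:
                                minimax = ((n, en), design)
                        break
    return (optimal and optimal[1], minimax and minimax[1])


if __name__ == "__main__":
    # Illustrative inputs only; they are not taken from any reviewed trial.
    print(operating_characteristics(0.05, 0.25, r1=0, n1=9, r=2, n=17))
    print(search_designs(0.05, 0.25, alpha=0.05, beta=0.20, max_n=25))
```

For the illustrative inputs \({p}_{0} = 0.05\), \({p}_{1} = 0.25\), \(\alpha = 0.05\), and \(\beta = 0.20\), the design with \({r}_{1}/{n}_{1} = 0/9\) and \(r/n = 2/17\) has an actual type I error of about 0.047 and a power of about 0.81 under this sketch, so it meets both constraints. A reported design whose recomputed error rates violate the stated \(\alpha\) or \(\beta\) would be flagged in the same way. Note that the search result is optimal only within the chosen `max_n`.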