Two-stage Bayesian hierarchical modeling for blinded and unblinded safety monitoring in randomized clinical trials

Background Monitoring and reporting of drug safety during a clinical trial is essential to its success. More recent attention to drug safety has encouraged statistical methods development for monitoring and detecting potential safety signals. This paper investigates the potential impact of the process of the blinded investigator identifying a potential safety signal, which should be further investigated by the Data and Safety Monitoring Board with an unblinded safety data analysis. Methods In this paper, two-stage Bayesian hierarchical models are proposed for safety signal detection following a pre-specified set of interim analyses that are applied to efficacy. At stage 1, a hierarchical blinded model uses blinded safety data to detect a potential safety signal and at stage 2, a hierarchical logistic model is applied to confirm the signal with unblinded safety data. Results Any interim safety monitoring analysis is usually scheduled via negotiation between the trial sponsor and the Data and Safety Monitoring Board. The proposed safety monitoring process starts once 53 subjects have been enrolled into an eight-arm phase II clinical trial for the first interim analysis. Operating characteristics describing the performance of this proposed workflow are investigated using simulations based on the different scenarios. Conclusions The two-stage Bayesian safety procedure in this paper provides a statistical view to monitor safety during the clinical trials. The proposed two-stage monitoring model has an excellent accuracy of detecting and flagging a potential safety signal at stage 1, and with the most important feature that further action at stage 2 could confirm the safety issue.


Background
Interest in monitoring and reporting drug safety during the execution of a clinical trial, and in careful monitoring throughout the development of a drug from pre-clinical to post-marketing stages, has grown at a remarkable rate in the past decade. This attentiveness to drug safety has inspired the development of statistical methods for monitoring and detecting potential safety signals during trial execution. Proposed methods include Bayesian and frequentist models for blinded and unblinded safety monitoring for randomized clinical trials [1,2].
Blinding is the process of concealing treatment-related information from the people involved in a clinical trial, such as the sponsors, participants, and researchers. Blinding preserves the integrity of the study by minimizing the impact on study findings of conscious or unconscious biases that might result from knowledge of treatment [3,4]. The disclosure of treatment group assignment during the trial is called unblinding. For medical or safety reasons, unblinding a trial is sometimes necessary to protect study participants. The unblinding process is generally pre-specified and detailed in the study protocol [3][4][5].
Data and Safety Monitoring Boards (DSMBs) are independent committees responsible for regular monitoring and reporting of clinical trial safety data [6][7][8]. The Food and Drug Administration (FDA) requires the formation of a DSMB in all trials that assess new interventions [9,10]. Furthermore, the FDA guidance on safety assessment for Investigational New Drug (IND) safety reporting recommends that unblinding is allowed and needed to identify important safety information for serious adverse events during an ongoing clinical trial [11]. "Flagging" is the notification process that identifies a potential safety concern in the novel treatment being tested in a clinical trial [3,4]. DSMBs play a critical role in safety flagging: they monitor and report on both the interests of trial participants and the scientific integrity of clinical trials [5]. DSMBs regularly review blinded reports and listings of safety data to determine whether the observed risk profile of the drug differs from what was expected. However, the investigator needs to be blinded to safety analyses throughout the conduct of the clinical trial. This paper investigates the potential impact of the process of the blinded investigator identifying a potential safety signal that the DSMB should further investigate with an unblinded safety data analysis.
The periodic safety reports reviewed by the investigator include a full listing of all adverse events (AEs), as well as any serious adverse events (SAEs) [12]. The report summarizes a trial's clinical safety endpoints and AEs in terms of the frequency of each event, the number of subjects having the event, the severity of the event, and the relatedness of the event to the study treatment. Because drug-related safety issues might occur at any time during the execution of a clinical trial, interim analyses of blinded safety data could help prevent such safety problems from escalating to significant concerns. Although blinded data analysis is less informative and does not provide a definitive treatment effect estimate, blinded safety data monitoring could identify potential safety issues ahead of scheduled DSMB meetings and prompt decisions regarding an unblinded analysis. To accelerate the identification of important safety information, one feasible approach is to combine the blinded periodic safety monitoring with the intended unblinded data analysis. Additionally, the monitoring and evaluation of unblinded safety data will be based on the safety information from blinded safety monitoring. Therefore, a two-stage monitoring method could be implemented to confirm and identify a safety signal with unblinded safety data at stage 2 once the potential AE(s) have been flagged at stage 1.
Bayesian hierarchical approaches can be used for both blinded and unblinded data analysis by incorporating a prior safety profile of the control group or background rate of events and updating outcomes using accumulating data from the ongoing trial [13][14][15]. Prior assumptions on the safety profile must be made utilizing historical information or epidemiologic data. In this paper, a potential safety signal is identified by calculating the proportion of AEs from the pooled blinded safety data at stage 1. A blinded Bayesian hierarchical model based on Ball's method of identifying possible safety signals is applied to the pooled blinded safety data [15]. Because the Bayesian paradigm and its associated hierarchical models allow for automated adjustment for multiplicity and could reduce the family-wise error rate (FWER) in both stages [16], a Bayesian hierarchical model that simultaneously models all AEs is considered [17][18][19]. A randomized clinical trial commonly includes a control group and one or multiple active treatment groups with different dose levels. Therefore, at stage 2, a Bayesian hierarchical logistic model applied to unblinded data is used to simultaneously confirm whether the flagged safety signals are indeed safety issues [14,15].
Throughout the trial, periodic blinded monitoring of events is conducted using Bayesian methods [16,20]. Typically, investigators review blinded interim safety monitoring reports consisting of the proportion of subjects experiencing each AE with two-sided 95% credible intervals. If a possible safety signal is detected during blinded monitoring, a model-based estimate of the dose-response relationship on the relative risk will be provided to the DSMB [15].
This paper was originally motivated by the work of Ball and Wen, who developed Bayesian objective early stopping rules for screening and monitoring safety in a randomized clinical trial using blinded treatment information [16,20]. However, it is difficult for trial leadership to make decisions about stopping a trial using only blinded data. Therefore, this work makes two contributions to this area: 1) a Bayesian framework with a two-stage process for safety signal detection that facilitates decision-making using blinded data and then confirms it with the unblinded data; and 2) calculations of operating characteristics for this workflow. Most trials focus on power and sample size calculations (operating characteristics) for the primary efficacy analysis. This work extends this focus to safety by providing false positive and false negative rates for the proposed safety signal identification framework.
Methods: two-stage Bayesian hierarchical models

Stage 1: Bayesian blinded safety monitoring
The stage 1 Bayesian blinded statistical monitoring method assumes a randomized two-arm or multi-arm clinical trial. Subjects are continuously enrolled into the trial, and the first interim analysis occurs after a total of $N$ subjects have been enrolled into $I + 1$ arms, with $n_0$ subjects enrolled into the control arm and $n_1, n_2, \ldots, n_I$ subjects enrolled into the treatment arms.

Beta-binomial model
According to Wen and Ball's Beta-Binomial model [16,20], the occurrence of the $j$th AE among a total of $J$ types of AEs is denoted by $Y_{ij}$ for the $i$th dosage arm. In stage 1, $Y_j$ denotes the total number of subjects experiencing the $j$th AE reported in the pooled data, with the observed pooled incidence rate equal to $\hat{\pi}_j = Y_j / N$. Let $\pi^M_j$ represent the pre-specified expected pooled incidence rate across all dose levels of the $j$th AE. The aggregated total across all arms ($Y_j$) is assumed to have a Binomial distribution with occurrence probability $\pi_j$ and $N_j \equiv N$. That is, $Y_j = \sum_i Y_{ij}$ for $j = 1, 2, \ldots, J$ and $i = 0, 1, 2, \ldots, I$, and the distribution of the $j$th AE is given by
$$Y_j \sim \mathrm{Binomial}(N, \pi_j).$$
The occurrence probability $\pi_j$ has a Beta prior distribution to facilitate a conjugate analysis. For example, assuming a Beta(1, 1) prior distribution for $\pi_j$ results in a Beta posterior distribution,
$$\pi_j \mid Y_j \sim \mathrm{Beta}(1 + Y_j,\; 1 + N - Y_j).$$
The $j$th AE may have a statistically significant safety signal if the posterior probability of its incidence rate being higher than the pre-specified expected pooled incidence rate exceeds a pre-specified critical value:
$$P(\pi_j > \pi^M_j \mid \text{Blinded Data}) \ge P_{\mathrm{crit}}.$$
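The conjugate update described above can be illustrated with a short Python sketch for a single AE. This is not the authors' code (their implementation is in Additional file 1). The function name, the example counts, and the use of the exact identity between a Beta tail with integer parameters and a Binomial sum are assumptions made for this illustration.

```python
from math import comb

def beta_binomial_tail(y, n, pi_expected):
    """P(pi > pi_expected | y events among n subjects) under a Beta(1, 1) prior.

    The posterior is Beta(1 + y, 1 + n - y). For integer parameters, the upper
    tail of Beta(a, b) at x equals P(Binomial(a + b - 1, x) <= a - 1), so the
    posterior probability is an exact finite sum.
    """
    m = n + 1  # a + b - 1 with a = 1 + y and b = 1 + n - y
    return sum(comb(m, k) * pi_expected**k * (1 - pi_expected)**(m - k)
               for k in range(y + 1))

# Hypothetical interim look: 53 pooled subjects, expected pooled rate 0.25,
# critical value 0.9. The event counts below are illustrative only.
N, PI_M, P_CRIT = 53, 0.25, 0.9
for y in (13, 18, 22):
    p = beta_binomial_tail(y, N, PI_M)
    print(f"Y={y}: P(pi > {PI_M} | data) = {p:.3f}, flag = {p >= P_CRIT}")
```

Because the posterior probability increases with the pooled count, the rule is equivalent to flagging whenever the count reaches a threshold that depends on the sample size, the expected rate, and the critical value.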

Bayesian hierarchical blinded model
Considering the various types of AEs recorded in a clinical trial, multiplicity is a likely issue. Berry and Berry developed a Bayesian hierarchical model to handle multiple AEs simultaneously [17]. The hierarchical model allows for possible correlation between the AEs through the specified hyperparameters. Additionally, this approach allows for normal hierarchical models on the real line, as opposed to the (0, 1) constraint of the Beta-Binomial model. Therefore, the Beta-Binomial model and the Bayesian hierarchical model are combined to form the proposed Bayesian hierarchical blinded model. For the hierarchical model, define $\pi_j$ as a combination of control and treatment incidence rates, given by
$$\pi_j = Q_c\,\pi^{Ctr}_j + Q_T\,\pi^{Trt}_j,$$
where $Q_c = n_0 / N$ is the proportionate sample size of the control arm; $Q_T = 1 - Q_c = \sum_{i=1}^{I} n_i / N$ is the proportionate sample size of the treatment arm(s); and $\pi^{Ctr}_j$ and $\pi^{Trt}_j$ are the incidence rates for the $j$th AE in the control and treatment arms, respectively. Note that $\pi^{Trt}_j$ is not assumed to be the same across treatment arms; it is a pooled incidence rate for the $j$th AE. $Q_c$ is usually fixed across different trial designs, including designs that use response adaptive randomization. Assume that the incidence rate for the $j$th AE in the control arm is equal to the expected pooled incidence rate, $\pi^{Ctr}_j \equiv \pi^M_j$. Then $\pi^{Trt}_j$ across the treatment arms can be expressed through the difference between the pooled incidence rate $\pi_j$ and the expected pooled incidence rate $\pi^M_j$. Applying the logistic transformation yields
$$\mathrm{logit}(\pi^{Trt}_j) = \mathrm{logit}(\pi^M_j) + d_j,$$
where $d_j$ is the log-odds ratio of the probability of a safety event in the treatment arms relative to control for the $j$th AE. The incidence rate of an AE is the same for control and treatment arms when $d_j = 0$.
Priors are assigned to $d_j$ using the following distribution:
$$d_j \sim N(\mu_d, \sigma_d^2).$$
The hyperparameters for the normal prior distribution of $d_j$ have fixed distributions: $\mu_d \sim N(\mu_{d0}, \sigma^2_{d0})$ and $\sigma_d \sim \mathrm{Unif}(U_a, U_b)$, where $\mu_{d0}$, $\sigma^2_{d0}$, $U_a$, and $U_b$ are fixed constants. In general, because data are limited, prior information on $d_j$ is typically lacking. However, $d_j$ is still identifiable for two reasons. First, the randomization allocation to the control arm is known and fixed as the proportionate sample sizes $(Q_c, Q_T)$ throughout the trial. Second, the prior control arm rates are fixed at the expected pooled incidence rate. Therefore, to limit the influence of the prior distributions and to avoid overfitting or underfitting of the model, weakly informative priors are commonly recommended [21]. The specification of these hyperparameters depends on the application and is further discussed in the application section [22,23].
Using the Bayesian hierarchical blinded model, posterior samples can be generated via Markov chain Monte Carlo (MCMC) methods, and the posterior probability $P_j^{S1}$ of a safety signal at stage 1 is given by $P_j^{S1} = P(\pi^{Trt}_j > \pi^M_j \mid \text{Blinded Data})$. After a specified number of subjects have been enrolled into the trial, during the interim safety analysis, the following decision rule can be applied for each AE to flag potential safety signals: $P_j^{S1} \ge P_{\mathrm{crit},1}$, assuming some pre-specified critical value $P_{\mathrm{crit},1}$. If the posterior probability meets or exceeds the pre-specified critical value, an analysis of unblinded data can be performed to confirm the safety issue.
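To make the stage 1 rule concrete, the following Python sketch approximates the posterior probability for one AE on a grid. It is a deliberately simplified stand-in for the hierarchical model, not the authors' implementation: it handles a single AE, replaces the hyperpriors with a fixed N(0, prior_sd^2) prior on the log-odds ratio, and uses a grid instead of MCMC. The function name, the prior standard deviation, and the example counts are all assumptions.

```python
from math import exp, log

def logit(p):
    return log(p / (1 - p))

def expit(x):
    return 1 / (1 + exp(-x))

def stage1_posterior_prob(y, n, pi_expected, q_control, prior_sd=2.0,
                          lo=-8.0, hi=8.0, steps=1601):
    """Grid approximation of P(pi_trt > pi_expected | blinded data) for one AE.

    Assumptions of this sketch: the control rate equals pi_expected, the
    log-odds ratio d has a fixed N(0, prior_sd^2) prior (no hyperpriors), and
    the pooled count y is Binomial(n, pi) with
    pi = q_control * pi_expected + (1 - q_control) * pi_trt.
    """
    width = (hi - lo) / (steps - 1)
    total = above = 0.0
    for s in range(steps):
        d = lo + s * width
        pi_trt = expit(logit(pi_expected) + d)
        pi = q_control * pi_expected + (1 - q_control) * pi_trt
        log_post = (y * log(pi) + (n - y) * log(1 - pi)
                    - 0.5 * (d / prior_sd) ** 2)
        weight = exp(log_post)
        total += weight
        if d > 0:  # pi_trt > pi_expected exactly when d > 0
            above += weight
    return above / total

# Hypothetical interim look: N = 53, Q_c = 0.2, expected rate 0.25, cutoff 0.9.
for y in (13, 20, 25):
    p = stage1_posterior_prob(y, 53, 0.25, 0.2)
    print(f"Y={y}: P(pi_trt > 0.25 | blinded data) = {p:.3f}, flag = {p >= 0.9}")
```

A full hierarchical fit would estimate the prior mean and standard deviation of the log-odds ratios jointly across all AEs, which is where the shrinkage that controls multiplicity comes from; the fixed prior here does not reproduce that behavior.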

Stage 2: Bayesian Unblinded safety monitoring
If at any point during stage 1 the blinded monitoring flags a safety signal, the unblinded dose-response effect for each AE will be modeled in stage 2 using a Bayesian hierarchical logistic model. It should be noted that only the AE(s) flagged at stage 1 will be unblinded and be subject to stage 2 monitoring. Under the scenario of various dose levels, assume the occurrence $Y_{ij}$ of the $j$th AE at the $i$th dosage has a Binomial distribution with occurrence probability $\pi_{ij}$. With the number of subjects in the $i$th dosage arm represented by $n_i$,
$$Y_{ij} \sim \mathrm{Binomial}(n_i, \pi_{ij}).$$
The logit function of $\pi_{ij}$ is modeled with a linear predictor consisting of a fixed covariate effect of dose strength ($X_i$): $\mathrm{logit}(\pi_{ij}) = \beta_{0j} + \beta_{1j} X_i$, for $i = 0, 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J$.
In this model, the regression parameters $\beta_{0j}$ and $\beta_{1j}$ represent the control group parameter (intercept) and the incremental effect of dose, respectively. Note that the logistic model could also be applied to a two-arm study. Hierarchical priors are placed on $\beta_{0j}$ and $\beta_{1j}$, where the parameter $\mu_{\beta_0} = \mathrm{logit}(\pi^M_j)$ allows for varying baseline incidence rates among the different types of AEs.
The hyperparameters $\mu_1$, $\sigma^2_1$ and $U_1, U_2, U_3, U_4$ are fixed constants and are discussed in the application section.
The Bayesian hierarchical logistic model provides the posterior probability that the slope coefficient for dose is greater than 0; that is, $\beta_{1j} > 0$. Slopes larger than 0 indicate a significantly increased occurrence probability of the $j$th AE associated with the dose. The posterior probability $P_j^{S2}$ of a safety signal at stage 2 is given by
$$P_j^{S2} = P(\beta_{1j} > 0 \mid \text{Unblinded Data}).$$
Therefore, $P_j^{S2}$ is compared to a pre-specified stage 2 critical value $P_{\mathrm{crit},2}$, and a safety signal is confirmed when $P_j^{S2} \ge P_{\mathrm{crit},2}$.
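The stage 2 quantity can be sketched in Python for a single flagged AE. As with any simplified illustration, several things here are assumptions rather than the authors' method: the priors are fixed normals instead of the hierarchical priors, the posterior is evaluated on a two-dimensional grid instead of by MCMC, and the dose values, arm sizes, and event counts are hypothetical.

```python
from math import exp, log

def stage2_posterior_prob(doses, n_per_arm, events, pi_expected,
                          sd_intercept=2.0, sd_slope=2.0, steps=161):
    """Grid approximation of P(beta1 > 0 | unblinded data) for one AE.

    Model: events[i] ~ Binomial(n_per_arm[i], pi_i) with
    logit(pi_i) = b0 + b1 * doses[i].
    Assumed fixed priors for this sketch (no hyperpriors):
    b0 ~ N(logit(pi_expected), sd_intercept^2) and b1 ~ N(0, sd_slope^2).
    """
    mu0 = log(pi_expected / (1 - pi_expected))
    b0_grid = [mu0 - 6 + 12 * k / (steps - 1) for k in range(steps)]
    b1_grid = [-6 + 12 * k / (steps - 1) for k in range(steps)]
    log_posts = []
    for b0 in b0_grid:
        for b1 in b1_grid:
            lp = (-0.5 * ((b0 - mu0) / sd_intercept) ** 2
                  - 0.5 * (b1 / sd_slope) ** 2)
            for x, n, y in zip(doses, n_per_arm, events):
                eta = b0 + b1 * x
                # Binomial log-likelihood y*eta - n*log(1 + exp(eta)),
                # with log(1 + exp(eta)) computed in a numerically stable way.
                lp += y * eta - n * (max(eta, 0.0) + log(1 + exp(-abs(eta))))
            log_posts.append((b1, lp))
    peak = max(lp for _, lp in log_posts)
    total = sum(exp(lp - peak) for _, lp in log_posts)
    above = sum(exp(lp - peak) for b1, lp in log_posts if b1 > 0)
    return above / total

# Hypothetical unblinded data for one flagged AE: four arms, dose scores 0 to 3.
doses, sizes = [0, 1, 2, 3], [20, 20, 20, 20]
p_s2 = stage2_posterior_prob(doses, sizes, [1, 4, 9, 15], pi_expected=0.05)
print(f"P(beta1 > 0 | unblinded data) = {p_s2:.3f}, confirmed = {p_s2 >= 0.9}")
```

The grid is only practical because there are two parameters per AE; the joint hierarchical model over all AEs needs a sampler.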

Conduct of the trial
Given the models described in the previous section, we propose that a clinical trial be conducted via the sequential steps presented in Fig. 1.
Details are shown in the following steps:
1) Enrolled subjects are randomly assigned to each arm (either a simple two-arm trial or a multi-arm trial).
2) Interim safety analysis occurs after $N$ subjects have been enrolled into $I + 1$ arms, with $n_0$ subjects enrolled into the control arm and $n_1, n_2, \ldots, n_I$ subjects enrolled into treatment arms.
3) During stage 1, $Y_j$ subjects report experiencing AE $j$ at an interim point; that is, the observed pooled incidence rate for AE $j$ is equal to $\hat{\pi}_j = Y_j / N$, and $\pi^M_j$ is the pre-specified expected pooled incidence rate of this AE.
4) Based on the Bayesian hierarchical blinded model, the posterior probability $P_j^{S1}$ of a safety signal at stage 1 is given by $P_j^{S1} = P(\pi^{Trt}_j > \pi^M_j \mid \text{Blinded Data})$. $P_j^{S1}$ is compared to the pre-specified critical value $P_{\mathrm{crit},1}$. Once the model identifies a potential safety signal, $P_j^{S1} \ge P_{\mathrm{crit},1}$, the safety data for the $j$th AE are unblinded and moved to stage 2.
5) During stage 2, only those AE(s) that have been flagged at stage 1 are examined. The Bayesian hierarchical logistic model provides the posterior probability of a safety signal $P_j^{S2} = P(\beta_{1j} > 0 \mid \text{Unblinded Data})$, which is compared to the pre-specified stage 2 critical value $P_{\mathrm{crit},2}$. A safety issue is confirmed when $P_j^{S2} \ge P_{\mathrm{crit},2}$.
6) Repeat at each interim point, updating $\hat{\pi}_j$ for stage 1. At any point a safety signal is detected, follow the decision rules above to confirm the potential safety issue.
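The decision logic in the steps above can be sketched as a small Python function. The function name, the status labels, and the posterior probabilities in the example are hypothetical; the two posterior probabilities themselves would come from the stage 1 and stage 2 models.

```python
from typing import Callable, Dict

def two_stage_monitor(stage1_probs: Dict[str, float],
                      stage2_prob: Callable[[str], float],
                      crit1: float = 0.9, crit2: float = 0.9) -> Dict[str, str]:
    """Apply the stage 1 flagging rule, then the stage 2 confirmation rule.

    stage2_prob is called only for AEs flagged at stage 1, mirroring the rule
    that only flagged AEs are unblinded.
    """
    decisions = {}
    for ae, p1 in stage1_probs.items():
        if p1 < crit1:
            decisions[ae] = "not flagged"
        elif stage2_prob(ae) >= crit2:
            decisions[ae] = "confirmed"
        else:
            decisions[ae] = "flagged, not confirmed"
    return decisions

# Hypothetical posterior probabilities for three AEs at one interim look.
blinded = {"Pneumonia": 0.95, "Seizures": 0.40, "Pneumothorax": 0.92}
unblinded = {"Pneumonia": 0.97, "Pneumothorax": 0.60}
result = two_stage_monitor(blinded, lambda ae: unblinded[ae])
print(result)
```

Passing the stage 2 computation as a callable keeps the unblinded analysis from running for AEs that were never flagged, which is the point of the two-stage design.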

Case study
Consider a multi-arm case study of the Hyperbaric Oxygen Brain Injury Treatment (HOBIT) trial [24,25].
HOBIT is a phase II clinical trial adaptive design for selecting the optimal dose regimen of hyperbaric oxygen (HBO) treatment, defined as the regimen (hyperbaric oxygen with or without normobaric oxygen at different pressure levels) which produces the greatest improvement in the rate of good neurological outcome versus standard care for subjects with severe traumatic brain injury.
For the HOBIT trial, the randomization occurs via the study-specific password-protected website accessed by an authorized research coordinator or investigator at the clinical site. Subjects are considered to be enrolled at the time of randomization, regardless of whether or not they start or complete study treatment. The trial uses the intent-to-treat randomized sample, where subjects are classified by the Oxygen Toxicity Units dose to which they are randomized, regardless of the dose received. The data for interim analysis (for efficacy) are collected from the subjects who have been randomized for more than 4 weeks from the time of the data freeze. In addition, the interim analysis of safety monitoring occurs after N = 53 subjects have enrolled into the trial. In this paper, the hypothetical scenarios of interim safety analysis assume 11 subjects enrolled into the control arm and 6 subjects enrolled into each treatment arm. In the HOBIT trial itself, however, this number changed to 56, with the sample size modified to 9 for the "2.5 ATA + NBH" treatment arm. The comparison of AEs is between the control arm and the seven treatment arms, where the sample size and dosage for the eight arms are given in Table 1.

Adverse event of special interest
The review of safety data focuses on the following AEs potentially associated with hyperbaric oxygen treatment or with the transfer of subjects to receive their treatments.
This subject population presents with significant morbidity with respect to all the below AEs; as such, it is important to evaluate the presence of events concerning temporal relationship to treatment (i.e., novel onset or worsening) as well as its relationship across doses. Therefore, the major individual AEs with clinical relevance and expected event rate are listed in Table 2. Additionally, the clinical information of each AE in Table 2 provides the simulation patterns from a modeling perspective.
All the AEs of special interest are summarized by preferred term and associated system-organ class according to the Medical Dictionary for Regulatory Activities (MedDRA) adverse reaction dictionary and by treatment group in terms of frequency of the event, number of subjects having the event, time relative to randomization, severity, and relatedness to the treatment. Cumulative incidences of the specific AE related to hyperbaric oxygen are compared across arms.

Simulation study
In the simulation study, an example is provided by following the HOBIT trial design to demonstrate the two-stage safety monitoring process and decision criterion. As discussed by Berry et al. and Gajewski et al., the strategy for specifying the hyperparameters is determined by the outcome type and the expected dose-response for the particular application [21,24]. Therefore, in our application to the HOBIT trial, with the aim of minimizing the influence of the prior distributions and avoiding overfitting or underfitting of the model [24], the hyperparameters described in the Methods section are assumed to follow fixed values for the blinded model and for the unblinded model. Additionally, $\pi_j$ is defined as a combination of the control incidence rate and the all-treatment incidence rate, $\pi_j = Q_c \cdot \pi^{Ctr} + Q_T \cdot \pi^{Trt}$, where $Q_c = 0.2$ and $Q_T = 0.8$ were given by protocol information.

Fig. 1 The flowchart of the two-stage Bayesian safety monitoring for blinded and unblinded data. The figure displays the process of two-stage Bayesian hierarchical monitoring, which starts with collecting the number of subjects reporting AEs in the pooled data, then goes through the Bayesian hierarchical blinded model to detect the potential safety signals at stage 1 for blinded safety data. Then, the Bayesian hierarchical logistic model is implemented to confirm the safety issue after the safety data have been unblinded at stage 2
In order to understand the operating characteristics, several patterns of AEs are simulated. The simulation calculations for the two-stage Bayesian monitoring models were carried out using MCMC methods, with the code presented in Additional file 1. The results are based on 10,000 iterations of the study.

Table 2 Adverse events of special interest: clinical information and expected event rate

Pneumothorax induced by HBO therapy
Abnormal collection of air in the pleural space between the lung and the chest wall, which can result in steadily worsening oxygen supply. This is a pressure-related phenomenon that can also be caused by major trauma or a medical procedure. As an AE it is expected to increase as a function of dose atmospheres, but not duration of exposure or number of days of treatment. This is expected to occur during the dive and would result in aborting the treatment.
Expected event rate: 2%

Signs of Pulmonary Dysfunction
Signs of pulmonary dysfunction, including PaO2/FiO2 ≤ 200 or requiring PEEP > 10 cm of water to maintain a PaO2/FiO2 ratio of > 200. This is an adverse event which may be related to total oxygen toxicity exposure and as such should increase with dose and number of treatments. Symptoms are expected to progressively worsen over subsequent dives.
Expected event rate: 25%

Pneumonia
This is an adverse event which is related to total oxygen toxicity exposure and as such should increase with dose and number of treatments. Symptoms are expected to progressively worsen over subsequent dives.
Expected event rate: 40%

Critical decreased CPP (< 60 mmHg)
This AE is not specific to HBO therapy, but is associated with poor outcome (reperfusion). It is expected to be the same in all groups but could demonstrate differences if the process of transferring to the dive chamber causes increased AEs. This should be analyzed as active vs. control.

Critical hypotension (MAP < 70 mmHg)
This AE is not specific to HBO therapy, but can be related to transfer from the critical care unit (e.g. disconnecting and reconnecting of lines). It is expected to be the same in all groups but could demonstrate differences if the process of transferring to the dive chamber causes increased AEs. This should be analyzed as active vs. control.
Expected event rate: 75%

Seizures during HBO treatment
These are expected to occur immediately proximal to treatment as a function of dose oxygen toxicity (rather than cumulative exposure). It is possible to have multiple episodes of the AE. Subjects with a baseline propensity to seize may elevate the numerator for this AE.
Expected event rate: 1%

Hypercarbia during transportation (PaCO2 > 45 mmHg)
This AE is not specific to HBO therapy, but is related to transfer from the critical care unit (e.g. disconnecting and reconnecting of lines). It is expected to be the same in all groups but could demonstrate differences if the process of transferring to the dive chamber causes increased AEs. This should be analyzed as active vs. control.

Two-stage Bayesian hierarchical safety monitoring models
Two approaches, a Beta-Binomial independent model and the hierarchical model, are applied to compare the family-wise error rate for blinded stage 1 safety data [26]. Table 3 provides the model comparisons for hypothetical observed event rates with the probability of flagged trials for the AE of special interest. Here, $\pi_0$ is the true incidence rate for the specific AE and is not assumed to be the same for all non-control arms. For the case study at the blinded stage, the simulated incidence rates were generated unequally under various scenarios.
The choice of critical value should be pre-specified, should depend on the severity of the AE, and should be decided upon by investigators based on their experience. For the first interim analysis, a sample size $N = 53$ and a stage 1 critical value of 0.9 are assumed. For each specific AE, we assume the observed incidence rate varies under different scenarios, from the expected rate (safe rate) to a higher rate (unsafe rate). Based on the true incidence rate and the expected event rate, the proportions of flagged trials are given in Table 3. Table 3 shows that as the observed incidence rate increases, the proportion of flagged trials increases. For example, based on historical data the expected event rate of "Signs of Pulmonary Dysfunction" is 0.25 and the critical value at stage 1 is 0.9. Therefore, the Beta-Binomial independent model decision rule is
$$P(\pi_j > 0.25 \mid \text{Blinded Data}) \ge 0.9.$$
The Bayesian blinded hierarchical model decision rule is given by
$$P(\pi^{Trt}_j > 0.25 \mid \text{Blinded Data}) \ge 0.9.$$
In this case, a safety signal would be flagged if the posterior distribution provides evidence that the overall incidence rate likely exceeds 0.25. Additionally, under the scenario of no signal pattern, family-wise error rates are calculated across all seven AEs. The Bayesian hierarchical model is recommended for safety signal detection, since it accounts for multiplicity and reduces the FWER through the shrinkage at each AE type that is induced by the hyperparameters. The hierarchical model shows a smaller FWER than the Beta-Binomial independent model, as well as a smaller proportion of flagged trials.
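For the independent Beta-Binomial rule only, the per-AE false flag rate and the resulting family-wise error rate can be computed by exact enumeration, as in the Python sketch below. This is an illustration under stated assumptions, not a reproduction of Table 3: the expected rates are illustrative values, the AEs are treated as independent, and the calculation replaces the simulation used in the paper.

```python
from math import comb

def posterior_tail(y, n, pi_expected):
    """P(pi > pi_expected | y of n) under a Beta(1, 1) prior (exact sum)."""
    m = n + 1
    return sum(comb(m, k) * pi_expected**k * (1 - pi_expected)**(m - k)
               for k in range(y + 1))

def false_flag_rate(n, pi_expected, p_crit):
    """P(flag) when the true pooled rate equals the expected rate.

    Sums the Binomial(n, pi_expected) probability of every pooled count y
    whose posterior tail probability reaches the critical value.
    """
    rate = 0.0
    for y in range(n + 1):
        if posterior_tail(y, n, pi_expected) >= p_crit:
            rate += comb(n, y) * pi_expected**y * (1 - pi_expected)**(n - y)
    return rate

def family_wise_error_rate(n, expected_rates, p_crit):
    """FWER = 1 - prod(1 - alpha_j), assuming the AEs are independent."""
    no_flag = 1.0
    for pi_m in expected_rates:
        no_flag *= 1 - false_flag_rate(n, pi_m, p_crit)
    return 1 - no_flag

# Illustrative expected rates for five AEs, N = 53, stage 1 critical value 0.9.
rates = [0.02, 0.25, 0.40, 0.75, 0.01]
alphas = [false_flag_rate(53, r, 0.9) for r in rates]
fwer = family_wise_error_rate(53, rates, 0.9)
for r, a in zip(rates, alphas):
    print(f"expected rate {r:.2f}: false flag rate {a:.3f}")
print(f"FWER across AEs (independent model): {fwer:.3f}")
```

With no pooling across AEs, the FWER grows with the number of AEs monitored, which is the behavior the hierarchical model's shrinkage is meant to temper.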
Stage 2 includes all AEs that were flagged in stage 1. After unblinding the safety data, the dose-response effect of the AEs is modeled using Bayesian hierarchical logistic regression. The logit function of the incidence rate for each arm was modeled using a linear predictor consisting of a fixed covariate effect of dose strength ($X_i$) for each patient, where $X_i$ is summarized as oxygen toxicity units/100 [24,25].
The HOBIT trial is an eight-arm trial, and non-decreasing incidence rates are assumed for each dose as dosage increases. Five different scenarios are considered and shown in Figs. 2, 3, 4, 5 and 6, where the average signal corresponds to the blinded scenarios. These figures show the simulation study patterns of various non-decreasing incidence rates as dosage increases for eight arms. In addition, the proposed two-stage models could be tested on the performance of detecting and confirming those safety issues under different AEs with varied expected incidence rates. For each scenario, the x-axis represents the dosage for each arm, and the y-axis indicates the observed incidence rate $\pi_j$. A scenario of no effect across all AEs is considered (Fig. 2), and a scenario that assumes the same effect for all the AEs but with a safety issue (Fig. 3) is also considered. The same effect scenario was chosen to investigate the situation where the hierarchical model does very well. This assumption is relaxed in the next scenario. In another scenario, only the first three AEs (Pneumothorax Induced by HBO therapy, Signs of Pulmonary Dysfunction, and Pneumonia) have a safety issue (Fig. 4). Under this case, the proposed model is tested on a situation in which only 3 AEs have a safety issue with no issue for the rest. In the HOBIT trial, as described in Table 2, some AEs (Critical decreased CPP, Critical hypotension, and Hypercarbia during transportation) should be analyzed as active vs. control because they could potentially have a flat effect (e.g., in the logistic regression); thus these are modeled separately at stage 2 in scenario IV (Fig. 5). Additionally, a flat effect is considered where both the control and treatment rates are the same but higher compared to the expected incidence rate (Fig. 6). Under this case, assume the control group has a higher incidence rate than expected, which is a safety issue. Then the proposed model is applied to test the detection and confirmation performance for this scenario.

Fig. 2 Scenario I of no effect across all seven AEs under various flat incidence rates as dosage increases for eight arms. (The dashed line is the expected incidence rate for each AE)

Fig. 3 Scenario II of same effect across all seven AEs with safety issues under various increasing incidence rates as dosage increases for eight arms. (The dashed line is the expected incidence rate for each AE)

Fig. 4 Scenario III of same effect for the first three AEs with safety issues (no effect for the rest) under various non-decreasing incidence rates as dosage increases for eight arms. (The dashed line is the expected incidence rate for each AE)

Fig. 5 Scenario IV of three AEs with flat effect relationships and same effect for the rest of the AEs with safety issues under various non-decreasing incidence rates as dosage increases for eight arms. (The dashed line is the expected incidence rate for each AE)

Fig. 6 Scenario V of flat effect where both the control and treatment arms are the same but higher than the expected incidence rate under various flat incidence rates as dosage increases for eight arms. (The dashed line is the expected incidence rate for each AE)
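One way to generate a single simulated interim data set for a scenario of this kind is sketched below in Python. The arm sizes follow the hypothetical 11 + 7 × 6 = 53 allocation used in this paper; the per-arm incidence rates, the seed, and the function name are assumptions chosen for illustration and are not the values behind Figs. 2 to 6.

```python
import random

def simulate_interim(arm_sizes, arm_rates, rng):
    """Draw one interim data set for a single AE.

    Returns the per-arm event counts (the unblinded view used at stage 2) and
    their sum (the pooled, blinded view used at stage 1).
    """
    counts = [sum(rng.random() < rate for _ in range(n))
              for n, rate in zip(arm_sizes, arm_rates)]
    return counts, sum(counts)

ARM_SIZES = [11, 6, 6, 6, 6, 6, 6, 6]  # control arm plus seven treatment arms
FLAT = [0.25] * 8                      # no-effect pattern at the expected rate
RISING = [0.25, 0.28, 0.31, 0.34, 0.37, 0.40, 0.43, 0.46]  # hypothetical trend

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
for label, rates in (("flat", FLAT), ("rising", RISING)):
    counts, pooled = simulate_interim(ARM_SIZES, rates, rng)
    print(f"{label}: per-arm counts {counts}, pooled {pooled} of {sum(ARM_SIZES)}")
```

Repeating this draw many times and applying the stage 1 and stage 2 rules to each data set gives the proportion of flagged trials that the operating characteristics summarize.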

Results
The proposed safety monitoring process starts once 53 subjects have been enrolled into the trial for the first interim analysis. The Bayesian hierarchical blinded model is applied to detect potential safety signals at stage 1, and the process moves to stage 2 once the model detects a safety signal. In stage 2, confirmation of the safety signal is carried out using a Bayesian hierarchical logistic model. The critical value for stage 1 is set to 0.9 following the protocol, and the critical value for stage 2 is varied from liberal to conservative. Three pairs of critical values (stage 1, stage 2) are considered: liberal (0.9, 0.7), medium (0.9, 0.8), and conservative (0.9, 0.9). Operating characteristics and FWER results are given in Table 4 for (A) the no effect scenario I, Table 5 for (B) the same effect for all the AEs with safety issue scenario II, Table 6 for (C) the same effect for three AEs with safety issue (no effect for the rest) scenario III, Table 7 for (D) the three AEs with flat effect relationships and same effect for the rest with safety issues scenario IV, and Table 8 for (E) the flat effect scenario V, where both the control and treatment arms are the same but higher than the expected incidence rate.
The summarized information on the simulation scenarios and a comparison of results is given in Table 9. Scenario I can be treated as the baseline, with no effect for any of the AEs. Compared to scenario I, the proportions in scenario II increase substantially, as all the AEs have safety issues. In scenario III, the first three AEs show higher proportions and the rest keep smaller proportions, since only the first three AEs have a safety issue. Scenario IV differs from scenario II in that AE4, AE5, and AE7 could be analyzed with an active vs. control pattern, so their incidence rates are changed to a flat effect relationship. Comparing scenarios II and IV, the proportions for those flat effect AEs decrease, and the proportions for the remaining AEs are very similar. Building on scenario II, scenario V considers a flat effect for all the AEs, where the control and novel therapy rates are the same but higher than the expected incidence rate. The proportions indicate the safety issue, and one interesting finding is that the model flags potential signals at the blinded stage but not at the unblinded stage, with smaller proportions compared to the other scenarios. At stage 1, with the pre-specified critical value set to 0.9, the proportion of flagged trials is very similar within each AE; at stage 2, as the critical value varies from 0.7 (liberal) to 0.9 (conservative), the proportion of flagged trials decreases. The overall proportion is calculated by multiplying the stage 1 and stage 2 proportions, and it decreases as the stage 2 critical value increases.

Table 4 The probability of flagged trials for the AEs under the no effect scenario I
For the safety analysis, the critical values need to balance the false-flag rate against the false non-flag rate. For example, under the pre-specified critical value of 0.9 for both the blinded and unblinded analyses, scenarios I and II have proportions of flagged trials equal to 0.05 and 0.75, respectively. Similarly, scenarios III, IV, and V have proportions of flagged trials equal to 0.34, 0.66, and 0.33, respectively, with the same two-stage pre-specified critical values. In some instances, these operating characteristics may not change. In other instances, however, the performance of the proposed approach may change with the monitoring of efficacy. For example, if the treatment truly has no impact on efficacy (i.e., under the null hypothesis), there would be little impact on the first interim analysis. However, suppose scenario II is true but the drug follows a true alternative hypothesis with a probability of 0.3 of reaching the final success criteria. This would be a case where the DSMB would be hard-pressed to stop a promising treatment because of safety. In fact, treating the two events as independent, the probability of both a safety signal and an efficacy signal is 0.75 multiplied by 0.3, which equals 0.225, clearly not a negligible amount. The good news is that in scenario I the probability of falsely flagging a trial for safety is 0.05, and under the null hypothesis for efficacy the probability of reaching the final success criteria is 0.01, so the probability of both a safety signal and an efficacy signal is 0.0005. The results show that the two-stage Bayesian safety monitoring model can detect and flag a potential safety signal and, most importantly, that further action at stage 2 can confirm the safety issue. In addition, the family-wise error rate is also evaluated for scenario I, with no effect across all arms [26], as shown in Table 4.
The FWER is around 0.18 at the blinded stage 1 and decreases from 0.57 to 0.25 at the unblinded stage 2 as the critical value increases; these FWERs are acceptable under the current sample size scenario. The overall FWER across all seven AEs is relatively small, with only 5% of trials incorrectly flagged when both critical values are set to 0.9. That is, the two-stage model has excellent accuracy for safety signal detection and confirmation.
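As a minimal sketch of how an FWER of this kind can be estimated from simulations under the no-effect scenario, the function below counts the simulated trials in which at least one AE is flagged. The function name and the flag-matrix layout are assumptions for illustration only.

```python
from typing import Sequence


def family_wise_error_rate(flags: Sequence[Sequence[bool]]) -> float:
    """Estimate the FWER from simulated null trials.

    flags[i][j] is True if AE j was (falsely) flagged in simulated trial i,
    where every trial is generated with no true safety effect.
    """
    n_trials = len(flags)
    # A trial counts as a family-wise error if any of its AEs is flagged.
    n_errors = sum(any(trial_flags) for trial_flags in flags)
    return n_errors / n_trials
```

Because one flagged AE is enough to count a trial, the FWER is never smaller than the largest per-AE false-flag proportion, which is why it is reported alongside the per-AE results.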

Discussion
Both sponsors and the DSMB often desire interim safety monitoring for clinical trials. In this paper, a two-stage Bayesian monitoring method is proposed to evaluate whether the posterior probability of a safety signal exceeds a pre-specified critical value. The proposed two-stage monitoring method not only combines safety monitoring for blinded and unblinded data, but also offers a comprehensive approach: it detects a potential safety issue in blinded data at stage 1 and then analyzes unblinded data at stage 2 to confirm the safety issue.
The Beta-Binomial model was originally proposed by Ball, and a further development of the Binomial model was introduced in his recent safety monitoring paper [27]. Other statistical methods have also been developed and established for blinded safety monitoring [28,29]. We adhere to the Binomial blinded safety monitoring model in this paper because the follow-up period was fixed and each AE was counted once during the study, as indicated in the statistical analysis plan of the HOBIT trial. However, other methods [28,29], for example the Poisson model, which accounts for exposure time, are also feasible and practical within the two-stage Bayesian monitoring framework. Directions for future development include the Poisson model framework, because exposure time is as critical as the number of events for drug safety monitoring. In recent research, the Poisson likelihood model has often been used in blinded safety analysis to account for the exposure time of AEs [28,29]. Furthermore, it would easily allow combining multiple studies with different starting times during safety monitoring. Under the assumption that AEs for a given patient occur independently and at a constant rate, a Poisson model could be applied to monitor safety signals. Another possible development is to move from specifying a fixed expected pooled incidence rate π M j for adverse events to using an informative prior instead, which would allow a fully Bayesian treatment of two-stage safety monitoring. Moreover, regarding the criterion for safety signal confirmation at stage 2, the current model uses only the incremental effect of dose, that is, whether the slope is larger than 0, as the indicator of a significantly increased probability of the AE associated with dose. One limitation is that the toxicity probability at the highest dose, which is a sufficient indicator for safety signal confirmation, is not considered in the current model. Therefore, the toxicity probability at the highest dose could be included in future development.
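To make the Poisson direction mentioned above concrete, the following sketch uses a conjugate Gamma prior on the pooled (blinded) event rate and estimates the posterior probability that the rate exceeds an expected rate. It is an illustration only, not the hierarchical model of this paper or the methods of [28,29]: the Gamma(1, 1) prior, the function name, and all parameter names are assumptions.

```python
import random


def prob_rate_exceeds(
    events: int,
    exposure: float,
    expected_rate: float,
    prior_shape: float = 1.0,   # hypothetical weakly informative prior
    prior_rate: float = 1.0,
    n_draws: int = 20000,
    seed: int = 0,
) -> float:
    """Estimate P(pooled event rate > expected_rate | blinded data).

    With events ~ Poisson(rate * exposure) and rate ~ Gamma(shape, rate),
    the posterior is Gamma(prior_shape + events, prior_rate + exposure).
    """
    rng = random.Random(seed)
    post_shape = prior_shape + events
    # random.gammavariate takes a scale parameter, i.e. 1 / rate.
    post_scale = 1.0 / (prior_rate + exposure)
    exceed = sum(
        rng.gammavariate(post_shape, post_scale) > expected_rate
        for _ in range(n_draws)
    )
    return exceed / n_draws
```

For instance, `prob_rate_exceeds(events=30, exposure=100.0, expected_rate=0.1)` would be compared with the stage 1 critical value in the same way as the Binomial model's posterior probability, with exposure measured in whatever time unit the expected rate uses.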
With respect to the generalizability of the proposed two-stage monitoring model, it could also support cancer studies, which have relatively small incidence rates for some AEs. Future work could evaluate unblinded safety data while adjusting for relevant baseline covariates, such as age at baseline or sex. The severity of an AE could also be built into the model. Finally, because the performance of such models depends on prior knowledge and researchers' experience about AE incidence rates, future work could also consider how the critical values and expected incidence rates are selected for the decision criterion. In the current study, the critical values for both stage 1 and stage 2 were set to 0.9 following the example study protocol, but future studies could relax this value. Another interesting extension in stage 2 is to modify the structure of the model, for example by using random intercepts or slopes, or other models such as a non-linear dose-level model, a Bayesian normal dynamic linear model (NDLM), or EMAX models [24].

Conclusion
The Beta-Binomial model and the Bayesian hierarchical blinded model were considered and compared in stage 1. The Bayesian hierarchical model shows a lower family-wise error rate than the Beta-Binomial model while approximately preserving the probability of correctly detecting AE types with a safety signal, which illustrates how failing to properly account for multiplicities can result in unreliable inference. In the simulation study assuming no safety signals, the FWER, the probability of flagging at least one safety signal among all AEs, was tightly controlled. Furthermore, in the presence of a safety signal for some or all AEs, the two-stage monitoring model successfully detected and confirmed those safety signals.
In the event of a significant safety signal, the blinded executive team can request to be unblinded to safety data only. If there is a significant trend but some arms appear to be safe, the DSMB and study team can discuss which arms to terminate. Interim monitoring and analysis of safety data could help prevent safety problems from turning into significant concerns in an ongoing clinical trial.
In summary, the decision to terminate a trial due to safety concerns is not a purely statistical one. This is one reason the DSMB is not composed entirely of statisticians. The two-stage safety procedure in this paper provides a statistical view for monitoring safety during clinical trials, but it never represents the medical and clinical decisions themselves. More evaluation research and collaboration with clinicians and safety teams are needed to advance safety detection and monitoring.