The optimal pre-post allocation for randomized clinical trials
BMC Medical Research Methodology volume 23, Article number: 72 (2023)
Abstract
Background
In pre-post designs, analysis of covariance (ANCOVA) is a standard technique to detect the treatment effect with a continuous variable measured at baseline and follow-up. For measurements subject to a high degree of variability, it may be advisable to repeat the pretreatment and/or follow-up assessments. In general, repeating the follow-up measurements is more advantageous than repeating the pretreatment measurements, while the latter can still be valuable and improve efficiency in clinical trials.
Methods
In this article, we report investigations of using multiple pretreatment and posttreatment measurements in randomized clinical trials. We consider the sample size formula for ANCOVA under general correlation structures with the pretreatment mean included as the covariate and the mean follow-up value included as the response. We propose an optimal experimental design of multiple pre-post allocations under a specified constraint, that is, given the total number of pre-post treatment visits. The optimal number of pretreatment measurements is derived. For nonlinear models, closed-form formulas for sample size/power calculations are generally unavailable, so we conduct Monte Carlo simulation studies instead.
Results
Theoretical formulas and simulation studies show the benefits of repeating the pretreatment measurements in pre-post randomized studies. The optimal pre-post allocation derived from the ANCOVA extends well to binary measurements in simulation studies, using logistic regression and generalized estimating equations (GEE).
Conclusions
Repeating baseline and follow-up assessments is a valuable and efficient technique in pre-post designs. The proposed optimal pre-post allocation designs minimize the required sample size at a given power or, equivalently, achieve maximum power at a given sample size.
Background
It is common in randomized clinical trials to collect information from patients before they enter the study. Typically, eligibility for the trial is assessed at a screening visit, and a subsequent baseline visit is conducted prior to randomization to document clinical status at that time. The Huntington disease studies for tetrabenazine and deutetrabenazine are randomized, placebo-controlled clinical trials (Huntington Study Group [1, 2]). As a motivation for this paper, the primary measure in both Huntington disease studies was the total chorea score of the Unified Huntington’s Disease Rating Scale, analyzed as a continuous variable. The total chorea score was measured at screening, baseline, and several follow-up visits. The treatment effect was evaluated using an analysis of covariance (ANCOVA) model. In the ANCOVA, both studies used the average baseline score (i.e., the average of the two pretreatment measurements made at screening and at true baseline) as the covariate and the change from baseline as the dependent variable. The question then arises, “What are the benefits of using multiple pretreatment measurements?”
The use of multiple pretreatment measurements in randomized clinical trials has been proposed in recent years. In a randomized controlled trial of the effect of soy phytoestrogens on hot flashes in women with breast cancer, hot flash scores were measured every 24 hours for 4 weeks of baseline and 12 weeks of follow-up [3]. A variety of endpoints, such as daily migraine headache scores and the brief fatigue inventory, have also been assessed with multiple pretreatment and posttreatment measurements [4]. In addition, several statistical papers discuss repeating the pretreatment measurements in pre-post designs. Frison and Pocock [5] demonstrated the merits of using more than one pretreatment measurement in ANCOVA, with the pretreatment mean as the covariate and the posttreatment mean as the outcome. Bristol [6] presented simulation studies using two pretreatment measurements as covariates in linear regression models. Zhang et al. [7] considered the power analysis of choosing two baselines in ANCOVA for continuous variables and in logistic regression for categorical variables by simulation studies.
ANCOVA is a common technique to incorporate the baseline value as a covariate and estimate the treatment effect in randomized clinical trials. Standard theory, based on linear regression models, shows that the adjustment for a covariate reduces the residual variance by a factor of \(1-\rho ^2\), where \(\rho\) is the correlation between the covariate and the outcome [8]. This increases the precision of detecting the treatment effect. Alternative approaches treat the pretreatment measurements as additional outcome variables in mixed-effects analyses, as exemplified by Liang and Zeger [9] and Tango [10]. These authors showed that the generalized linear mixed-effects model is another efficient tool for pre-post designs, which extends to discrete responses with nonlinear models.
In randomized clinical trials with repeated measures, investigators usually focus on repeating the follow-up assessments, which is generally more advantageous than repeating the pretreatment measurements. However, the latter can still be valuable and has been overlooked in most clinical trials. In this paper, we address the benefits of repeating the baselines using the ANCOVA model, an interesting and novel point for randomized controlled clinical trials. Moreover, when there are multiple pretreatment and posttreatment measurements, we investigate the optimal pre-post allocation to minimize the required sample size. In the section Methods, we consider the ANCOVA sample size formula using multiple pre-post measurements under a general unequal correlation structure. We further derive the optimal number of pretreatment and posttreatment measurements given the total number of pre-post visits. In the section Results, we illustrate the above procedures using the “Beat the Blues” data from a clinical trial of an interactive multimedia program [11]. In simulation studies, we consider both continuous and binary outcomes. When the outcome is binary, exact formulas are generally unavailable, so we use simulation studies to assess how well the formulas and insights from the ANCOVA case extend to binary outcomes; the results show that repeating baselines remains advantageous under logistic regression. Merits and future work of the proposed optimal design are discussed in the last two sections.
Methods
Repeating pretreatment measurements in ANCOVA
We consider the ANCOVA model with the mean of multiple pretreatment measurements as the covariate and the posttreatment mean as the outcome. Consider normally distributed endpoints in a randomized clinical trial and suppose that there are two treatment groups \(i=0, 1\) (for placebo and treatment) with \(n_i\) individuals per group. For all individuals, assume there are S pretreatment visits and T posttreatment visits. Denote the pretreatment measurements as \(X_{ijs}\) and the posttreatment measurements as \(Y_{ijt}\), where \(i=0, 1,\ j=1,\ldots , n_{i}, s=1,\ldots , S\) and \(t=1,\ldots , T\). We assume the \(S+T\) pre-post measurements \((X_{ij1}, \ldots , X_{ijS}, Y_{ij1}, \ldots , Y_{ijT})^\prime\) follow a multivariate normal distribution with mean \(\varvec{\mu }=(\mu _{ij1}^{\text {pre}}, \ldots , \mu _{ijS}^{\text {pre}}, \mu _{ij1}^{\text {post}}, \ldots , \mu _{ijT}^{\text {post}})^\prime\) for \(i=0 \ \text {or} \ 1\) and the \((S+T) \times (S+T)\) variance-covariance matrix
\[\varvec{\Sigma }=\begin{pmatrix} \Sigma _{\text {pre}} & \Sigma _{\text {prepost}} \\ \Sigma _{\text {prepost}}^\prime & \Sigma _{\text {post}} \end{pmatrix}.\]
Denote the pretreatment visits mean as \(\bar{X}_{ij \cdot }= \sum _{s=1}^S X_{ijs} /S\) and the posttreatment visits mean as \(\bar{Y}_{ij \cdot }= \sum _{t=1}^T Y_{ijt} /T, i=0, 1,\ j=1, \ldots , n_{i}\). The overall pretreatment mean is \(\bar{X}=\sum _{i=0}^1 \sum _{j=1}^{n_i} \bar{X}_{ij \cdot } / (n_0+n_1)\). The ANCOVA model is
\[\bar{Y}_{ij \cdot } = \mu ^{\text {post}}_{i \cdot } + \beta (\bar{X}_{ij \cdot }-\bar{X}) + \epsilon _{ij}, \quad \epsilon _{ij} \sim N(0, \sigma ^2). \quad (1)\]
The estimated treatment effect is \(\hat{\delta }=\hat{\mu }^{\text {post}}_{1 \cdot }-\hat{\mu }^{\text {post}}_{0 \cdot }\), which is an unbiased estimator with variance formula [5, 12]:
\[\text {var}(\hat{\delta }) = \left( \bar{\Sigma }_{\text {post}} - \frac{\bar{\Sigma }_{\text {prepost}}^2}{\bar{\Sigma }_{\text {pre}}} \right) \frac{n_0+n_1-2}{n_0+n_1-3} \left[ \frac{1}{n_0}+\frac{1}{n_1} + \frac{(\bar{X}_{1 \cdot \cdot }-\bar{X}_{0 \cdot \cdot })^2}{\sum _{i=0}^{1} \sum _{j=1}^{n_i} (\bar{X}_{ij \cdot }-\bar{X})^2} \right],\]
where \(\bar{\Sigma }_{\text {pre}}, \bar{\Sigma }_{\text {post}}\) and \(\bar{\Sigma }_{\text {prepost}}\) are the means of all elements in the matrices \(\Sigma _{\text {pre}}, \Sigma _{\text {post}}\) and \(\Sigma _{\text {prepost}}\), respectively, and \(\bar{X}_{i \cdot \cdot } = \sum _{j=1}^{n_i} \bar{X}_{ij \cdot } / n_i\). The term \((\bar{X}_{1 \cdot \cdot }-\bar{X}_{0 \cdot \cdot })^2\) is negligible due to randomization, and \((n_0+n_1-2) / (n_0+n_1-3)\) tends to 1 as the sample size increases, which leads to the simple approximation [5]
\[\text {var}(\hat{\delta }) \approx \left( \frac{1}{n_0}+\frac{1}{n_1} \right) \left( \bar{\Sigma }_{\text {post}} - \frac{\bar{\Sigma }_{\text {prepost}}^2}{\bar{\Sigma }_{\text {pre}}} \right).\]
Assume the covariance matrix has the exchangeable block structure
\[\Sigma _{\text {pre}} = \sigma _X^2 \left[ (1-\rho _X) I_S + \rho _X J_S \right], \quad \Sigma _{\text {post}} = \sigma _Y^2 \left[ (1-\rho _Y) I_T + \rho _Y J_T \right], \quad \Sigma _{\text {prepost}} = \rho _{XY}\, \sigma _X \sigma _Y J_{S \times T},\]
where \(I_S\) denotes the \(S \times S\) identity matrix and J a matrix of ones;
then we have \(\bar{\Sigma }_{\text {pre}}=\sigma _X^2 [1+(S-1)\rho _X] /S, \bar{\Sigma }_{\text {post}}=\sigma _Y^2 [1+(T-1)\rho _Y] /T\) and \(\bar{\Sigma }_{\text {prepost}}=\rho _{XY} \sigma _X \sigma _Y\). The variance formula of ANCOVA becomes
\[\text {var}(\hat{\delta }) \approx \left( \frac{1}{n_0}+\frac{1}{n_1} \right) \sigma _Y^2 \left( \frac{1+(T-1)\rho _Y}{T} - \frac{S \rho _{XY}^2}{1+(S-1)\rho _X} \right). \quad (2)\]
The merits of repeating the pretreatment visits (\(S \ge 2\)) can be read directly from the variance formula (2). Keeping the number of posttreatment visits T and the other parameters fixed, the variance decreases as the number of pretreatment visits S increases. Moreover, when \(\rho _{XY}\) and the other parameters are fixed, the higher the correlation between the pretreatment visits \(\rho _X\), the less benefit is obtained by repeating the pretreatment measurements. When \(\rho _X\) is fixed, the higher the correlation between the pre- and post-randomization measurements \(\rho _{XY}\), the smaller the variance becomes, and the more efficiency is gained from repeating pretreatment visits.
The sample size formula per group under \(n_0=n_1=n\) with S pretreatment and T posttreatment measurements is
\[n = \frac{2 (z_{1-\alpha /2}+z_{1-\beta })^2\, \sigma _Y^2}{\delta ^2} \left( \frac{1+(T-1)\rho _Y}{T} - \frac{S \rho _{XY}^2}{1+(S-1)\rho _X} \right), \quad (3)\]
where \(\delta\) is the treatment effect, and \(\alpha\) and \(\beta\) are the Type I and Type II error probabilities. The merits of repeating the pretreatment measurements can be obtained directly from \(n(S=1,T=1)-n(S=2,T=1)\propto \frac{\rho _{XY}^2 (1-\rho _X)}{1+\rho _X}>0\).
As a simple numerical illustration, suppose that \(\rho _X=\rho _Y=0.8, \rho _{XY}=0.6\), and the number of posttreatment visits \(T=1\). The ratio of sample size formula (3) for having a single baseline visit (\(S=1\)) versus having both screening and baseline visits (\(S=2\)) is \(\frac{1+(T-1)\rho _Y-T\rho _{XY}^2}{1+(T-1)\rho _Y - 2T\rho _{XY}^2/(1+\rho _X)}=1.067\). The omission of the second pretreatment visit would lead to an increase in the sample size of 6.7%.
The same question may be asked about the benefit of repeating the posttreatment measurements. The ratio of sample sizes for using a single posttreatment measurement (\(T=1\)) versus two posttreatment measurements (\(T=2\)) is \(\frac{2 [1+\rho _X (S-1)] - 2 \rho _{XY}^2 S }{(1+\rho _Y) [1+\rho _X (S-1)] - 2 \rho _{XY}^2 S}\). Similarly, suppose \(S=1\) and the other parameters remain the same; this gives a ratio of sample sizes of 1.185. The omission of the second post-randomization evaluation would lead to an increase in the sample size of 18.5%. Hence, repeating the posttreatment measurements is more valuable than repeating the pretreatment measurements in the ANCOVA model. The benefits combine if we repeat both the pretreatment and posttreatment measurements.
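Both sample-size ratios above can be reproduced in a few lines. The sketch below is a plain-Python illustration; `design_factor` is our own name for the parenthesized factor in the variance formula (2), to which the required sample size is proportional.

```python
# Per-group sample size is proportional to this design factor from the
# ANCOVA variance formula; a smaller factor means a smaller required n.
def design_factor(S, T, rho_x, rho_y, rho_xy):
    return (1 + (T - 1) * rho_y) / T - S * rho_xy**2 / (1 + (S - 1) * rho_x)

rho_x = rho_y = 0.8
rho_xy = 0.6
# Omitting the second pretreatment visit (S=2 -> S=1) at T=1:
r_pre = design_factor(1, 1, rho_x, rho_y, rho_xy) / design_factor(2, 1, rho_x, rho_y, rho_xy)
# Omitting the second posttreatment visit (T=2 -> T=1) at S=1:
r_post = design_factor(1, 1, rho_x, rho_y, rho_xy) / design_factor(1, 2, rho_x, rho_y, rho_xy)
# r_pre is about 1.067 (6.7% more subjects); r_post about 1.185 (18.5% more)
```

The larger post-treatment ratio mirrors the conclusion in the text: at these correlations, a second follow-up visit buys more efficiency than a second baseline visit.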
Optimization of pretreatment visits given the total number of visits
In this subsection, we address the related optimization problem when designing randomized clinical trials with multiple prepost measurements. For a given total number of visits \(M=S+T\), we are interested in the optimal number of pretreatment visits \(S_{\text {opt}}\), which minimizes the sample size.
First, we consider the equal correlation structure \(\rho _X=\rho _Y=\rho _{XY}=\rho\). Since \(S+T=M\) is fixed and \(\alpha , \beta , \delta , \sigma _Y^2, \rho\) are constant, minimizing the sample size \(n \propto \frac{\rho (1-\rho ) M+ (1-\rho )^2}{(M-S) [1+ \rho (S-1)]}\) is equivalent to maximizing the function \(f(S)= (M-S) [1+ \rho (S-1)]\). This is a quadratic function of S with a negative leading coefficient under the assumption that \(S \ge 1\). The optimal number of pretreatment visits is
\[S_{\text {opt}} = \frac{M+1-1/\rho }{2} \quad (4)\]
when this quantity is at least 1, and \(S_{\text {opt}}=1\) otherwise.
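The equal-correlation optimum \(S_{\text {opt}}=(M+1-1/\rho )/2\) can be checked numerically. The sketch below (plain Python, with illustrative values \(M=10\) and \(\rho =0.8\)) compares the continuous optimum against a brute-force search over integer S:

```python
# Equal-correlation case: the optimal number of pretreatment visits
# maximizes f(S) = (M - S) * [1 + rho * (S - 1)] for fixed M = S + T.
def f(S, M, rho):
    return (M - S) * (1 + rho * (S - 1))

M, rho = 10, 0.8
s_opt_cont = (M + 1 - 1 / rho) / 2                        # continuous optimum
s_opt_int = max(range(1, M), key=lambda s: f(s, M, rho))  # best integer S
```

Here the continuous optimum is 4.875, and rounding toward the better of its two neighbors gives the integer optimum \(S=5\).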
Now we consider the sample size formula (3) under the unequal correlation structure. Minimizing the sample size formula is equivalent to minimizing the objective function
\[f(S) = \frac{1+(M-S-1)\rho _Y}{M-S} - \frac{S \rho _{XY}^2}{1+(S-1)\rho _X} = \frac{P(S)}{Q(S)}\]
for \(1 \le S < M\), where \(P(S)=[1+(M-S-1)\rho _Y][1+(S-1)\rho _X] - S(M-S)\rho _{XY}^2\) and \(Q(S)=(M-S)[1+(S-1)\rho _X]\). Notice that f(S) is a quotient of two quadratic polynomials of S.
Theorem 1
Assume \(\rho _X \rho _Y - \rho _{XY}^2 \ge 0\), \(0< \rho _X, \rho _Y < 1\) and \(\rho _{XY} \ne 0\). The objective function f(S) has a unique minimum point on \(S \in [1, M)\) if \(M \ge \sqrt{\frac{1-\rho _Y}{(1-\rho _X) \rho _{XY}^2}}+1\). The minimum point is
\[S_{\text {opt}} = \frac{kM+\rho _X-1}{\rho _X+k}, \quad \text {where } k=|\rho _{XY}| \sqrt{\frac{1-\rho _X}{1-\rho _Y}}. \quad (5)\]
Otherwise, if \(M <\sqrt{\frac{1-\rho _Y}{(1-\rho _X) \rho _{XY}^2}}+1\), then \(S_{\text {opt}}=1\).
Proof
The proof contains two parts: we first verify that the objective function f(S) has a unique minimum point on [1, M) and then derive the minimum point \(S_{\text {opt}}\).
Part 1: Uniqueness. The two roots of the denominator Q(S) are \(S=1-1/ \rho _X\) and \(S=M\). Since \(\rho _X>0\) and Q(S) has a negative leading coefficient, \(Q(S)>0\) for \(S \in (1-1/ \rho _X, M)\). The numerator P(S) also has a negative leading coefficient. Since \(P (1-1/ \rho _X) = -\rho _{XY}^2 \left( 1-\frac{1}{\rho _X}\right) \left( M-1+\frac{1}{\rho _X} \right) > 0\) and \(P(M)=[1+\rho _X (M-1)] (1-\rho _Y)> 0\), we have \(P(S)>0\) for \(S \in (1-1/ \rho _X, M)\). Therefore, \(S=1-1/ \rho _X\) and \(S=M\) are two vertical asymptotes of f(S), i.e., \(\lim _{S \rightarrow (1-1/ \rho _X)^{+}} f(S)= +\infty\) and \(\lim _{S \rightarrow M^{-}} f(S)= +\infty\).
Since \(\rho _{XY} \ne 0\), \(P(1-1/ \rho _X)>0\) and \(P(M)>0\), P(S) and Q(S) have no common zero. The equation \(f(S)=P(S)/ Q(S)=c\) can be transformed into a quadratic equation, which has at most two roots. Hence, f(S) has a unique (relative) minimum point \(s_0\) in \((1-1/ \rho _X, M)\), which is the absolute minimum point by the discussion above. The function f(S) is decreasing on \((1-1/ \rho _X, s_0)\) and increasing on \((s_0, M)\). Therefore, if \(s_0 \in [1,M)\), then \(s_0\) is the minimum point; otherwise, \(S=1\) is the minimum point.
Part 2: Derive \(S_{\text {opt}}\). The minimum point \(s_0\) in \((1-1/ \rho _X, M)\) satisfies \(f'(s_0)=0\). Obviously, the objective function can be written as
\[f(S) = \frac{A}{M-S} + \frac{B}{1+(S-1)\rho _X} + C,\]
where \(A=1-\rho _Y, B=\rho _{XY}^2 (1-\rho _X) /\rho _X\) and \(C=\rho _Y-\rho _{XY}^2/\rho _X\). Then
\[f^\prime (S) = \frac{A}{(M-S)^2} - \frac{B\rho _X}{[1+(S-1)\rho _X]^2}.\]
Since \(A>0\) and \(B \rho _X>0\), the only solution of \(f^\prime (S)=0\) in \((1-1/ \rho _X, M)\) satisfies
\[\frac{1+(S-1)\rho _X}{M-S} = \sqrt{\frac{B\rho _X}{A}} = |\rho _{XY}| \sqrt{\frac{1-\rho _X}{1-\rho _Y}}.\]
So
\[1+(s_0-1)\rho _X = (M-s_0)\, |\rho _{XY}| \sqrt{\frac{1-\rho _X}{1-\rho _Y}},\]
which is
\[s_0 = \frac{kM+\rho _X-1}{\rho _X+k}, \quad k=|\rho _{XY}| \sqrt{\frac{1-\rho _X}{1-\rho _Y}}.\]
We can check that when \(M \ge \sqrt{ \frac{1-\rho _Y}{(1-\rho _X) \rho _{XY}^2}}+1\), \(s_0\ge 1\). So we have the conclusion. \(\square\)
Remark 1
When \(\rho _{XY}=0\), the pretreatment measures are unrelated to the posttreatment measures. Hence \(S_{\text {opt}}=1\) under this special case. Also, since the \(S_{\text {opt}}\) in (5) is usually not an integer, one should calculate the values of the objective function f(S) on both \(\lfloor {S_{\text {opt}}}\rfloor\) and \(\lceil {S_{\text {opt}}}\rceil\) and select the smaller one.
As an illustration, we assume that \(\rho _{XY}=0.6, \rho _X=\rho _Y=0.8\), and the total number of visits \(M=10\). Following Theorem 1, we obtain that \(M=10> \sqrt{\frac{1-\rho _Y}{(1-\rho _X) \rho _{XY}^2}}+1=2.67\) and \(S_{\text {opt}}=4.14\). Since \(f(\lfloor {S_{\text {opt}}}\rfloor )=f(4)=0.4098 < f(\lceil {S_{\text {opt}}}\rceil )=f(5)=0.4114\), the optimal number of pretreatment visits is \(S=4\).
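These numbers can be reproduced directly from the objective function and Theorem 1. The sketch below is a plain-Python illustration; `f` implements the unequal-correlation objective and `s_opt` the closed-form minimizer.

```python
import math

# Objective f(S) from the unequal-correlation sample size formula,
# evaluated at the illustration rho_xy = 0.6, rho_x = rho_y = 0.8, M = 10.
def f(S, M, rho_x, rho_y, rho_xy):
    return (1 + (M - S - 1) * rho_y) / (M - S) - S * rho_xy**2 / (1 + (S - 1) * rho_x)

M, rho_x, rho_y, rho_xy = 10, 0.8, 0.8, 0.6
threshold = math.sqrt((1 - rho_y) / ((1 - rho_x) * rho_xy**2)) + 1  # about 2.67
k = abs(rho_xy) * math.sqrt((1 - rho_x) / (1 - rho_y))
s_opt = (k * M + rho_x - 1) / (rho_x + k)                           # about 4.14
# Since s_opt is not an integer, compare the objective at its neighbors:
best = min((math.floor(s_opt), math.ceil(s_opt)),
           key=lambda s: f(s, M, rho_x, rho_y, rho_xy))
```

Evaluating the objective at the floor and ceiling of `s_opt` recovers the choice \(S=4\) made in the text.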
Now we consider the special case of \(\rho _X=\rho _Y=\rho\) with the assumption \(\rho \ge \rho _{XY}\). When \(M \ge 1 / \rho _{XY}+1\),
\[S_{\text {opt}} = \frac{\rho _{XY} M + \rho - 1}{\rho + \rho _{XY}},\]
which gives Eq. (4) under the further condition that \(\rho _{XY}=\rho\). When fixing \(\rho\), the higher the correlation between the pre-post measurements, the larger \(S_{\text {opt}}\) is. When fixing \(\rho _{XY}\), the higher the correlation between two pretreatment measurements or two posttreatment measurements, the smaller \(S_{\text {opt}}\) is.
In conclusion, when the total number of pre-post visits is fixed, one can obtain the optimal choice of S pretreatment measurements and T posttreatment measurements to minimize the sample size. Measurements taken after randomization can be more informative under the special case of \(\rho _X=\rho _Y\) (since \(S_{\text {opt}}<M/2\)), while repeating the pretreatment measurements is also valuable.
Results
Numerical example
We consider the “Beat the Blues” data from a clinical trial of an interactive multimedia program [11]. The data are available as the data frame “BtheB” in the R package HSAUR2. One hundred patients were allocated to the placebo group (\(n_0=48\)) and the treatment group (\(n_1=52\)). Each patient had \(S=1\) baseline visit and \(T=4\) posttreatment visits at 2, 3, 5, and 8 months after randomization.
Assume that these \(S=1\) and \(T=4\) measurements follow the unequal correlation structure with the variance-covariance matrix \(\varvec{\Sigma }\). Based on the data set, we found that \(\hat{\sigma }_X^2=117.5, \hat{\sigma }_Y^2=116.8, \hat{\rho }_{XY}=0.52\) and \(\hat{\rho }_Y=0.77\). Since there is only \(S=1\) pretreatment visit, \(\hat{\rho }_X\) could not be estimated. Instead, we simply assumed that \(\hat{\rho }_X=\hat{\rho }_Y=0.77\). The treatment effect obtained from the dataset is \(\hat{\delta }=5.4\). Using these estimates, we calculate the sample size per group (assuming \(n_0=n_1=n\)) under \(\alpha =0.05\) and \(1-\beta =0.8\) using formula (3).
From Table 1, we verify that repeating the posttreatment measurements can be more valuable (yielding a smaller sample size) than repeating the pretreatment measurements. The benefits combine if we repeat both; e.g., \(S=2, T=4\) reduces the sample size by up to 28.3% compared with the single pre-post design (\(S=1, T=1\)). Note that in our numerical example, we consider a fixed power of 0.8 for the different allocation strategies (see Table 1). The purpose of this example is to show that when power is fixed, more pretreatment and posttreatment visits lead to a smaller sample size per group, i.e., a more efficient trial. Equivalently, if the sample size is fixed, larger S and T lead to a more powerful analysis.
We also derive the optimal number of pretreatment visits S given the total number of visits \(M=5\). Using formula (5) in Theorem 1, we obtain that \(M \ge \sqrt{\frac{1-\rho _Y}{(1-\rho _X) \rho _{XY}^2}}+1=2.9\) and \(S_{\text {opt}}=1.8\). Since for \(\lfloor {S_{\text {opt}}}\rfloor =1\), \(n(1,4)=36\) and for \(\lceil {S_{\text {opt}}}\rceil =2\), \(n(2,3)=35\), \(S=2\) is the optimal number of pretreatment visits. Hence, repeating the pretreatment measurements (\(S=2, T=3\)) is superior to using a single baseline (\(S=1, T=4\)) under the constraint of the total number of visits \(M=5\).
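Formula (3) with the "Beat the Blues" estimates can be evaluated in a few lines. The sketch below is a Python illustration (the paper's own computations were presumably done in R); `n_per_group` is our own helper name, and the normal quantiles come from the standard library.

```python
import math
from statistics import NormalDist

# Per-group sample size from formula (3), rounded up to an integer.
def n_per_group(S, T, delta, sigma2_y, rho_x, rho_y, rho_xy,
                alpha=0.05, power=0.8):
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    factor = (1 + (T - 1) * rho_y) / T - S * rho_xy**2 / (1 + (S - 1) * rho_x)
    return math.ceil(2 * z**2 * sigma2_y * factor / delta**2)

# Estimates from the "Beat the Blues" data set
args = dict(delta=5.4, sigma2_y=116.8, rho_x=0.77, rho_y=0.77, rho_xy=0.52)
n14 = n_per_group(1, 4, **args)   # S=1, T=4
n23 = n_per_group(2, 3, **args)   # S=2, T=3
```

This reproduces \(n(1,4)=36\) and \(n(2,3)=35\), confirming that \(S=2, T=3\) is the better allocation at \(M=5\).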
Simulation studies
The previous algebra applies only to continuous measurements analyzed by the ANCOVA model. Other models are needed when the outcome variable is discrete. The exact formulas for power calculations are generally not available for nonlinear models with binary outcomes. Hence, we set up Monte Carlo simulation studies to assess how well the formulas and insights from the ANCOVA model extend to the nonlinear models. In this section, we conduct simulation studies on continuous and binary measurements. For continuous measurements, we use the ANCOVA model with the pretreatment mean as covariate and the posttreatment mean as outcome. The binary outcomes are analyzed by logistic regression for a single outcome and by generalized estimating equations (GEE) for multiple outcomes. All simulation results were obtained using 20,000 replications.
Single / Multiple Continuous Outcomes
For a single continuous outcome, we assume there are \(S=2\) and \(T=1\) continuous measurements as \(X_1\) (screening), \(X_2\) (baseline), Y (outcome) and \((X_1, X_2, Y)\) follows MVN(\(\varvec{\mu }, \varvec{\Sigma }\)). For the control group, \(\varvec{\mu }=(0,0,0)\) and for treatment group, \(\varvec{\mu }=(0,0,\delta )\). Assume \(\sigma _X^2=\sigma _Y^2=1\). Different \(\rho _{XY}\) and \(\rho _X\) are considered: \(\rho _{XY}=0.5, \rho _X=\{0.6, 0.7, 0.8, 0.9\}\); \(\rho _{XY}=0.6, \rho _X=\{0.7, 0.8, 0.9\}\) and \(\rho _{XY}=0.7, \rho _X=\{0.8, 0.9\}\). The sample sizes of the control and treatment groups are \(n_0=n_1=\{50, 75, 100, 125, 150\}\).
We consider the ANCOVA model (1) with either the baseline alone (\(S =1\)) or the mean of screening and baseline (\(S=2\)) as the covariate for a single continuous outcome Y. We set the effect size \(\delta =0\) to evaluate Type I error probabilities and \(\delta =0.3\) for power. The Type I error probabilities of the ANCOVA models are well controlled using either the baseline only (\(S=1\)) or screening and baseline (\(S=2\)) (Table 2). The power of repeating pretreatment measurements consistently exceeds the power of using a single baseline (Table 3). For \(S=2\), when \(\rho _{XY}\) is fixed, higher \(\rho _{X}\) leads to lower power. When \(\rho _X\) is fixed, higher \(\rho _{XY}\) yields higher power.
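A single replicate of this setup can be sketched with NumPy. This is an illustrative stand-in for the authors' simulation code, not a reproduction of it; `simulate_ancova` is our own name, and the least-squares fit plays the role of the ANCOVA model (1).

```python
import numpy as np

def simulate_ancova(n, delta, rho_x, rho_xy, seed=1):
    """Draw (X1, X2, Y) for two arms and fit the ANCOVA model:
    regress Y on an intercept, the treatment indicator, and the
    mean of the S = 2 pretreatment measurements."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho_x, rho_xy],
                    [rho_x, 1.0, rho_xy],
                    [rho_xy, rho_xy, 1.0]])
    ctrl = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=n)
    trt = rng.multivariate_normal([0.0, 0.0, delta], cov, size=n)
    data = np.vstack([ctrl, trt])
    treat = np.repeat([0.0, 1.0], n)
    xbar = data[:, :2].mean(axis=1)          # covariate: pretreatment mean
    design = np.column_stack([np.ones(2 * n), treat, xbar])
    coef, *_ = np.linalg.lstsq(design, data[:, 2], rcond=None)
    return coef[1]                           # estimated treatment effect

# With a large n, the treatment coefficient recovers the true delta.
delta_hat = simulate_ancova(n=50000, delta=0.3, rho_x=0.8, rho_xy=0.6)
```

In the actual study, one would repeat this with the listed per-group sample sizes over 20,000 replications and record the rejection rate of the Wald test on the treatment coefficient.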
For multiple continuous outcomes, we conduct simulation studies to obtain the optimal number of pretreatment visits \(S_{\text {opt}}\) given the total number of visits \(M=10\). Similarly, we generate \(M=10\) continuous measurements \((X_1, \ldots , X_S, Y_1, \ldots, Y_T)\) using the multivariate normal distribution with mean \(\varvec{\mu }=(\mu _X, \ldots , \mu _X, \mu _Y, \ldots , \mu _Y)\) and covariance matrix \(\varvec{\Sigma }\), where \(S=\{1, \ldots , 9\}\) and \(T=M-S\). For the control group, \(\mu _X=\mu _Y=0\) and for the treatment group, \(\mu _X=0, \mu _Y=\delta\). Again, assume \(\sigma _X^2=\sigma _Y^2=1\). Different \(\rho _{XY}\) and \(\rho _X=\rho _Y\) are considered as above; \(n_0=n_1=\{50, 100, 150\}\).
We set the effect size \(\delta =0\) to evaluate Type I error probabilities and \(\delta =0.25\) for power. The Type I error probabilities are all well controlled (Table S1). The power results (Fig. 1) show that having more than 2 pretreatment visits can be more valuable than using a single baseline. The optimal number of pretreatment visits is highlighted in red, showing that \(S_{\text {opt}}\) is less than or equal to \(M/2=5\). In summary, the simulation results give a conclusion similar to the ANCOVA analyses in the section Methods.
A Single Binary Outcome
Denote the \(S=2\) and \(T=1\) binary measurements as \(X_1, X_2\) and Y. We generate the correlated binary data using Gaussian copulas, which transform the margins of a multivariate normal distribution into multivariate uniform margins. Assume that the uniform margins \((U_{X_1}, U_{X_2}, U_{Y})\) have the correlation matrix
\[\varvec{R} = \begin{pmatrix} 1 & \rho _X & \rho _{XY} \\ \rho _X & 1 & \rho _{XY} \\ \rho _{XY} & \rho _{XY} & 1 \end{pmatrix}.\]
We then generate the Gaussian copulas under the correlation matrix \(\varvec{R}\) using the R package copula [13]. The correlated binary measurements are obtained as follows. For the control group, \((X_1, X_2, Y)= \left( 1_{(U_{X_1} \le p)}, 1_{(U_{X_2} \le p)}, 1_{(U_Y \le p)} \right)\). Dichotomizing at probability p yields triplets of dependent Bernoulli variables. For the treatment group, \((X_1, X_2, Y)= \left( 1_{(U_{X_1} \le p)}, 1_{(U_{X_2} \le p)}, 1_{(U_{Y} \le p^\prime )} \right)\), where \(p^\prime =\frac{p e^{\beta _1}}{1-p+p e^{\beta _1}}\) and \(\beta _1\) represents the treatment effect coefficient, so that \(\log \left( \frac{p^\prime }{1-p^\prime } \right) = \beta _1+ \log \left( \frac{p}{1-p} \right)\).
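The same Gaussian-copula construction can be sketched without the R copula package by thresholding correlated normals directly, since \(U \le p\) if and only if \(Z \le \Phi ^{-1}(p)\). The sketch below is our own NumPy illustration (`correlated_binary` is an assumed helper name, not from the paper).

```python
import numpy as np
from statistics import NormalDist

def correlated_binary(n, p, rho_x, rho_xy, beta1=0.0, seed=1):
    """Gaussian-copula draw of (X1, X2, Y): threshold correlated
    standard normals at the p-quantile; the outcome uses p' with
    logit(p') = beta1 + logit(p) for the treatment arm."""
    rng = np.random.default_rng(seed)
    R = np.array([[1.0, rho_x, rho_xy],
                  [rho_x, 1.0, rho_xy],
                  [rho_xy, rho_xy, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0, 0.0], R, size=n)
    p_post = p * np.exp(beta1) / (1 - p + p * np.exp(beta1))
    q = NormalDist().inv_cdf          # U <= p  iff  Z <= Phi^{-1}(p)
    x1 = (z[:, 0] <= q(p)).astype(int)
    x2 = (z[:, 1] <= q(p)).astype(int)
    y = (z[:, 2] <= q(float(p_post))).astype(int)
    return x1, x2, y

# One treatment-arm sample with p = 0.4 and treatment effect beta1 = 0.8
x1, x2, y = correlated_binary(20000, p=0.4, rho_x=0.8, rho_xy=0.6, beta1=0.8)
```

The pretreatment margins stay at p while the outcome margin shifts to \(p^\prime\), and the copula correlation induces dependence among the Bernoulli draws.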
Three different logistic regression models are considered:
\[\text {Model 1:}\quad \text {logit}\, P(Y=1) = \beta _0+ \beta _1 \text {Treat} + \beta _2 X_2,\]
\[\text {Model 2:}\quad \text {logit}\, P(Y=1) = \beta _0+ \beta _1 \text {Treat} + \beta _2 X_{\text {log}},\]
\[\text {Model 3:}\quad \text {logit}\, P(Y=1) = \beta _0+ \beta _1 \text {Treat} + \beta _2 X_C,\]
where Treat is the treatment indicator, \(X=X_1+X_2\), \(X_C\) is the categorical version of X, and \(X_{\text {log}}=\text {log} \left[ (X+1/2)/(2-X+1/2) \right]\). The term 1/2 is introduced to avoid infinite estimates [14].
The logistic regression model
\[\text {logit}\, P(Y=1) = \beta _0+ \beta _1 \text {Treat} + \beta _2 X\]
is equivalent to Model 2 for \(S=2\). That is because when \(S=2\), \(X=X_1+X_2 \in \{0,1,2\}\). Then \(X_{\text {log}}=\text {log} [ (X+1/2)/ (2-X+1/2) ] \in \{-\text {log}(5), 0, \text {log}(5)\}\), which is proportional to \(X-1 \in \{-1,0,1\}\). Hence, using X or \(X_{\text {log}}\) in the logistic regression model provides exactly the same Type I error probabilities and power.
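This equivalence is a one-line arithmetic check. The snippet below simply evaluates the empirical-logit transform at the three possible values of X for \(S=2\):

```python
import math

# For S = 2 pretreatment binaries, X = X1 + X2 takes values 0, 1, 2.
# The empirical-logit transform sends these to -log(5), 0, log(5),
# i.e. log(5) * (X - 1), so covariates X and X_log give identical fits.
x_log = [math.log((x + 0.5) / (2 - x + 0.5)) for x in (0, 1, 2)]
```

Because the transformed values are an affine function of X, the two logistic regressions have the same column space and hence identical test statistics.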
To detect the treatment effect, we consider the null hypothesis \(H_0: \beta _1=0\) vs. the alternative hypothesis \(H_1: \beta _1\ne 0\). Assume that the dichotomized probability \(p=0.4\). The sample sizes of the control and treatment groups are \(n_0=n_1=\{50, 75, 100, 125, 150\}\). Different \(\rho _{XY}\) and \(\rho _X\) (assuming \(\rho _{XY}<\rho _X\)) are considered to generate the data: \(\rho _{XY}=0.5, \rho _X=\{0.6, 0.7, 0.8, 0.9\}\); \(\rho _{XY}=0.6, \rho _X=\{0.7, 0.8, 0.9\}\) and \(\rho _{XY}=0.7, \rho _X=\{0.8, 0.9\}\). We conduct simulation studies with the treatment effect coefficient \(\beta _1=0\) to obtain the Type I error probability and with \(\beta _1=0.8\) to obtain power. For logistic regressions with small samples, perfect separation may occur, leading to infinite estimates of the logistic regression coefficients and fitted probabilities close to zero and one. Hence, when \(n_0=n_1=50\), we only consider Models 1 and 2 in the simulation studies.
The simulation error for estimating the Type I error probability of \(\alpha = 0.05\) is \(1.96 \times \text {SE}=1.96 \times \sqrt{(0.05)(0.95)/20000}=0.003\). The Type I error probabilities of the three different logistic regression models are well controlled (see Table 4). Some of the Type I error probabilities are slightly conservative, which is reasonable for binary outcomes. The power results of the three logistic regression models under different sample sizes, \(\rho _{XY}\) and \(\rho _X\) are shown in Table 5. The power of repeating pretreatment measurements using \(X_{\text {log}}\) or \(X_C\) (Models 2, 3) consistently exceeds the power of using a single baseline \(X_2\) (Model 1). When \(\rho _{XY}\) is fixed, the higher the correlation between the two pretreatment measurements, the less benefit is obtained by repeating the pretreatment measurements. When \(\rho _X\) is fixed, the higher the correlation between the pre-post measurements, the larger the power obtained.
Hence, repeating the pretreatment measurements is valuable under logistic regression for a single binary outcome. This conclusion matches that of the ANCOVA model for continuous outcome variables, showing that the benefit of repeating the pretreatment measurements extends well to binary variables under logistic regression.
Multiple Binary Outcomes
We conduct simulation studies to obtain the optimal number of pretreatment visits \(S_{\text {opt}}\) given the total number of visits \(M=10\) under binary data. We use GEE logistic regression models [15] for correlated binary data when the number of posttreatment visits T exceeds one (multiple binary outcomes).
Similarly, we generate \(M=10\) correlated binary measurements \((X_1, \ldots , X_S, Y_1, \ldots, Y_T)\) using Gaussian copulas, where \(S=\{1, \ldots , 9\}\) and \(T=M-S\). The uniform margins \((U_{X_1}, \ldots , U_{X_S}, U_{Y_1}, \ldots , U_{Y_T})\) have the correlation matrix
\[\varvec{R} = \begin{pmatrix} (1-\rho _X) I_S + \rho _X J_S & \rho _{XY} J_{S \times T} \\ \rho _{XY} J_{T \times S} & (1-\rho _Y) I_T + \rho _Y J_T \end{pmatrix},\]
where I denotes an identity matrix and J a matrix of ones.
For the control group, \((X_1, \ldots , X_S, Y_1, \ldots , Y_T)= \left( 1_{(U_{X_1} \le p)}, \ldots , 1_{(U_{X_S} \le p)}, 1_{(U_{Y_1} \le p)}, \ldots , 1_{(U_{Y_T} \le p)} \right)\), and for the treatment group, \((X_1, \ldots , X_S, Y_1, \ldots , Y_T)= \left( 1_{(U_{X_1} \le p)}, \ldots , 1_{(U_{X_S} \le p)}, 1_{(U_{Y_1} \le p^\prime )}, \ldots , 1_{(U_{Y_T} \le p^\prime )} \right)\). Two GEE logistic regression models are considered as follows:
\[\text {GEE Model 1:}\quad \text {logit}\, P(Y_{ijt}=1) = \beta _0+ \beta _1 \text {Treat}_{ij} + \beta _2 X_{ij+},\]
\[\text {GEE Model 2:}\quad \text {logit}\, P(Y_{ijt}=1) = \beta _0+ \beta _1 \text {Treat}_{ij} + \beta _2 X_{\text {log}, ij+},\]
where \(Y_{ijt}\) is the binary outcome at posttreatment visit \(t=1,\ldots , T\). The treatment indicator \(\text {Treat}_{ij}=0\) for placebo and 1 for treatment, \(X_{ij+}=X_{ij 1}+ \cdots + X_{ij S}\) and \(X_{\text {log}, ij+}=\text {log} \left[ (X_{ij+}+1/2)/(S-X_{ij+}+1/2) \right]\), \(i=0, 1,\ j=1,\ldots , n_{i}\).
Consider \(H_0: \beta _1=0\) vs. \(H_1: \beta _1\ne 0\). Similarly, assume \(p=0.4\), \(p^\prime =\frac{p e^{\beta _1}}{1-p+p e^{\beta _1}}\) and \(n_0=n_1=\{50, 100, 150\}\). Different \(\rho _{XY}\) and \(\rho _X=\rho _Y\) are considered as \(\rho _{XY}=0.5, \rho _X=\rho _Y=\{0.6, 0.7, 0.8, 0.9\}\); \(\rho _{XY}=0.6, \rho _X=\rho _Y=\{0.7, 0.8, 0.9\}\) and \(\rho _{XY}=0.7, \rho _X=\rho _Y=\{0.8, 0.9\}\). We conduct simulation studies with the treatment effect coefficient \(\beta _1=0\) to obtain the Type I error probability and \(\beta _1=0.5\) to obtain power. We compare the power under 9 different scenarios of \(S=\{1, \ldots , 9\}\) and \(T=10-S\), then find the \(S_{\text {opt}}\) that has the highest power. For \(T=1\), we use logistic regression; for the other scenarios, we use GEE logistic regression. Again, to avoid perfect separation for small samples, we only conduct the simulation studies using GEE Model 2 when \(n_0=n_1=50\).
During the simulation studies, we found that the Type I error probabilities for GEE logistic regression (\(T \ge 2\)) are hard to control. This is because when the sample size is small, the robust sandwich estimator is biased downward for estimating \(\text {var}(\hat{\beta }_1)\) [16, 17], so the Z-statistic \(\hat{\beta }_1 / \sqrt{\text {var}(\hat{\beta }_1)}\) is inflated, which increases the Type I error probabilities. That would make the power comparison between \(T=1\) (logistic regression) and \(T \ge 2\) (GEE) inaccurate. Hence, an empirical calibration of the Z-test is applied to control the Type I error probabilities of GEE, and we obtain the empirical power for comparison.
We first obtain the Z-statistics \(\hat{\beta }_1 / \sqrt{\text {var}(\hat{\beta }_1)}\) under \(H_0\), which follow N(0, 1) as \(n \rightarrow \infty\). But since our sample sizes are finite, the \((\alpha /2)\times 100\%\) and \((1-\alpha /2)\times 100\%\) quantiles of the Z-statistics are not the quantiles of N(0, 1). To calibrate the Type I error probabilities at level \(\alpha\), we obtain the empirical \((\alpha /2)\times 100\%\) and \((1-\alpha /2)\times 100\%\) quantiles of the Z-statistics from the simulation studies. By definition, those empirical quantiles have Type I error probabilities exactly equal to \(\alpha\). We then use these empirical quantiles to calibrate the power. Similar ideas of using p-value empirical calibration to control the Type I error probabilities are discussed by several authors [18, 19]. For consistency, we calibrate the Type I error probabilities at level \(\alpha\) not only for the GEE regression (\(T \ge 2\)) but also for the logistic regression (\(T=1\)), then compare the calibrated power for different \(S=\{1, \ldots , 9\}\).
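The calibration step can be sketched in a few lines. Below, a heavier-tailed t distribution is used purely as a toy stand-in for the null Z-statistics produced by the downward-biased sandwich variance; in the actual study, the `z_null` array would come from fitting the GEE model under \(H_0\).

```python
import numpy as np

def calibrated_cutoffs(z_null, alpha=0.05):
    """Empirical (alpha/2, 1 - alpha/2) quantiles of null Z-statistics,
    used in place of the N(0, 1) cutoffs +/- 1.96."""
    return np.quantile(z_null, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(1)
# Toy null Z-statistics: heavier-tailed than N(0, 1), mimicking the
# inflation from the biased sandwich variance (t with 10 df).
z_null = rng.standard_t(df=10, size=20000)
lo, hi = calibrated_cutoffs(z_null)
# By construction, rejecting outside (lo, hi) has Type I error = alpha.
type1 = np.mean((z_null <= lo) | (z_null >= hi))
```

The calibrated cutoffs are wider than \(\pm 1.96\) for the heavy-tailed null, and rejection outside them hits the nominal level exactly on the null sample, which is what makes the power comparison across S fair.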
The original Type I error probabilities (without calibration) of multiple binary outcomes using GEE models are shown in Tables S2-S4. The upper bound of the 95% confidence interval for estimating the Type I error probability at \(\alpha = 0.05\) is \(0.05+1.96 \times \sqrt{(0.05)(0.95)/20000}=0.053\). The inflated original Type I error probabilities (\(>0.053\)) are shown in italic font in these tables. When \(n_0=n_1=50\), the original observed Type I error probabilities are hard to control under the GEE logistic regression (Table S2). With a larger sample size (\(n_0=n_1=100, 150\)), more observed Type I error probabilities can be controlled (Tables S3, S4). The calibrated Type I error probabilities are all equal to \(\alpha =0.05\) (not shown in the tables).
The calibrated power comparisons for \(S=\{1, \ldots , 9\}\) using the two GEE logistic regression models are shown in Figures 2 and S1. The power curves first increase from \(S=1\) to \(S=3\). For \(3< S \le M/2\), there is little change in power. When \(S > M/2\), the power curves decrease to a minimum at \(S=M-1\). The optimal numbers of pretreatment visits \(S_{\text {opt}}\) are highlighted in red, showing that \(S_{\text {opt}}\) is less than or equal to \(M/2=5\). Hence, when \(M = 10\), repeating pretreatment measurements with \(2< S \le 5\) provides the optimal power. The optimal pre-post allocations in GEE logistic regression lead to conclusions similar to those from the linear models, that is, \(S_{\text {opt}} < M/2\) when \(\rho _X=\rho _Y\). Measurements taken after randomization can be more informative since we treat the pretreatment measurements as covariates.
Overall, the results for the multiple binary outcomes with GEE logistic regression are similar to those for the continuous outcomes with the ANCOVA model. The proposed method extends well to the nonlinear models through Monte Carlo simulation studies. Closed-form formulas for sample size, power, and \(S_{\text {opt}}\) calculations under nonlinear models require future investigation.
Discussion
In this article, we demonstrate the merits of having multiple pretreatment measurements for both continuous and discrete responses in pre-post designs. We consider the sample size calculation for the ANCOVA model when the pretreatment measures are included as covariates under a general correlation structure. We then propose an optimal design under the constraint that the total number of pretreatment and posttreatment visits is fixed. Simulation studies were conducted for binary outcomes, suggesting that the insights from the linear model extend well to GEE logistic regression.
Prior information on the correlation structure is required to determine the sample size and the optimal pre-post allocation. Designers can obtain this prior information from previous clinical trials (e.g., Table III in [5]). Alternatively, an adaptive design can be considered in which the correlations are estimated at an interim analysis: one starts the design with prior information from other clinical trials, uses the Stage 1 data at the interim analysis to estimate the correlation structure, and then adapts the sample size formula and the pre-post allocation for Stage 2.
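At an interim analysis, the required correlations can be estimated directly from the Stage 1 repeated measures. A sketch with simulated exchangeable data (the variable names, sample size, and equal-correlation assumption are ours for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
n, M, rho = 500, 6, 0.5  # Stage 1 patients, total visits, true correlation

# Simulate Stage 1 data: n patients, M visits, compound-symmetry correlation
cov = np.full((M, M), rho) + (1 - rho) * np.eye(M)
stage1 = rng.multivariate_normal(np.zeros(M), cov, size=n)

# Pooled estimate of the common correlation: average the off-diagonal
# entries of the sample correlation matrix
R = np.corrcoef(stage1, rowvar=False)
rho_hat = (R.sum() - M) / (M * (M - 1))
print(round(rho_hat, 3))
```

In practice, one would estimate the within-pretreatment, within-posttreatment, and pre-post correlations separately before re-deriving the Stage 2 allocation.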
Extensions of the ANCOVA model include allowing different time intervals between measurements and alternative correlation structures, such as an autoregressive structure.
In clinical trial designs, the time intervals between pretreatment and posttreatment visits may be equally spaced. However, as the time interval between visits increases, the correlation tends to decline [5]. When the time intervals between visits are not equally spaced, one can consider an autoregressive structure, or a more general structure in which the correlations between all pairs of measurements differ. We leave this as future work. Like many other statistical methods, the proposed ANCOVA model could also be extended to adjust for covariates other than the baseline measurement of the outcome and thereby further improve precision [20]. Similar to measuring the pretreatment outcome multiple times, collecting other covariates multiple times may further improve the framework; however, one needs to carefully address the potential correlation between the key covariate in ANCOVA (e.g., the average baseline score) and the other covariates. Another possible extension is to observational studies. Although our method is proposed within the framework of classic clinical trials, it shares some similarities with the Difference-in-Differences (DID) technique, a quasi-experimental design applied in observational settings where exchangeability cannot be assumed between the treatment and control groups. Whereas DID removes biases in the post-intervention period after data collection, how to adapt our method to this scenario and obtain the optimal pre-post allocation before data collection could be a future research topic.
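For unequally spaced visits, an autoregressive structure lets the correlation decay with the time lag. A sketch contrasting the two structures (the matrix builders are ours):

```python
import numpy as np

def cs_corr(m, rho):
    """Compound symmetry: the same correlation rho between every pair of visits."""
    return np.full((m, m), rho) + (1 - rho) * np.eye(m)

def ar1_corr(times, rho):
    """AR(1)-type structure: correlation rho**|t_i - t_j| decays with the
    time lag, so unequally spaced visits get unequal correlations."""
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])

print(cs_corr(3, 0.5))
print(ar1_corr([0, 1, 3], 0.5))  # lag of 3 gives 0.5**3 = 0.125
```

Under compound symmetry the visit spacing is irrelevant, whereas the AR(1)-type structure down-weights the information in visits that are far apart in time.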
There are remaining questions to be discussed. Several authors, including Liang and Zeger [15] and Tango [10], have recommended analyzing the pretreatment measurements as additional outcomes through mixed-effect models rather than treating them as covariates. Comparisons between using a single baseline as a covariate or as a dependent variable were discussed by Liu et al. [21] and Wan [22]. It would be interesting to compare the repeated-baseline sample size calculations between the ANCOVA model and the linear mixed-effect model, and then to consider the optimal pre-post allocation of linear and logistic mixed-effect models for continuous and binary outcomes, respectively. It is noteworthy that the ANCOVA model might be misspecified for discrete outcomes; extension to discrete responses with nonlinear models is a future direction to address this issue. Regarding nonlinear models, it would be helpful to strengthen the theoretical analysis for logistic mixed-effect models through simulation studies or closed-form formulations.
Another future direction is the three-arm clinical trial, which includes an experimental treatment, an active reference treatment, and a placebo group [23,24,25]. One can further consider how, given a constraint on the total cost, to choose the sample size and the numbers of pretreatment and posttreatment visits so as to maximize the power function. Generally speaking, if the cost of each pre-post visit is high, one tends to select a larger sample size with fewer visits per patient; in contrast, if the expense of recruiting each patient is high, we would expect a smaller sample size with more repeated pretreatment and posttreatment measurements.
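The cost trade-off described above can be explored with a grid search. A sketch, again assuming the compound-symmetry ANCOVA variance factor of Frison and Pocock [5] and a normal-approximation power formula; the cost parameters, effect size, and function names are illustrative, not part of the proposed method:

```python
from math import sqrt
from statistics import NormalDist

def var_factor(S, T, rho):
    # ANCOVA variance factor under compound symmetry (Frison and Pocock [5])
    return (1 + (T - 1) * rho) / T - S * rho**2 / (1 + (S - 1) * rho)

def power(n, S, T, rho, delta=0.4, sigma=1.0, alpha=0.05):
    # Normal approximation for a two-arm comparison with n patients per arm
    se = sqrt(2 * sigma**2 * var_factor(S, T, rho) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta / se - z)

def best_design(budget, c_patient, c_visit, rho=0.5, max_M=12):
    """Maximize power over (n, S, T) subject to the total-cost constraint
    2 * n * (c_patient + c_visit * M) <= budget, where M = S + T."""
    best = None
    for M in range(2, max_M + 1):
        n = int(budget // (2 * (c_patient + c_visit * M)))  # per-arm sample size
        if n < 2:
            continue
        for S in range(1, M):
            p = power(n, S, M - S, rho)
            if best is None or p > best["power"]:
                best = {"n": n, "S": S, "T": M - S, "power": p}
    return best

print(best_design(budget=20_000, c_patient=100, c_visit=20))
```

When visits are cheap relative to recruitment, the search favors more repeated measurements per patient; when visits are expensive, it favors a larger per-arm sample size with fewer visits.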
Although using both screening and baseline measurements can be more powerful than using a single baseline, there are sometimes ethical concerns about having multiple pretreatment visits in clinical trials. For trials and diseases that require treatment immediately after the baseline visit, it could be impractical and unethical to repeat the pretreatment measurements [5]. Finally, a potential benefit of repeating pre-post measurements is to reduce the impact of missing values in the ANCOVA analysis, especially missing baseline data. This also merits further discussion.
Conclusion
We address the advantages of using multiple pretreatment and posttreatment measurements in randomized clinical trials. For the ANCOVA model, we consider the sample size formula under general correlation structures and derive the optimal number of pre/post measurements given the total number of visits. Repeating the follow-up measurements is generally more beneficial than repeating the baselines, but the latter can still provide a non-negligible improvement in efficiency in repeated measures designs. Simulation studies conducted for binary measurements yield conclusions similar to those for the linear model.
Availability of data and materials
All R codes are available at https://doi.org/10.5281/zenodo.7594938 [26].
Abbreviations
ANCOVA: Analysis of covariance
GEE: Generalized estimating equations
References
Huntington Study Group. Tetrabenazine as antichorea therapy in Huntington disease. Neurology. 2006;66(3):366–72.
Huntington Study Group. Effect of deutetrabenazine on chorea among patients with Huntington disease: A randomized clinical trial. JAMA. 2016;316(1):40–50.
Van Patten CL, Olivotto IA, Chambers GK, Gelmon KA, Hislop TG, Templeton E, Wattie A, Prior JC. Effect of soy phytoestrogens on hot flashes in postmenopausal women with breast cancer: a randomized, controlled clinical trial. J Clin Oncol. 2002;20(6):1449–55.
Vickers AJ. How many repeated measures in repeated measures designs? Statistical issues for comparative trials. BMC Med Res Methodol. 2003;3:22.
Frison L, Pocock SJ. Repeated measures in clinical trials: Analysis using mean summary statistics and its implications for design. Stat Med. 1992;11(13):1685–704.
Bristol DR. The choice of two baselines. Drug Inf J. 2007;41(1):57–61.
Zhang P, Chen D, Roe T. Choice of Baselines in Clinical Trials: A Simulation Study from Statistical Power Perspective. Commun Stat Simul Comput. 2010;39(7):1305–17.
Fleiss JL. The Design and Analysis of Clinical Experiments. New York: Wiley; 1986.
Liang K, Zeger S. Longitudinal data analysis of continuous and discrete responses for pre-post designs. Sankhyā Indian J Stat B. 2000;62(1):134–48.
Tango T. On the repeated measures designs and sample sizes for randomized controlled trials. Biostatistics. 2016;17(2):334–49.
Everitt BS, Hothorn T. A Handbook of Statistical Analyses Using R. 2nd ed. Boca Raton: CRC Press; 2010.
Ma S. Methods for Improving Efficiency in Clinical Trials, Doctoral dissertation. Rochester: University of Rochester; 2019.
Yan J. Enjoy the joy of copulas: With a package copula. J Stat Softw. 2007;21(4):1–21.
Firth D. Bias reduction of maximum likelihood estimates. Biometrika. 1993;80(1):27–38.
Liang K, Zeger S. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
Mancl LA, DeRouen TA. A covariance estimator for GEE with improved small-sample properties. Biometrics. 2001;57(1):126–34.
Wang M, Kong L, Li Z, Zhang L. Covariance estimators for generalized estimating equations (GEE) in longitudinal analysis with small samples. Stat Med. 2016;35(10):1706–21.
Gruber S, Tchetgen ET. Limitations of empirical calibration of p-values using observational data. Stat Med. 2016;35(22):3869–82.
Cabras S, Castellanos ME. Pvalue calibration in multiple hypotheses testing. Stat Med. 2017;36(18):2875–86.
Lin W. Agnostic notes on regression adjustments to experimental data: Reexamining Freedman’s critique. Ann Appl Stat. 2013;7(1):295–318.
Liu GF, Lu K, Mogg R, Mallick M, Mehrotra DV. Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Stat Med. 2009;28(20):2509–30.
Wan F. Statistical analysis of two arm randomized pre-post designs with one post-treatment measurement. BMC Med Res Methodol. 2021;21:150.
Tang NS, Yu B, Tang ML. Testing non-inferiority of a new treatment in three-arm clinical trials with binary endpoints. BMC Med Res Methodol. 2014;14:134.
Tang N, Yu B. Simultaneous confidence interval for assessing non-inferiority with assay sensitivity in a three-arm trial with binary endpoints. Pharm Stat. 2020;19(5):518–31.
Tang N, Yu B. Bayesian sample size determination in a three-arm non-inferiority trial with binary endpoints. J Biopharm Stat. 2022;32(5):768–88.
Ma S, Wang T. R codes of manuscript The optimal pre-post allocation for randomized clinical trials. Zenodo. 2023. https://doi.org/10.5281/zenodo.7594938.
Acknowledgements
The computations in this paper were run on the Siyuan-1 and \(\pi\) 2.0 clusters supported by the Center for High Performance Computing at Shanghai Jiao Tong University. We thank the editor and two anonymous reviewers for their helpful comments and suggestions.
Funding
This work was supported by the National Natural Science Foundation of China (grant 12101351), Shanghai Sailing Program (23YF1421000), the Fundamental Research Funds for the Central Universities (YG2023QNA01), and Clinical Research Plan of SHDC (SHDC2022CRW003).
Author information
Authors and Affiliations
Contributions
S.M. and T.W. developed the concepts for the manuscript and proposed the method. S.M. conducted the analyses. T.W. helped interpret the results. S.M. and T.W. prepared the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Ethics approval was not needed for this study.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing financial interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Ma, S., Wang, T. The optimal pre-post allocation for randomized clinical trials. BMC Med Res Methodol 23, 72 (2023). https://doi.org/10.1186/s12874-023-01893-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874-023-01893-w