 Research
 Open Access
 Published:
Statistical analysis of two arm randomized prepost designs with one posttreatment measurement
BMC Medical Research Methodology volume 21, Article number: 150 (2021)
Abstract
Background
Randomized prepost designs, with outcomes measured at baseline and after treatment, have been commonly used to compare the clinical effectiveness of two competing treatments. There are vast, but often conflicting, amount of information in current literature about the best analytic methods for prepost designs. It is challenging for applied researchers to make an informed choice.
Methods
We discuss six methods commonly used in literature: one way analysis of variance (“ANOVA”), analysis of covariance main effect and interaction models on the posttreatment score (“ANCOVAI” and “ANCOVAII”), ANOVA on the change score between the baseline and posttreatment scores (“ANOVAChange”), repeated measures (“RM”) and constrained repeated measures (“cRM”) models on the baseline and posttreatment scores as joint outcomes. We review a number of study endpoints in randomized prepost designs and identify the mean difference in the posttreatment score as the common treatment effect that all six methods target. We delineate the underlying differences and connections between these competing methods in homogeneous and heterogeneous study populations.
Results
ANCOVA and cRM outperform other alternative methods because their treatment effect estimators have the smallest variances. cRM has comparable performance to ANCOVAI in the homogeneous scenario and to ANCOVAII in the heterogeneous scenario. In spite of that, ANCOVA has several advantages over cRM: i) the baseline score is adjusted as covariate because it is not an outcome by definition; ii) it is very convenient to incorporate other baseline variables and easy to handle complex heteroscedasticity patterns in a linear regression framework.
Conclusions
ANCOVA is a simple and the most efficient approach for analyzing prepost randomized designs.
Background
Two arm parallel randomized trials have been widely used to compare the clinical effectiveness of competing treatments in improving patients’ health outcomes. In these trials, continuous outcomes of interest were routinely measured at baseline (defined as “baseline score”) and one post treatment time point (defined as “posttreatment score”). The primary purpose of designing a prepost randomized study is to answer the scientific question of interest: is treatment A more effective than treatment B? To assess the difference in the treatment effectiveness between two treatments, we need to select a study endpoint and quantify a treatment effect. Common study endpoints include the post treatment score, the change score from baseline to post treatment, a percentage change from baseline, and rate of change from baseline. The difference between two arms on selected study endpoints is defined as the treatment effect. Few studies have investigated the links between these different metrics of treatment effect in a randomized prepost trial. These underlying connections are critical in understanding the equivalence among some statistical methods that may appear to be very different at the first sight. We need to be certain about the type of treatment effect each method targets and select the one that yields an unbiased and the most efficient estimator of the treatment effect of our interest.
There are a number of statistical methods commonly used in analyzing prepost trials. We can analyze the posttreatment score using one way analysis of variance model (ANOVA) [1, 2], analysis of covariance model adjusting for the baseline score (ANCOVAI) [2,3,4,5,6,7], and ANCOVA including a baseline score by treatment interaction (ANCOVAII) [3, 4, 8,9,10]. We can also analyze the change score using ANOVA (ANOVAChange) [11]. Alternatively, we can model the baseline and posttreatment scores jointly using repeated measures models (RM) and constrained repeated measures models (cRM) [10, 12,13,14]. Despite of the simplicity and wide application of randomized prepost designs, which method is the best analytic approach has been a debated topic and many methodological studies have been performed to compare different statistical methods for past decades [1,2,3,4,5,6,7,8,9,10,11,12,13]. However, it is challenging for applied researchers to evaluate this vast, but often conflicting, amount of information in current literature and make an informed choice.
In this study we aim to review ANOVA, ANCOVAI, ANCOVAII, ANOVAChange, RM,andcRM from a practical standpoint, with the focus on delineating the differences and underlying connections between them. In section Methods, we first provide notations and assumptions for a typical prepost design, define homogeneous and heterogeneous study populations, and discuss some common study endpoints and the associated metrics of treatment effects. We next analytically assess differences and connections between these competing models in the homogeneous and heterogeneous scenarios by first describing each model using the same set of population mean, variance, and covariance parameters. In section Results, we compare the relative efficiency of these competing methods theoretically using three simulated weight loss trial examples (homogeneous data, heterogeneous data with balanced design, heterogeneous data with unbalanced design). In the last two sections, we discuss the results and give recommendation on the best analytical approach in a randomized prepost design.
Methods
A hypothetical weight loss trial and metrics of treatment effects
Notations
In a hypothetical two arm parallel weight loss trial comparing the effect of a new drug (“treatment”) and a placebo (“control”) in reducing participants’ body weights, we use Y_{ijt} to denote body weight of the i th subject (i = 1, 2, 3, …n_{j}) in the jth treatment arm (j = 0, 1) at the t th time (t = t_{0}, t_{1} ). n_{0} and n_{1} are the number of subjects in the control and treatment arms.
We denote the mean baseline weights for the treatment and control arms by \( {\mu}_{1{t}_0} \) and \( {\mu}_{0{t}_0} \), respectively. Random allocation guarantees \( {\mu}_{1{t}_0}={\mu}_{0{t}_0} \) and we let \( {\mu}_{t_0} \) denote the overall mean baseline weight. The mean weights of the treatment and control arms at time t_{1} are denoted by \( {\mu}_{1{t}_1} \) and \( {\mu}_{0{t}_1} \), respectively (Fig. 1). We define homogeneous and heterogeneous study populations as follows:

i)
The homogeneous scenario: every participant has the same pattern of variance and covariance structure for their baseline and posttreatment weights, which is parameterized as below:
$$ \sum =\left[\begin{array}{c}{\sigma}_0^2\\ {}\rho {\sigma}_0{\sigma}_1\end{array}\right.\left.\begin{array}{c}\rho {\sigma}_0{\sigma}_1\\ {}{\sigma}_1^2\end{array}\right], $$
where \( {\sigma}_0^2 \) and \( {\sigma}_1^2 \) are the variances of the baseline and posttreatment weights, ρ is the correlation coefficient between the baseline and posttreatment weights.

ii)
The heterogeneous scenario: variance and covariance structures of the baseline and posttreatment weights differ between the treatment and control arms. Formally, we have
$$ {\sum}_0=\left[\begin{array}{c}{\sigma}_0^2\\ {}{\rho}_0{\sigma}_0{\sigma}_{01}\end{array}\right.\left.\begin{array}{c}{\rho}_0{\sigma}_0{\sigma}_{01}\\ {}{\sigma}_{01}^2\end{array}\right], $$
and
where \( {\sigma}_0^2 \) is the common variance of the baseline body weight in the control and treatment arms. Randomization guarantees that the variances of the baseline weights in both arms are equal to \( {\sigma}_0^2 \). \( {\sigma}_{01}^2 \) and \( {\sigma}_{11}^2 \) are the variances of the posttreatment weight in the control and treatment arms. ρ_{0} and ρ_{1} are the correlation coefficients between the baseline and posttreatment weights in the control and treatment arms, respectively. In practice, participants may respond to the treatment more differently so that variability of the posttreatment weight tends to be larger in the treatment arm than in the control arm and the correlation between pre and posttreatment weights are usually stronger in the control arm than in the treatment arm. i.e., ρ_{0} > ρ_{1} and \( {\sigma}_{11}^2>{\sigma}_{01}^2 \).
Metrics of treatment effect
We discuss the following three metrics of treatment effect commonly reported in prepost trials:

i)
The primary endpoint is the posttreatment weight measured at t_{1}. The difference in the mean posttreatment weights of two arms is defined as a treatment effect, which is parameterized as follows:
$$ \tau ={\mu}_{1{t}_1}{\mu}_{0{t}_1} $$
For example, if τ = − 10, we can interpret the results as “at the end of the trial, the mean weight was 10 pounds lower in the treatment group than in the control group.”

ii)
The primary endpoint is the change score calculated by subtracting the baseline weight from the posttreatment weight. i.e., \( {\Delta }_{ij}={Y}_{ij{t}_1}{Y}_{ij{t}_0} \). The difference in the mean change scores of two arms is a treatment effect. Formally, we have:
$$ \overset{\sim }{\tau }=\left({\mu}_{1{t}_1}{\mu}_{1{t}_0}\right)\left({\mu}_{0{t}_1}{\mu}_{0{t}_0}\right) $$
e.g. if \( \overset{\sim }{\tau }=10 \), this difference is usually interpreted as “weight reductions were 10 pounds greater in the treatment group than in the control group”. Since randomization ensures \( {\mu}_{0{t}_1}={\mu}_{0{t}_0} \), it follows directly \( \overset{\sim }{\tau }=\tau \). When we code “0” for t_{0} and “1” for t_{1}, the mean change score for each arm can also be interpreted as the mean change rate per unit time for each arm, represented by slopes in Fig. 1. Thus, the difference in slopes, denoted by \( \overset{\sim }{\overset{\sim }{\tau }}={\alpha}_1{\alpha}_0 \), is also equivalent to τ. As shown in previous section, ANOVA and ANCOVA target τ, ANOVACHANGE targets \( \overset{\sim }{\tau }, \) and RM targets \( \overset{\sim }{\overset{\sim }{\tau }}.\kern0.5em \) However, we can compare these statistical methods targeting seemingly very different types of treatment effects in a meaningful way because of the equivalence between τ, \( \overset{\sim }{\tau }, \) and \( \overset{\sim }{\overset{\sim }{\tau }} \) in randomized prepost designs.

iii)
The primary endpoint is the percent change from baseline weight, denoted by \( {\varphi}_{ij}=\frac{\left({Y}_{ij{t}_1}{Y}_{ij{t}_0}\right)}{Y_{ij{t}_0}} \). The mean difference in the percent change between two arms is defined as a treatment effect and parameterized as follows:
$$ {\tau}^{\ast }={\overline{\varphi}}_1{\overline{\varphi}}_0, $$
where \( {\overline{\varphi}}_1 \) and \( {\overline{\varphi}}_0 \) are the mean percent changes of the treatment and control arms. Although the percent change is popular among clinical researchers, this metric has several drawbacks [1, 15, 16]: i) the percent change is a function of ratio \( \frac{Y_{ij{t}_1}}{Y_{ij{t}_0}} \) . The distribution of the percent change is highly skewed. Analyzing it with normaltheory based statistical methods is not justified and nonparametric statistical methods are generally less powerful; ii) the percent change is not a symmetric measure. For example, the mean weight of adults over 20 in US is 197.8 pound for men and 170.5 pound for women. The mean difference is 27.3 pound between men and women. Men weight 16% (i.e.,100 × ((197.8–170.5)/170.5)) more than women, whereas women weight 13.8% (i.e., 100 × ((197.8–170.5)/197.8)) less than men. The differences could be different depending on which sex is used as devisor; iii) the percent change is not an additive measure. For example, if a participant’s weight increases by 10% in first 6 months and fall by 10% for the next 6 months, the 2 % changes do not cancel out. The participant’s weight at the end would be only 99% of the participant’s starting weight.
Statistical models
In this section, we focus on six methods that estimate τ. We describe each statistical model using the same set of population mean, variance, and covariance parameters defined in section Methods for homogeneous and heterogeneous scenarios, separately. For each method, we present the closedform expressions of the point estimator of treatment effect and its variance. It often goes unnoticed in practice that different statistical methods have different types of variances (i.e., conditional vs. unconditional variances) associated with their treatment effect estimators. For example, the OLS modelbased variances for ANCOVA are conditional because OLS assumes the baseline weight is fixed. Generally speaking, the baseline weight is random because we rarely enroll participants into randomized trials based on predetermined values of the baseline weight. Thus, the unconditional variance and the corresponding unconditional inference is of greater interest because we want the findings derived from the current sample to be generalizable to the population of interest. We will discuss in details whether the OLS modelbased conditional inference (i.e., test statistics and pvalues from standard statistical softwares) for ANCOVA is still valid for unconditional hypothesis testing and the potential fixes that we can use to draw valid unconditional inference if the usual OLS modelbased inference is biased.
When the study population is homogeneous
Method 1:ANOVA modeling post treatment measure (“ANOVAPost”). We model the posttreatment body weight \( {Y}_{ij{t}_1} \) using the binary treatment indicator G_{ij} (1 if in the treatment arm; 0 if in the control arm) as follows:
where \( {\beta}_0^{(1)}={\mu}_{0{t}_1}, \)\( {\beta}_1^{(1)}={\mu}_{1{t}_1}{\mu}_{0{t}_1}= \)τ, and \( {e}_{ij}^{(1)} \) is independently and identically distributed (i.i.d) random error. \( {\beta}_1^{(1)} \) represents the treatment effect. Model (1) is homoscedastic with a constant residual variance \( {\sigma}_1^2 \).
We can fit an ordinary least squares (OLS) regression to estimate the coefficients and standard errors of model (1). The closedform expressions of the OLS estimator \( {\hat{\beta}}_{1, ols}^{(1)} \) and its “unconditional” variance, denoted by \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(1)}\right) \), are presented in Table 1. \( {\hat{\beta}}_{1, ols}^{(1)} \) is estimated by the sample group mean difference in the posttreatment weight between two arms. \( {\hat{\beta}}_{1, ols}^{(1)} \) is unbiased for τ. The OLS modelbased variance of \( {\hat{\beta}}_{1, ols}^{(1)} \) assuming known \( {\sigma}_1^2 \) is:
where \( {G}_{..}=\frac{\sum_{j=0}^1{\sum}_{i=1}^{n_j}{G}_{ij}}{n_0+{n}_1}=\frac{n_1}{n_0+{n}_1} \). \( {\sigma}_1^2 \) is estimated by
where \( {\hat{y}}_{ij{t}_1}^{(1)}={\hat{\beta}}_{0, ols}^{(1)}+{\hat{\beta}}_{1, ols}^{(1)}{G}_{ij} \) is the predicted value from model (1). We let \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(1)}\right) \) denote the OLS modelbased variance estimator with \( {\hat{\sigma}}_1^2 \) substituted for \( {\sigma}_1^2 \), which is output by standard statistical softwares (Table 1). Since \( {\sum}_{j=0}^1{\sum}_{i=1}^{n_j}{\left({G}_{ij}{G}_{..}\right)}^2=\frac{n_0{n}_1}{n_0+{n}_1} \), it follows that \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(1)}\right)=\mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(1)}\right) \). It is well established that \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(1)}\right) \) is unbiased for \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(1)}\right) \). Thus, \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(1)}\right) \) is unbiased for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(1)}\right) \). The usual OLS modelbased inference (i.e., test statistics \( t=\frac{{\hat{\beta}}_{1, ols}^{(1)}}{\sqrt{{\hat{va\mathrm{r}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(1)}\right)\ }} \) and the associated pvalue) is valid for testing H_{o} : τ = 0 unconditionally.
Method 2:ANCOVA modeling post treatment measure (“ANCOVAI”): We model the posttreatment weight \( {Y}_{ij{t}_1} \) using the binary treatment indicator G_{ij} and the baseline weight \( {Y}_{ij{t}_0} \).
, where \( {\beta}_0^{(2)}={\mu}_{0{t}_1}\rho \frac{\sigma_1}{\sigma_0}{\mu}_{t_0} \), \( {\beta}_1^{(2)} \) = \( \tau, \kern0.5em {\beta}_2^{(2)} \) = \( \rho \frac{\sigma_1}{\sigma_0} \), and \( {e}_{ij}^{(2)} \) is i.i.d random error. \( {\beta}_1^{(2)} \) measures the treatment effect τ and \( {\beta}_2^{(2)} \) represents the slope of the prepost association between \( {Y}_{ij{t}_1} \) and \( {Y}_{ij{t}_0} \). Model (2) has a common residual variance \( {\sigma}_{\epsilon^{(2)}}^2 \) and implicitly assumes that two arms share the common baseline mean \( {\mu}_{t_0} \).
The coefficients and standard errors of model (2) are also estimated using an OLS regression. The OLS estimator \( {\hat{\beta}}_{1, ols}^{(2)} \) is derived as the sample mean difference in the posttreatment weight adjusting for the sample mean difference in the baseline weight between two arms. The group mean difference in the baseline weight can be seen as chance imbalance in a randomized trial. \( {\hat{\beta}}_{1, ols}^{(2)} \) is unbiased for τ both conditional on \( {Y}_{ij{t}_0} \) and unconditionally. The formulas of \( {\hat{\beta}}_{1, ols}^{(2)} \) and its “unconditional” variance \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}\right) \) are listed in Table 1. However, OLS assumes that the baseline weight \( {Y}_{ij{t}_0} \) is fixed. OLS targets the conditional variance of \( {\hat{\beta}}_{1, ols}^{(2)} \), denoted by \( \mathit{\operatorname{var}}\Big({\hat{\beta}}_{1, ols}^{(2)}\left{Y}_{ij{t}_0}\right) \), instead of \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}\right) \). The formula of \( \mathit{\operatorname{var}}\Big({\hat{\beta}}_{1, ols}^{(2)}\left{Y}_{ij{t}_0}\right) \) with a known common residual variance \( {\sigma}_{\epsilon^{(2)}}^2 \) is presented in Table 1. Since \( {\sigma}_{\epsilon^{(2)}}^2 \) is generally unknown, it is estimated by the following sample residual variance:
, where \( {\hat{y}}_{ij{t}_1}^{(2)}={\hat{\beta}}_{0, ols}^{(2)}+{\hat{\beta}}_{1, ols}^{(2)}{G}_{ij}+{\hat{\beta}}_{2, ols}^{(2)}{Y}_{ij{t}_0} \), the predicted value from model (2). We let \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(2)}{Y}_{ij{t}_0}\right) \) denote the OLS modelbased variance estimator with \( {\hat{\sigma}}_{\epsilon^{(2)}}^2 \) substituted for \( {\sigma}_{\epsilon^{(2)}}^2 \) . Note that \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(2)}{Y}_{ij{t}_0}\right) \) is reported by standard statistical softwares (e.g. “proc reg” in SAS). Its formula is presented in Table 1.
Since we want to generalize our conclusions to a general population and \( {Y}_{ij{t}_0} \) can take different values from those collected in the current sample, we may wonder whether significance tests based on the modelbased conditional variance assuming \( {Y}_{ij{t}_0} \) is fixed (e.g., \( t=\frac{{\hat{\beta}}_{1, ols}^{(2)}}{\sqrt{{\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(2)}{Y}_{ij{t}_0}\right)\kern0.5em }} \)) is comparable to unconditional inference (e.g., \( t=\frac{{\hat{\beta}}_{1, ols}^{(2)}}{\sqrt{\mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}\right)}} \)), in which \( {Y}_{ij{t}_0} \) is treated as random variable, for testing H_{o} : τ = 0. To establish this equivalence, we need to show: i) \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(2)}{Y}_{ij{t}_0}\right) \) is unbiased for \( \mathit{\operatorname{var}}\Big({\hat{\beta}}_{1, ols}^{(2)}\left{Y}_{ij{t}_0}\right) \); ii) \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}{Y}_{ij{t}_0}\right) \) is unbiased for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}\right) \). The first part is well established in a homoscedastic linear model. The second part holds because we can show that \( \kern0.50em \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}\right) \) =E(\( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}{Y}_{ij{t}_0}\right) \)) using the law of total variance formula and the fact that \( {\hat{\beta}}_{1, ols}^{(2)} \) is unbiased for τ. That is, the unconditional variance of \( {\hat{\beta}}_{1, ols}^{(2)} \) is the average of its conditional variance over the distribution of the baseline weight. Therefore, the usual modelbased standard errors and associated pvalues are valid for unconditional inference [3, 5, 17].
Method 3:Repeated measures model (“RM”):RM models the baseline and posttreatment weights (\( {Y}_{ij{t}_0} \), \( {Y}_{ij{t}_1} \)) jointly using the binary treatment indicator G_{ij}, the binary time factor T_{ij}, the time by treatment interaction G_{ij} × T_{ij} as follows:
When t_{0} = 0 and t_{1} = 1, \( \kern0.50em {\gamma}_0^{(3)}= \)\( {\mu}_{0{t}_0} \), \( {\gamma}_1^{(3)}={\mu}_{1{t}_0}{\mu}_{0{t}_0}, \)\( {\gamma}_2^{(3)}={\mu}_{0{t}_1}{\mu}_{0{t}_0}, \) and \( {\gamma}_3^{(3)}=\left({\mu}_{1{t}_1}{\mu}_{1{t}_0}\right)\left({\mu}_{0{t}_1}{\mu}_{0{t}_0}\right) \). \( {\gamma}_0^{(3)} \) represents the mean baseline weight of the control arm, \( {\gamma}_1^{(3)} \) represents the difference in the mean baseline weights of the treatment and control arms, \( {\gamma}_2^{(3)} \) represents the mean change from baseline in the control arm, and \( {\gamma}_3^{(3)} \) is generally interpreted as the difference in the mean change from baseline in a unit time interval between the treatment and control arms (“difference in difference”), also known as the difference in slopes. We have \( {\mu}_{1{t}_0}={\mu}_{0{t}_0} \) from random allocation and it follows that \( {\gamma}_1^{(3)}=0 \) and \( {\gamma}_3^{(3)}={\mu}_{1{t}_1}{\mu}_{1{t}_1}=\tau . \) Thus, testing \( {H}_o:{\gamma}_3^{(3)}=0 \) is equivalent to testing H_{o} : τ = 0.
The generalized least squares (GLS) model with correlated outcomes is routinely used to estimate the coefficients and standard errors of model (3). The GLS estimator of the treatment effect \( {\hat{\gamma}}_{3,\kern0.5em gls}^{(3)} \) and its variance \( \mathit{\operatorname{var}}\Big({\hat{\gamma}}_{3,\kern0.5em gls}^{(3)} \)) given known variance and covariance parameters are presented in Table 1. \( {\hat{\gamma}}_{3,\kern0.5em gls}^{(3)} \) is estimated by the sample mean difference in body weight change between two arms and is unbiased for τ in a large sample. The variance and covariance parameters are generally unknown and need to be estimated using the restricted maximum likelihood (REML). The conventional maximal likelihood estimation (MLE) should be avoided. The REML variance estimator \( {\hat{\ \mathit{\operatorname{var}}}}_{reml}\Big({\hat{\gamma}}_{3,\kern0.5em gls}^{(3)} \)) is derived by plugging the REML estimators of the variance and covariance parameters (i.e., \( {\sigma}_0^2,{\sigma}_1^2,\rho {\sigma}_0{\sigma}_1 \)) into the formula of \( \mathit{\operatorname{var}}\left({\hat{\gamma}}_{3,\kern0.5em gls}^{(3)}\right) \).We use Kenward and Roger method [18](“ddfm = kenwardroger” in SAS proc. mixed procedure) to adjust for the potential finite sample bias in \( {\hat{\ \mathit{\operatorname{var}}}}_{reml}\Big({\hat{\gamma}}_{3,\kern0.5em gls}^{(3)} \)) because of its failure to incorporate variabilities of the REML estimators of the variance and covariance parameters. This adjustment involves inflating the variance and covariance matrix and computing an adjusted approximation degrees of freedom.
Method 4:Constrained Repeated measures Model (“cRM”): By specifying \( {\gamma}_1^{(3)} \) in the model, RM model (3) assumes the mean baseline weight is different between two arms. Liang and Zeger [8] proposed the following cRM model by fixing \( {\gamma}_1^{(3)}=0 \) to force the treatment and control arms to have the same intercept. Intuitively, cRM is more efficient than RM because cRM estimates one less parameter. Formally, we model the baseline and posttreatment weights (\( {Y}_{ij{t}_0} \), \( {Y}_{ij{t}_1} \)) jointly using the binary factor T_{ij}, a time by treatment interaction G_{ij} × T_{ij} in the following cRM model:
where \( {\gamma}_0^{(4)}={\mu}_{t_0},{\gamma}_2^{(4)}={\mu}_{0{t}_1}{\mu}_{0{t}_0} \), and \( {\gamma}_3^{(4)}=\tau \). Interpretations of \( {\gamma}_0^{(4)} \), \( {\gamma}_2^{(4)} \), and \( {\gamma}_3^{(4)} \) are the same as their counterparts in RM. The formulas of the GLS point estimator \( {\hat{\gamma}}_{3,\kern0.5em gls}^{(4)} \) and its variance \( \mathit{\operatorname{var}}\left({\hat{\gamma}}_{3,\kern0.5em gls}^{(4)}\right) \) are listed in Table 1. \( {\hat{\gamma}}_{3,\kern0.5em gls}^{(4)} \) is unbiased for τ asymptotically. The empirical or the modelbased variance estimate for \( \mathit{\operatorname{var}}\left({\hat{\gamma}}_{3,\kern0.5em gls}^{(4)}\right) \) is derived using REML in the same way as a regular RM model.
Method 5:ANOVA with change score (“ANOVAChange”): We model change score \( {\Delta }_{ij}={Y}_{ij{t}_1}{Y}_{ij{t}_0} \) using the binary treatment indicator G_{ij} as follows:
where \( {\beta}_0^{(5)}={\mu}_{0{t}_1}{\mu}_{0{t}_0} \), \( {\beta}_1^{(5)}=\left({\mu}_{1{t}_1}{\mu}_{1{t}_0}\right)\left({\mu}_{0{t}_1}{\mu}_{0{t}_0}\right) \), and \( {e}_{ij}^{(3)} \) is i.i.d random error. \( {\beta}_0^{(5)} \) measures the mean difference score in the control arm. \( {\beta}_1^{(5)} \) measures the treatment effect \( \overset{\sim }{\tau } \). Since \( {\mu}_{1{t}_0}={\mu}_{0{t}_0} \) due to randomization at baseline, \( {\beta}_1^{(5)} \) is reduced to τ. The closedform expressions of \( {\hat{\beta}}_{1, ols}^{(5)} \) and \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(5)}\right) \) are listed in Table 1. \( {\hat{\beta}}_{1, ols}^{(5)} \) is derived as the sample mean difference in the change score between two arms (“difference in difference”) and is unbiased for τ. The OLS modelbased variance of \( {\hat{\beta}}_{1, ols}^{(5)} \) assuming known \( {\sigma}_{\epsilon^{(5)}}^2 \) is
where \( {G}_{..}=\frac{\sum_{j=0}^1{\sum}_{i=1}^{n_j}{G}_{ij}}{n_0+{n}_1}=\frac{n_1}{n_0+{n}_1} \). \( {\sigma}_{\epsilon^{(5)}}^2 \) is estimated by
where \( {\hat{\Delta }}_{ij}^{(5)} \) is the fitted value from model (5). We let \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(5)}\right) \) denote the OLS modelbased variance estimator with \( {\hat{\sigma}}_{\epsilon^{(5)}}^2 \) substituted for \( {\sigma}_{\epsilon^{(5)}}^2 \) Table 1, which is reported by standard statistical softwares. Since \( {\sum}_{j=0}^1{\sum}_{i=1}^{n_j}{\left({G}_{ij}{G}_{..}\right)}^2=\frac{n_0{n}_1}{n_0+{n}_1} \), it follows that \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(5)}\right)=\mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(5)}\right) \). It is well established that \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(5)}\right) \) is unbiased for \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(5)}\right) \), and thus for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(5)}\right) \). The usual OLS modelbased inference is valid for unconditional hypothesis testing.
When the study population is heterogeneous
Method 6:ANCOVAII: Different variance and covariance structures in the treatment and control arms suggest a baseline measurement by treatment interaction term in ANCOVA [2, 3, 9, 10]. To estimate τ using an interaction model, we first compute the mean centered baseline weight \( {\overset{\sim }{Y}}_{ij{t}_0} \) by subtracting the overall mean baseline weight from individual baseline weights. i.e., \( {\overset{\sim }{Y}}_{ij{t}_0}={Y}_{ij{t}_0}{\mu}_{t_0} \). We then model the posttreatment body weight \( {Y}_{ij{t}_1} \) using the binary treatment indicator G_{ij}, the mean centered baseline weight \( {\overset{\sim }{Y}}_{ij{t}_0} \), and the baseline weight by treatment interaction \( {G}_{ij}\times {\overset{\sim }{Y}}_{ij{t}_0} \) as follows:
, where \( {\beta}_0^{(6)}={\mu}_{0{t}_1} \), \( {\beta}_1^{(6)}=\tau, \)\( {\beta}_2^{(6)}={\rho}_0\frac{\sigma_{0{t}_0}}{\sigma_0} \), and \( {\beta}_3^{(6)}= \)\( {\rho}_1\frac{\sigma_{1{t}_1}}{\sigma_0}{\rho}_0\frac{\sigma_{0{t}_0}}{\sigma_0} \). \( {e}_{i0}^{(6)} \) and \( {e}_{i1}^{(6)} \) are i.i.d random errors in the control and treatment arms. \( {\beta}_1^{(6)} \) measures the treatment effect. \( {\beta}_2^{(6)} \) is the regression slope of the baseline body weight in the control arm. \( {\beta}_3^{(6)} \) measures the difference in the regression slopes of the baseline weight between the treatment and control arms. Model (6) is heteroscedastic because the error terms in the treatment and control arms have different residual variances.
As presented in Table 2, the OLS estimator \( {\hat{\beta}}_{1, ols}^{(6)} \) is the adjusted mean difference in the posttreatment body weights controlling for a weighted mean difference of the baseline body weights between two arms with unequal weighting coefficients for treatment and control arms (i.e., \( {\hat{\beta}}_{2, ols}^{(6)}+{\hat{\beta}}_{3, ols}^{(6)} \) for the treatment group, and \( {\hat{\beta}}_{2, ols}^{(6)} \) for the control group). \( {\hat{\beta}}_{1, ols}^{(6)} \) is unbiased for τ. The conditional variance of \( {\hat{\beta}}_{1, ols}^{(6)} \), denoted by \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \), incorporates two different residual variances \( {\sigma}_{\epsilon_0^{(6)}}^2 \) and \( \kern0.5em {\sigma}_{\epsilon_1^{(6)}}^2 \) (Table 2). Standard statistical softwares such as SAS do not output \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) because OLS incorrectly assumes a common residual variance \( {\sigma}_{\epsilon^{(6)}}^2 \), which is the following weighted average of \( \kern0.5em {\sigma}_{\epsilon_0^{(6)}}^2 \) and \( \kern0.5em {\sigma}_{\epsilon_1^{(6)}}^2 \):
We let \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) denote the OLS modelbased conditional variance of \( {\hat{\beta}}_{1, ols}^{(6)} \) incorporating \( {\sigma}_{\epsilon^{(6)}}^2 \) (Table 2). Since \( {\sigma}_{\epsilon^{(6)}}^2 \) is generally unknown, \( {\sigma}_{\epsilon^{(6)}}^2 \) is estimated by
where \( {\hat{y}}_{ij{t}_1} \) is the predicted value of \( {y}_{ij{t}_1} \). We let \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) denote the OLS modelbased variance estimator of \( {\hat{\beta}}_{1, ols}^{(6)} \) with \( {\hat{\sigma}}_{\epsilon^{(6)}}^2 \) substituted for \( {\sigma}_{\epsilon^{(6)}}^2 \). and known constant \( {\mu}_{t_0} \) (Table 2). \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) is reported by standard statistical softwares (e.g., “proc reg” in SAS). To assess the validity of the modelbased standard errors and pvalues from a regular ANCOVAII model for unconditional inference, we need to examine: i) whether \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) is unbiased for \( \mathit{\operatorname{var}}\Big({\hat{\beta}}_{1, ols}^{(6)}\left{\overset{\sim }{Y}}_{ij{t}_0}\right) \); ii) whether \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) is unbiased for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}\right) \).
First, \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) is unbiased for \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right). \) However, the unbiasedness of \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) as an estimator of \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) depends on the relationship between \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) and \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \). Asymptotically, we have
It can be shown in a balanced design (n_{0} = n_{1}),
Thus, \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) is nearly unbiased for \( \mathit{\operatorname{var}}\Big({\hat{\beta}}_{1, ols}^{(6)}\left{\overset{\sim }{Y}}_{ij{t}_0}\right)\ \left[3\right]. \) When the design is unbalanced (n_{0} ≠ n_{1}),
Hence, \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) is biased for \( \mathit{\operatorname{var}}\Big({\hat{\beta}}_{1, ols}^{(6)}\left{\overset{\sim }{Y}}_{ij{t}_0}\right). \) Due to heteroscedasticity, \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) overestimates \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) if the group with a larger residual variance has larger sample size and the group with a smaller residual variance has smaller sample size, and otherwise may underestimate \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) [3, 4].
Second, the common mean baseline weight \( {\mu}_{t_0} \) is generally unknown. We need to estimate \( {\mu}_{t_0} \) in \( {\overset{\sim }{Y}}_{ij{t}_0} \) using the overall sample mean \( {\hat{\mu}}_{t_0}=\frac{\sum_{j=0}^1{\sum}_{i=1}^{n_j}{Y}_{ij{t}_0}}{n_0+{n}_1} \) but ANCOVA treats \( {\hat{\mu}}_{t_0} \) as fixed and fails to capture this additional variability in the conditional variances. As shown below, it turns out that \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) underestimates \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}\right) \) by a factor of \( {\beta}_{3, ols}^{\left[6\right]2}\mathit{\operatorname{var}}\left({\hat{\mu}}_{t_0}\right) \) [3]:
Thus, the OLS modelbased conditional inference is biased for unconditional hypothesis testing because of heteroscedasticity and neglecting of sampling variability in \( {\hat{\mu}}_{t_0} \). To fix these two problems, we can use the following adjusted heteroscedasticityconsistent (HC) variance estimator to replace \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) for valid unconditional inference:
where \( {\hat{\mathit{\operatorname{var}}}}_{HC}\Big({\hat{\beta}}_{1, ols}^{(6)}\left{\overset{\sim }{Y}}_{ij{t}_0}\right) \) is a HC variance estimator for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) [19] and can be output from standard softwares. HC variance estimators are consistent (i.e., unbiased in large sample). Among all available HC variance estimators, HC2 was shown to have the best performance in finite samples [3, 4] (e.g. “HCCMETHOD = 2” in proc. reg or “EMPIRICAL” in proc. mixed, SAS). \( {\hat{\beta}}_{3, ols}^{\left[6\right]} \) is the OLS estimator of \( {\beta}_3^{(6)} \), and \( {\hat{\sigma}}_0^2 \) is the overall sample variance of the baseline body weight. It follows directly that \( {\hat{\mathit{\operatorname{var}}}}_{aHC}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) is asymptotically unbiased for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}\right) \) and we can construct a valid test \( t=\frac{{\hat{\beta}}_{1, ols}^{(6)}}{\sqrt{{\hat{\mathit{\operatorname{var}}}}_{aHC}\left({\hat{\beta}}_{1, ols}^{(6)}{\overset{\sim }{Y}}_{ij{t}_0}\right)}\ } \) for testing H_{o} : τ = 0 unconditionally.
Method 7ANCOVAI: We model the posttreatment weight \( {Y}_{ij{t}_1} \) using the binary treatment G and the baseline weight \( {Y}_{ij{t}_0} \):
, where \( {\beta}_0^{(7)}={\beta}_0^{(6)}{\beta}_3^{(6)}{p}_0{\mu}_0, \) and \( {\beta}_1^{(7)}=\tau \). \( {e}_{i0}^{(7)} \) and \( {e}_{i1}^{(7)} \) are random errors in the control and treatment arms. Since \( {e}_{i0}^{(7)} \) and \( {e}_{i1}^{(7)} \) have different variances in general, model (7) is heteroscedastic and the severity of heteroscedasticity is determined by the correlation coefficient, the variances of the posttreatment weights in two arms, and whether the design is balanced.
As shown in Table 2, the OLS estimator \( {\hat{\beta}}_{1, ols}^{(7)} \) is an adjusted mean difference in the posttreatment weights controlling for a weighted mean difference of the baseline weights between two arms with equal weighting coefficient for the treatment and control arms (i.e., \( {\hat{\beta}}_{2, ols}^{(7)} \) for both arms). \( {\hat{\beta}}_{1, ols}^{(7)} \) is unbiased for τ. The true conditional variance \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) incorporates two different residual variances. Similar to ANCOVAII, the OLS modelbased inference for ANCOVAI also mistakenly assumes a constant residual variance \( {\sigma}_{\epsilon^{(7)}}^2 \), which is a weighted average of \( \kern0.5em {\sigma}_{\epsilon_0^{(7)}}^2 \) and \( \kern0.5em {\sigma}_{\epsilon_1^{(7)}}^2 \), as follows:
Since \( {\sigma}_{\epsilon^{(7)}}^2 \) is unknown, it is estimated by
where \( {\hat{y}}_{ij{t}_1} \) is the predicted value of \( {y}_{ij{t}_1} \) from model (7). The closed form expressions of the OLS modelbased conditional variance \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) incorporating \( {\sigma}_{\epsilon^{(7)}}^2 \) and the OLS modelbased variance estimator \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) with \( {\hat{\sigma}}_{\epsilon^{(7)}}^2 \) substituted for \( {\sigma}_{\epsilon^{(7)}}^2 \) are given in Table 2. Recall that standard statistical softwares report \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \). To show the modelbased standard errors and pvalues are valid for unconditional inference, we need to examine: i) whether \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) is unbiased for \( \mathit{\operatorname{var}}\Big({\hat{\beta}}_{1, ols}^{(7)}\left{Y}_{ij{t}_0}\right) \); ii) whether \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) is unbiased for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}\right) \).
First, \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) is unbiased for \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) but the unbiasedness of \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) as an estimator of \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) depends on the relationship between \( {\mathit{\operatorname{var}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) and \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \). Asymptotically, we have
When sample sizes are equal between two arms, we have
Thus, \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) is nearly unbiased for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) in a balanced design [3]. When sample sizes are not equal between two arms,
it follows directly that \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) is biased for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) due to heteroscedasticity. \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) may overestimate \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) when the group with a larger residual variance has larger sample size and the group with a smaller residual variance has smaller sample size, and otherwise may underestimate \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{\overset{\sim }{Y}}_{ij{t}_0}\right) \) [3, 4] . ANCOVAI is robust against heteroscedasticity in a balanced design, but not in an unbalanced design.
Second, different from ANCOVAII, \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) is unbiased for \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}\right) \) because \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}\right)=E\left(\mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right)\right) \).
Thus, the modelbased standard errors and pvalues are valid for unconditional inference in a balanced design but are biased in an unbalanced design only due to heteroscedasticity. This bias can be easily corrected by replacing \( {\hat{\mathit{\operatorname{var}}}}_{ols}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \) with an HC variance estimator \( {\hat{\mathit{\operatorname{var}}}}_{HC}\left({\hat{\beta}}_{1, ols}^{(7)}{Y}_{ij{t}_0}\right) \)[4, 19] and corrected ANCOVAI will provide valid unconditional inference.
Constrained Repeated Measures heterogeneous variance model (“cRM”): We model the baseline and posttreatment weights (\( {Y}_{ij{t}_0}, \)\( {Y}_{ij{t}_1} \)) jointly using the binary time point T_{ij}, time by treatment interaction G_{ij} × T_{ij}:
where \( {\gamma}_0^{(8)}={\mu}_{t_0},{\gamma}_2^{(8)}={\mu}_{0{t}_1}{\mu}_{0{t}_0} \), and \( {\gamma}_2^{(8)}=\tau \). Noting that subjects in the treatment and control arms have different variancecovariance structures for the association between the pre and posttreatment weights, we fit a cRM heterogeneous variance GLS model with group specific variancecovariance structure (“repeated/group=” in SAS proc. mixed procedure specifies distinct variancecovariance structure for each treatment arm). The formulas of \( {\hat{\gamma}}_{2, gls}^{(8)} \) and \( \mathit{\operatorname{var}}\left({\hat{\gamma}}_{2, gls}^{(8)}\right) \)are listed in Table 2. The GLS estimator \( {\hat{\gamma}}_{2, gls}^{(8)} \)is asymptotically unbiased for\( {\gamma}_2^{(8)} \). REML is used to derive the empirical or modelbased variance estimator\( {\hat{\ \mathit{\operatorname{var}}}}_{reml}\Big({\hat{\gamma}}_{2,\kern0.5em gls}^{(8)} \)).
Results
All treatment effect estimators, except the ANOVA estimator, are expressed as the mean difference in posttreatment measurements adjusting for the chance imbalance in baseline measurement between two arms in certain ways. Nonetheless, all estimators are unbiased for τ. To compare these competing methods, we evaluate the efficiency of point estimators of treatment effect by comparing their “unconditional” variances. Since the hypothesis testing of no treatment effect is based on dividing the point estimator by its standard error (i.e., variance divided by sample size) and rejecting the null hypothesis when this ratio exceeds a given threshold, the method that produces unbiased point estimate with the smallest unconditional variance is preferred because standard error in the dominator of statistical test determines the statistical power.
When study population is homogeneous
ANCOVAI is a more efficient alternative to ANOVA because \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}\right)\le \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(1)}\right) \) (Table 1). This advantage of ANCOVA over ANOVA can also be observed from the fact that the residual error variance of ANCOVAI is less than the residual error variance of ANOVA (i.e.,\( \left(1{\rho}^2\right){\sigma}_1^2 \)\( \le {\sigma}_1^2 \)). When the correlation coefficient ρ becomes larger, the ANCOVAI estimator has smaller variance. Since \( {Y}_{ij{t}_1} \)and \( {Y}_{ij{t}_0} \)are highly correlated in general, the inclusion of \( {Y}_{ij{t}_0} \) in ANCOVAI explains away some variability in\( {Y}_{ij{t}_1} \) and thus reduces the residual variance and yields a more efficient estimator of treatment effect than ANOVA.
ANOVAChange and RM have exactly same point estimators of τ and thus have the same variances (Table 1). To compare ANOVAChange or RM with ANOVA, we can derive the difference between the unconditional variances of their treatment effect estimators as follows:
When \( \rho <\frac{1}{2{\sigma}_1} \), ∆_{1} > 0 and ANOVA outperforms ANOVAChange and RM because the ANOVA estimator has smaller variance. When \( \rho >\frac{1}{2{\sigma}_1} \), ∆_{1} < 0 and ANOVA underperforms the other two methods.
It can be shown that the difference between the unconditional variances of the ANCOVAI or cRM estimators and those of theANOVAChange or RM estimators are always nonnegative:
Thus, ANOVAChange or RM is less efficient than either ANCOVAI or cRM because their estimators have larger variances. Intuitively ANCOVAI or cRM assumes that mean baseline weights in two arms are equal in a randomized study but ANOVAChange or RM assumes that there is a baseline difference and needs to estimate an extra parameter.
As shown in Table 1, the ANCOVAI and cRM estimators of τ are equivalent because \( {\beta}_{1, ols}^{(2)} \) = \( \frac{\rho {\sigma}_0{\sigma}_1}{\sigma_0^2} \). However, ANCOVAI plugs in the OLS estimators\( {\hat{\beta}}_{1, ols}^{(2)} \), whereas cRM plugs in the REML estimators of the variance and covariance parameters. The numerical difference between \( {\hat{\beta}}_{1, ols}^{(2)} \) and \( {\hat{\gamma}}_{3, gls}^{(4)} \) becomes negligible as sample size increases. Because of this equivalence between \( {\hat{\beta}}_{1, ols}^{(2)} \) and \( {\hat{\gamma}}_{3, gls}^{(4)} \),\( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(2)}\right) \) and \( \mathit{\operatorname{var}}\left({\hat{\gamma}}_{3,\kern0.5em gls}^{(4)}\right) \) are equal [3]. As discussed previously, ANCOVAI is a conditional model assuming fixed baseline covariates. Even though the modelbased variance estimates are conditional, they are unbiased for the unconditional variance and thus the usual modelbased conditional inference is still valid for unconditional hypothesis testing. ANCOVAI performs comparably to cRM [3, 17].
When study population is heterogeneous
A heterogeneous study population justifies the inclusion of a treatment by baseline weight interaction term. Thus, ANCOVAII is the correctly specified model, whereas ANCOVAI is a misspecified model. In this case, the “conditional” treatment effect is not constant across different values of baseline weight. The “marginal” treatment effect τ is simply the average of the conditional treatment effect over the distribution of the baseline weight and measures an overall treatment effect. As shown previously, both ANCOVA models can be used to estimate τ even though ANCOVAI is misspecified. Then, what is the advantage of using a more complex interaction model over a main effect model? It turns out the ANCOVAII estimator \( {\hat{\beta}}_{1, ols}^{(6)} \) is more efficient than the ANCOVAI estimator \( {\hat{\beta}}_{1, ols}^{(7)} \) because \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}\right)\le \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}\right) \) [5]. Only in a balanced design \( \mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(6)}\right)=\mathit{\operatorname{var}}\left({\hat{\beta}}_{1, ols}^{(7)}\right) \) and the two ANCOVA models perform comparably. Note that the OLS modelbased variance estimates for ANCOVAI and II are both biased for the corresponding unconditional variances, but the HCvariance estimators provide simple fixes.
The ANCOVAII and cRM estimators of τ are equivalent because \( {\beta}_2^{(6)}+{\beta}_3^{(6)}=\frac{\rho_0{\sigma}_0{\sigma}_{01}}{\sigma_0^2} \) and \( {\beta}_2^{(6)}=\frac{\rho_1{\sigma}_0{\sigma}_{11}}{\sigma_0^2} \) (Table 2). Two methods only differ in the way two estimators are estimated. ANCOVAII plugs in the OLS estimators\( {\hat{\beta}}_{2, ols}^{(6)} \) and \( {\hat{\beta}}_{3, ols}^{(6)} \), whereas cRM plugs in the REML estimators of the variance and covariance parameters. The numerical difference between the ANCOVAII and cRM estimators becomes smaller as sample size increases. As discussed previously, standard statistical softwares such as SAS does not output unconditional variance for ANCOVAII directly but the usual OLS modelbased standard errors and pvalues are biased for unconditional inference in heterogeneous scenario. The adjusted HCvariance estimator fixes this bias. Corrected ANCOVAII provides valid unconditional inference and performs comparably to cRM. Another alternative approach to estimate variances of the ANCOVAI and II estimators is to use bootstrap method [20].
Data example
No human data was used in this study. Instead we simulated three weight loss trial data sets based on a published study for three scenarios: homogeneous data, heterogeneous data with balanced and unbalanced designs as follows [21]:

1)
The baseline weights for the control and treatment arms were generated from normal distribution with mean 88 kg and standard deviation 14 kg. Weights at 6 month after treatment for the control arm have mean 86 kg and standard deviation 15 kg. This gives a ~ 2.3% change from baseline. The mean and standard deviation of body weight at the sixth month in the treatment arm are 83 kg and 15 kg, respectively; This corresponds to a 5.7% change from baseline.

2)
In the homogeneous data, the correlation coefficient between the pre and posttreatment weights is 0.9. One hundred eighty subjects were assigned to the treatment and control arms equally. In the heterogeneous data, the correlation coefficient between the pre and posttreatment weights in the control arm is 0.9 and 0.7 in the treatment arm. Sample sizes are (n_{0} = 90, n_{1} = 90) for the balanced design and (n_{0} = 60, n_{1} = 120) for the unbalanced design. We analyzed the data examples using the methods outlined in section Methods. The statistical results were reported in Table 3 (SAS programs are provided in the Additional file 1).
In the first data example, ANOVA produced the largest standard error and the largest pvalue. ANOVAChange and RM both outperformed ANOVA with much smaller standard errors and pvalues. ANCOVAI and cRM outperformed ANOVAChange and RM with smaller standard errors and pvalues. Although ANCOVAI and cRM are equivalent when sample size is large, there are still minor numerical differences between the two in finite sample.
For the second data example with a balanced design, Fig. 2a shows that there is a strong baseline weight by treatment interaction. Both ANCOVAI and II have heteroscedastic errors by treatment arm (Fig. 2b and c). As shown in Table 2, the OLS modelbased standard error of ANCOVAI is very similar to its HC and bootstrap standard errors. Thus, heteroscedasticity does not bias the modelbased standard error of ANCOVAI. Although ANCOVAII is robust against heteroscedasticity in the balanced design, the OLS modelbased standard error of ANCOVAII (s.e = 1.333) is still not correct because OLS fails to consider the variability of estimating the overall mean baseline weight. The adjusted HC standard error for ANCOVAII is 1.402, which is closer to the modelbased and HC standard errors of ANCOVAI. The bootstrapping standard errors for ANCOVAI and II are close to their HC or adjusted HC standard errors, which suggests the HC and adjusted HC variances perform well in estimating the unconditional variances. The cRM estimate and its standard error are close to those from ANCOVAI and II.
For the third example with an unbalanced design, Fig. 2d also reveals a baseline weight by treatment interaction. Both ANCOVA models have heteroscedastic errors by treatment arm (Fig. 2e and f). The modelbased standard errors of ANCOVAI and II are not valid. The modelbased standard errors were larger than the HC standard errors and thus overestimated the true conditional variances. Compared with ANCOVAI, ANCOVAII has a smaller HC standard error (also smaller pvalue) and thus is slightly more efficient. The adjusted HC standard error for ANCOVAII is very close to the modelbased standard error for cRM. The bootstrapping standard errors for ANCOVAI and II are very close to their HC or adjusted HC standard errors.
Discussion
In this study we compare the efficiency of six unbiased methods analyzing prepost designs. We found ANCOVA and cRM are the equally most efficient methods compared with other alternatives in homogeneous and heterogeneous scenarios. In this study, we focus on the scenario in which randomization is properly performed and these competing methods all target the same causal quantity. In the scenarios where the treatment is not properly randomized or not randomized at all (e.g., in an observational study), the baseline score will not be balanced by design. In this case these competing methods may target different causal quantities. Debate over using changescore analysis (or RM) verse ANCOVA in the nonrandomized setting, generally known as the lord’s paradox, is a wellknown example [22, 23].
The majority of previous studies has only examined homogeneous study population. In this setting, ANOVA is one of the least efficient approaches for analyzing prepost designs because it does not utilize any baseline information. ANOVAChange and RM incorporate the baseline score as part of outcome, whereas ANCOVAI includes the baseline score as a covariate. ANCOVAI outperforms ANOVAChange and RM because ANCOVAI utilizes the assumption that the baseline scores are balanced between two arms in a randomized study. Thus, change score is a less efficient way to utilize the baseline score than including the baseline score as a covariate. Since we seldom can control the values of the baseline score in randomized trials, the OLS assumption that the baseline score is fixed casts doubt on the validity of ANCOVA for hypothesis testing [6, 12]. Crager proved ANCOVAI is valid for unconditional inference in homogeneous scenario [6]. This conclusion can be simply attributed to that the conditional variance of the ANCOVAI estimator is an unbiased estimate for its unconditional variance [3].
A few studies investigated further a heterogeneous scenario [3, 4, 10, 12, 24]. Although the heterogeneity justifies the inclusion of the baseline measurement by treatment interaction term, ANCOVAI and II are both unbiased. Yang and Tsiatis showed that ANCOVAII has a smaller unconditional variance estimator than that of ANCOVAI unless in a balanced design [9]. However, the OLS modelbased variances of the ANCOVAI and II estimators, reported by standard statistical softwares, are conditional variances, not unconditional variances. The OLS modelbased standard errors and associated pvalues for ANCOVAII are generally questionable for unconditional inference, and the modelbased inference for ANCOVAI is biased only when the design is unbalanced [3, 4, 10, 24]. With the corrected HC variance estimators, both models provide valid unconditional inference. Choosing between ANCOVAI and II then becomes an evaluation of a tradeoff between simplicity and some gains in efficiency.
In homogenous setting, cRM was suggested as a superior choice to ANCOVAI because the unconditional variance of the cRM estimator is smaller than the conditional variance of the ANCOVAI estimator [25]. Kenward et al. pointed out that such direct comparison between the conditional and unconditional variances is not meaningful. Since both estimators are equivalent, it can be shown that cRM coupled with REML and Kenwardroger adjustment performs almost identically to ANCOVAI in finite samples [17]. In heterogeneous scenario, cRM is comparable to ANCOVAII [3]. In presence of missing data, applied researchers often prefer cRM over ANCOVA because it can utilize all observed data but ANCOVA uses only complete cases. However, imputation methods which utilize the strong prepost correlation, such as weighting and regression imputation, can improve the statistical power for ANCOVA without biasing estimates, making it comparable to cRM [17].
Furthermore, ANCOVA has several advantages over cRM: first, outcome should only be the variable that can be influenced by treatment. Baseline measurement is certainly not an outcome by this definition. It is conceptually more appropriate to include the baseline score as covariate, not model it as outcome [5]; Second, it is very convenient to include other baseline variables in a regression model for more efficient estimates of treatment effect. Third, it is easy to adjust for other patterns of heteroscedastic errors in an OLS regression. For example, we may expect larger variability in the posttreatment weights associated with larger baseline weights. cRM cannot handle this more complex type of heteroscedasticity easily. HCvariance estimators for ANCOVA are simple fixes and readily implemented in statistical softwares.
Conclusion
Comparing with other alternative methods, ANCOVA is a simple and the most efficient approach analyzing a prepost randomized design. When there exists a baseline score by treatment interaction, we need to assess the heteroscedasticity of ANCOVA particularly when the design is not balanced. The HCvariances should be used for valid inference when heteroscedasticity is present. Adding an interaction term in ANCOVA can gain some efficiency but not including this term does not bias results.
Availability of data and materials
SAS code is provided as the Additional file 1. There is no real data used. All data generated or analyzed during this study are included in this published article [and its supplementary information files].
Abbreviations
 ANOVA :

Analysis of variance model
 ANCOVA I :

Analysis of covariance model adjusting for the baseline measurement
 ANCOVA II :

Analysis of covariance model adjusting for the baseline measurement by treatment interaction
 RM :

Constrained repeated measure model
 cRM :

Constrained repeated measure model
 HC:

Heteroscadascityconsistent
References
 1.
Vickers AJ. Analysis of variance is easily misapplied in the analysis of randomized trials: a critique and discussion of alternative statistical approaches. Psychosom Med. 2005;67(4):652–5. https://doi.org/10.1097/01.psy.0000172624.52957.a8.
 2.
O'Connell NS, Dai L, Jiang Y, Speiser JL, Ward R, Wei W, et al. Methods for analysis of prepost data in clinical research: a comparison of five common methods. J Biom Biostat. 2017;8(1):1–8. https://doi.org/10.4172/21556180.1000334.
 3.
Wan F. Analyzing prepost randomized studies with one postrandomization score using repeated measures and ANCOVA models. Stat Methods Med Res. 2019;28(1011):2952–74. https://doi.org/10.1177/0962280218789972.
 4.
Wan F. Analyzing prepost designs using the analysis of covariance models with and without the interaction term in a heterogeneous study population. Stat Methods Med Res. 2020;29(1):189–204. https://doi.org/10.1177/0962280219827971.
 5.
Senn S. Change from baseline and analysis of covariance revisited. Stat Med. 2006;25(24):4334–44. https://doi.org/10.1002/sim.2682.
 6.
Crager MR. Analysis of covariance in parallelgroup clinical trials with pretreatment baseline. Biometrics. 1987;43(4):895–901. https://doi.org/10.2307/2531543.
 7.
Frison L, Pocock SJ. Repeated measures in clinical trials: analysis using mean summary statistics and its implications for design. Stat Med. 1992;11(13):1685–704. https://doi.org/10.1002/sim.4780111304.
 8.
Brogan DR, Kutner MH. Comparative analyses of pretestposttest research designs. Am Stat. 1980;34:229–32.
 9.
Yang L, Tsiatis AA. Efficiency study of estimators for a treatment effect in a pretestposttest trial. Am Stat. 2001;55(4):314–21. https://doi.org/10.1198/000313001753272466.
 10.
Winkens B, van Breukelen GJ, Schouten HJ, Berger MP. Randomized clinical trials with a pre and a posttreatment measurement: repeated measures versus ANCOVA models. Contemp Clin Trials. 2007;28(6):713–9. https://doi.org/10.1016/j.cct.2007.04.002.
 11.
Dimitrov DM, Rumrill J, Phillip D. Pretestposttest designs and measurement of change. Work. 2003;20:159–65.
 12.
Chen X. The adjustment of random baseline measurements in treatment effect estimation. J Stat Plan Inference. 2006;136(12):4161–75. https://doi.org/10.1016/j.jspi.2005.08.046.
 13.
Jennings E. Models for pretestposttest data: repeated measures ANOVA revisited. J Educ Behav Stat. 1988;13(3):273–80. https://doi.org/10.3102/10769986013003273.
 14.
Liang K, Zeger S. Longitudinal data analysis of continuous and discrete responses for prepost designs. Sankhya. 2000;62:134–48.
 15.
Donald AB, Gregory DA. Symmetrized percent change for treatment comparisons. Am Stat. 2006;60:27–31.
 16.
Cole TJ, Altman DG. Statistics notes: what is a percentage difference? BMJ. 2017;358:j3663. https://doi.org/10.1136/bmj.j3663.
 17.
Kenward MG, White IR, Carpenter JR. Re: should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? (Liu GF et al., Stat Med 2009; 28: 2509–30). Stat Med. 2010;29(13):1455–6. https://doi.org/10.1002/sim.3868.
 18.
Kenward MG, Roger JH. Small sample inference for fixed effects from restricted maximum likelihood. Biometrics. 1997;53(3):983–97. https://doi.org/10.2307/2533558.
 19.
Long J, Ervin L. Using heteroscedasticity: consistent standard errors in the linear regression model. Am Stat. 2000;54:217–24.
 20.
Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1993. https://doi.org/10.1007/9781489945419.
 21.
Boozer CN, Daly PA, Homel P, et al. Herbal ephedra/caffeine for weight loss: a 6month randomized safety and efficacy trial. Int J Obes Relat Metab Disord. 2002;6:593–604.
 22.
Lord FM. A paradox in the interpretation of group comparisons. Psychol Bull. 1967;68(5):304–5. https://doi.org/10.1037/h0025105.
 23.
Egbewale BE, Lewis M, Sim J. Bias, precision and statistical power of analysis of covariance in the analysis of randomized trials with baseline imbalance: a simulation study. BMC Med Res Methodol. 2014;14(1):49. https://doi.org/10.1186/147122881449.
 24.
Senn S. Various varying variances: the challenge of nuisance parameters to the practicing biostatistician. Stat Methods Med Res. 2015;24(4):403–19. https://doi.org/10.1177/0962280214520728.
 25.
Liu GF, Lu K, Mogg R, Mallick M, Mehrotra DV. Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Stat Med. 2009;28(20):2509–30. https://doi.org/10.1002/sim.3639.
Acknowledgements
Not applicable
Funding
Not applicable
Author information
Affiliations
Contributions
FW developed the idea for the paper, performed analysis, and drafted the manuscript. The author(s) read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The author declare that he has no competing risk
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wan, F. Statistical analysis of two arm randomized prepost designs with one posttreatment measurement. BMC Med Res Methodol 21, 150 (2021). https://doi.org/10.1186/s12874021013239
Received:
Accepted:
Published:
Keywords
 Prepost design
 ANCOVA
 ANOVA
 Repeated measures
 Change score
 Treatment effect