It is generally agreed that a properly designed and implemented randomized controlled trial (RCT) provides the strongest form of evidence for establishing the efficacy of a given therapy. Randomization gives researchers confidence that, on average, the study arms are comparable and that the only systematic difference between these like groups is the intervention under examination. Statistically, this permits unbiased and precise estimation of intervention effects. However, because allocation is random, any individual trial will exhibit some imbalance in both measured and unmeasured confounders. Although the expected imbalance in these studies is zero, no single trial will actually achieve zero imbalance on all (or even any) important prognostic variables.
While such chance covariate imbalance in any single trial may seem trivial, existing literature [1,2,3,4,5] illustrates the impact of baseline variable imbalance on the statistical properties of analyses of intervention effects. Briefly, imbalance that falls short of 'statistical significance' at the 5% level can still affect power, type I error rate, and bias in marginal intervention effect estimates. The magnitude of these effects depends on the strength of each covariate's association with the outcome and on both the direction and magnitude of the imbalance [1, 2, 4, 5]. Intuitively, if the arm exposed to the new therapy has a poorer baseline disposition (e.g., greater disease severity) than a concurrently measured placebo arm, it will be more difficult to detect a true intervention effect; this translates into bias in the unadjusted treatment effect estimate and a corresponding loss of statistical power. Conversely, if the intervention arm has a more favorable prognosis at the start of the trial, a favorable outcome may be claimed for that arm even when the new therapy is not efficacious, which corresponds to an increase in the type I error rate.
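As a concrete illustration of this point, the following minimal simulation sketch (in Python, with parameter values that are illustrative assumptions rather than figures from the cited literature) shows that, under a null treatment effect, the unadjusted difference in arm means tracks the chance imbalance in a prognostic baseline covariate almost one-to-one.

```python
# Minimal simulation sketch: how chance imbalance in a prognostic baseline
# covariate shifts the unadjusted treatment-effect estimate. All numeric
# values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_per_arm = 100
beta_covariate = 1.0   # assumed prognostic strength of the baseline covariate
true_effect = 0.0      # null scenario: the new therapy has no effect

estimates, imbalances = [], []
for _ in range(5000):
    # Simple (unstratified) randomization: the covariate can differ by arm by chance
    x_treat = rng.normal(0.0, 1.0, n_per_arm)   # baseline severity, treatment arm
    x_ctrl = rng.normal(0.0, 1.0, n_per_arm)    # baseline severity, control arm
    y_treat = true_effect + beta_covariate * x_treat + rng.normal(0, 1, n_per_arm)
    y_ctrl = beta_covariate * x_ctrl + rng.normal(0, 1, n_per_arm)
    estimates.append(y_treat.mean() - y_ctrl.mean())
    imbalances.append(x_treat.mean() - x_ctrl.mean())

# The unadjusted estimate rises and falls with the covariate imbalance, so a
# trial with unlucky imbalance over- or under-states the (truly null) effect.
print("corr(unadjusted estimate, covariate imbalance):",
      round(float(np.corrcoef(estimates, imbalances)[0, 1]), 3))
```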
The randomization and statistical theory literature provides methodologies to mitigate these effects of baseline prognostic variables in both the design and analysis phases of RCTs. A common method for handling covariate imbalance is stratified block randomization. Stratification with blocked randomization within strata dates to the mid-twentieth century [6, 7] and involves implementing separate pre-specified randomization sequences within subgroups of participants. While easy both to implement and to understand, stratified block randomization has recognized shortcomings: it cannot handle large numbers of covariates or strata, it requires continuous baseline variables to be categorized, it relies on pre-generated lists that may introduce additional errors if allocations fall out of sequence, and it increases the risk of selection bias when allocation becomes predictable.
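The sketch below illustrates the basic mechanics of stratified permuted-block randomization in Python; the two stratification factors, the block size of 4, and the arm labels are illustrative choices, not a recommendation.

```python
# Minimal sketch of stratified permuted-block randomization with two binary
# stratification factors and a block size of 4 (both illustrative choices).
import random
from itertools import product

BLOCK = ["A", "A", "B", "B"]          # two arms, block size 4

def make_stratum_sequence(n_blocks, seed):
    """Pre-generate the allocation list for one stratum."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = BLOCK[:]
        rng.shuffle(block)            # permute arm order within each block
        sequence.extend(block)
    return sequence

# One pre-specified list per stratum (every combination of the two factors)
strata = list(product(["male", "female"], ["early", "late"]))
sequences = {s: make_stratum_sequence(n_blocks=25, seed=i) for i, s in enumerate(strata)}
position = {s: 0 for s in strata}     # next unused slot in each stratum's list

def allocate(sex, stage):
    """Assign the next participant in the relevant stratum to an arm."""
    stratum = (sex, stage)
    arm = sequences[stratum][position[stratum]]
    position[stratum] += 1
    return arm

print(allocate("female", "late"))     # e.g., 'A' or 'B'
```

Note that one pre-generated list is needed per stratum, so the number of lists grows multiplicatively with each added factor, which is the scalability limitation noted above.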
To address the concerns with stratified block randomization, researchers beginning in the 1970s developed a wide range of methods that fall under the general category of covariate-adaptive designs, often referred to simply as "minimization". Loosely defined, minimization is an adaptive allocation method that strives to balance several covariates at once marginally (i.e., no longer within each stratum combination) [8]. Balance is quantified by some function (variance, range, etc.) that defines "imbalance" for each variable of interest. An advantage of covariate-adaptive designs lies in the flexibility of these imbalance functions, which can incorporate relative covariate weights, more variables than stratified block methods can accommodate, and continuous variables [9,10,11,12].
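As a concrete example, the sketch below implements allocation in the spirit of Pocock-Simon minimization: the range of per-arm counts serves as the marginal imbalance function, and a biased coin supplies the random element (the 0.8 probability is an illustrative choice, not a prescription).

```python
# Minimal sketch of covariate-adaptive allocation in the spirit of Pocock-Simon
# minimization; the range-based imbalance measure and the 0.8 biased-coin
# probability are illustrative assumptions.
import random

ARMS = ["A", "B"]
FACTORS = ["sex", "stage"]
# counts[arm][factor][level] = number of participants with that level on that arm
counts = {arm: {f: {} for f in FACTORS} for arm in ARMS}

def imbalance_if(arm, participant):
    """Total marginal imbalance (range across arms) if `participant` joined `arm`."""
    total = 0
    for f in FACTORS:
        level = participant[f]
        sizes = [counts[a][f].get(level, 0) + (1 if a == arm else 0) for a in ARMS]
        total += max(sizes) - min(sizes)
    return total

def allocate(participant, rng):
    scores = {arm: imbalance_if(arm, participant) for arm in ARMS}
    best, other = sorted(ARMS, key=lambda a: scores[a])
    if scores[best] == scores[other]:
        arm = rng.choice(ARMS)                        # tie: pure randomization
    else:
        arm = best if rng.random() < 0.8 else other   # biased coin toward balance
    for f in FACTORS:                                 # update the marginal counts
        level = participant[f]
        counts[arm][f][level] = counts[arm][f].get(level, 0) + 1
    return arm

rng = random.Random(2024)
print(allocate({"sex": "female", "stage": "late"}, rng))
```

Each assignment depends only on the running marginal counts, so the method accommodates many factors without the multiplicative growth in strata seen under stratification, but it requires continual feedback of enrollment data, which is the logistical burden discussed next.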
Despite their ability to control baseline variable imbalance in an efficient and adaptive manner, such adaptive methods often present logistical concerns in clinical trials: they require complex algorithm implementation and programming with continual feedback, more extensive testing, and thus increased effort from the trial programmer or statistician. It is generally agreed that investigators should attempt to control covariate imbalance (whether via stratification or covariate-adaptive methods), and adaptive methods offer greater flexibility and better performance properties, promising "big rewards in scientific accuracy and credibility". Despite the evidence supporting covariate-adaptive designs, their use in modern clinical trials remains limited; for example, in a review by Lin et al., only 11–12% of the trials examined reported using covariate-adaptive methods [13]. We speculate that, to many researchers, the complexity of covariate-adaptive designs may not seem worth the added benefit. While software is available to implement such methods, these programs can be costly and their inputs poorly understood, making interpretation of the randomization and of subsequent results challenging. Further, the limited use of covariate-adaptive randomization in practice may jeopardize the validity of findings across RCTs. As shown in the literature [1,2,3,4,5], seemingly small discrepancies between arms due to baseline covariate imbalance propagate through to the final study analysis. In the era of reproducible research, imprecision due to covariate imbalance could lead to conflicting results across repeated studies.
Related to study interpretation and reproducible research, the 2010 Consolidated Standards of Reporting Trials (CONSORT) explanation was established to improve the clarity with which study methods are reported [14]. The CONSORT explanation recognizes the utility of restricted randomization for controlling baseline variable imbalance and explains the benefits of stratification and minimization. It specifically highlights the limited number of variables that are practical under stratification and the need for a random component within any minimization algorithm to prevent possible selection bias.
Regarding adjustment for baseline variables in analyses, the CONSORT explanation further makes recommendations on appropriate vs. inappropriate adjustment in RCTs: "Although the need for adjustment is much less in RCTs than in epidemiological studies, an adjusted analysis may be sensible, especially if one or more variables is thought to be prognostic" [14]. Several authors have argued that adjusting for known prognostic variables increases precision and reduces bias in various settings [5, 15,16,17,18,19,20]; however, CONSORT and International Conference on Harmonization (ICH) statements recommend that this adjustment be pre-specified. Specifically, "the decision to adjust should not be determined by whether baseline differences are statistically significant" [14]. Taken together, there remains confusion and debate with regard to handling potentially influential baseline variables in RCTs [1, 3, 14, 21].
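For illustration, the sketch below contrasts an unadjusted difference in means with a pre-specified covariate-adjusted estimate from an ordinary least squares model; the simulated data and effect sizes are illustrative assumptions, not values drawn from the cited guidance.

```python
# Minimal sketch of a pre-specified adjusted analysis: outcome regressed on
# treatment plus one prognostic baseline covariate. All numeric values are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
n = 200
treatment = rng.integers(0, 2, n)            # 1 = new therapy, 0 = control
baseline = rng.normal(0.0, 1.0, n)           # pre-specified prognostic covariate
outcome = 0.5 * treatment + 1.0 * baseline + rng.normal(0, 1, n)

# Unadjusted analysis: simple difference in arm means
unadjusted = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()

# Adjusted analysis: ordinary least squares on treatment and the covariate
X = np.column_stack([np.ones(n), treatment, baseline])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = coef[1]                           # coefficient on the treatment indicator

print(f"unadjusted difference in means: {unadjusted:.3f}")
print(f"covariate-adjusted estimate:    {adjusted:.3f}")
```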
In summary, potentially influential baseline variables require special consideration in both the design and analysis phases of clinical trials. According to the literature and guidelines, it would be ideal to control for these variables both at baseline, through stratified or adaptive allocation methods, and at analysis, with adjustment as appropriate. With these guidelines in mind, and guided by the most recent CONSORT explanation, we carried out a systematic review of published RCTs in four top-tier journals with the ultimate goal of summarizing current practice in handling baseline variables in RCTs (i.e., "real-world" RCTs as opposed to theoretical ones). Specifically, this review sought to (1) explore the frequency of use of allocation scheme types in published RCTs, and (2) explore the handling of prognostic covariates in analyses of clinical trial data. These results reveal not only the status of covariate-adaptive techniques in modern RCTs, a measure important to adaptive research methodologists, but also bear on the validity of such studies and their interpretability in the era of reproducible research.