Our findings show that randomized baseline values for the primary outcome were not reported in about half of the trials (52%). Most of these trials (72%) were studies of postoperative pain management in surgical patients, for whom baseline pain scores would be irrelevant, so we excluded them from our evaluation of the randomized baseline status of the primary outcome. After this exclusion, 25% of trials did not report randomized baseline values for the primary outcome. Another meta-analysis, on distraction of children during needle-related medical procedures, reported baseline pain scores in 45% of its trials, although a baseline value of the primary outcome may also have been irrelevant in that setting. There, baseline pain might be better regarded as an important prognostic factor that could influence the outcome than as the baseline value of the primary outcome itself; even so, 55% of trials did not report it. A prior study of baseline variables predictive of outcomes in a meta-analysis found that 40% of trials did not report some baseline demographic data [34].
All the studies with significant baseline imbalance (3/13) or heterogeneity (3/13) used final values measured at the end of follow-up rather than changes in scores from baseline. Likewise, all meta-analyses with a large proportion of missing baseline values (more than 30%) evaluated their outcome measures at the end of follow-up, which is to be expected given that baseline values were unavailable. If baseline scores are imbalanced between randomized groups in a clinical trial, ANCOVA is the optimal statistical method for adjustment [35]. In a meta-analysis, adjustment for baseline differences may likewise be needed when a significant imbalance occurs [10, 36]. However, when trials do not report randomized baseline values for the outcome, no existing adjustment method can be applied.
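To illustrate why the choice between final values, change scores, and ANCOVA matters once the arms are imbalanced at baseline, the following minimal sketch compares the three estimators on one hypothetical two-arm trial. The data, the model final = 10 + 0.5 × baseline + 2 × treated + noise, and the helper name `compare_estimators` are all assumptions for illustration; the noise is chosen to be uncorrelated with baseline within each arm, so the ANCOVA estimate recovers the true effect of 2 exactly.

```python
from statistics import mean


def compare_estimators(control, treated):
    """Each arm is a list of (baseline, final) pairs (hypothetical helper).

    Returns three estimates of the treatment effect (treated minus control):
    the final-value difference, the change-score difference, and the ANCOVA
    estimate using a common baseline slope pooled within arms.
    """
    arms = {"control": control, "treated": treated}
    xbar = {k: mean(x for x, _ in v) for k, v in arms.items()}
    ybar = {k: mean(y for _, y in v) for k, v in arms.items()}

    # Pooled within-arm least-squares slope of final value on baseline value.
    sxy = sum((x - xbar[k]) * (y - ybar[k]) for k, v in arms.items() for x, y in v)
    sxx = sum((x - xbar[k]) ** 2 for k, v in arms.items() for x, _ in v)
    beta = sxy / sxx

    final_diff = ybar["treated"] - ybar["control"]
    baseline_diff = xbar["treated"] - xbar["control"]
    return {
        "final_value": final_diff,
        "change_score": final_diff - baseline_diff,
        "ancova": final_diff - beta * baseline_diff,
        "baseline_imbalance": baseline_diff,
    }


# Hypothetical trial: true effect 2, treated arm starts 2 points higher.
noise = [1, -2, 2, -2, 1]
control = [(x, 10 + 0.5 * x + e) for x, e in zip([4, 5, 6, 7, 8], noise)]
treated = [(x, 10 + 0.5 * x + 2 + e) for x, e in zip([6, 7, 8, 9, 10], noise)]

print(compare_estimators(control, treated))
# final_value 3.0 (too high), change_score 1.0 (too low), ancova 2.0
```

With a baseline slope of 0.5, the final-value comparison carries half of the baseline imbalance into the effect estimate, whereas the change-score comparison subtracts all of it; ANCOVA removes exactly the estimated share. Both the imbalance and the adjustment require the baseline values, which is why unreported baselines rule out such an analysis.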
In the forest plot, the SMDs of the baseline outcome values were arranged in descending order of k/(n + 1), where k is the rank of a trial among n trials. Each observed SMD can thus be regarded as a crude estimate of the [k/(n + 1) × 100]th percentile, making the forest plot an empirical version of a cumulative probability plot. The overlaid cumulative normal distribution curve serves as a guideline showing the pattern that would theoretically appear if the true mean baseline difference were zero. If the included trials were well-performed randomized trials, their interval estimates of baseline differences should therefore lie approximately along the guideline. In this context, we aimed to identify how the observed baseline differences after randomization deviated from this ideal. If randomization is conducted well, the true baseline difference should be zero in every trial, and the observed differences should vary around zero only to the extent estimated from a fixed-effect model. A notable departure from the guideline curve might then reflect heterogeneity, possibly resulting from variation in the quality of randomization between studies or from missing reports of baseline values, as suggested by a particular pattern of departure, which may also be related to systematic bias in randomization.
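The plotting positions described above can be sketched as follows. The sketch ranks the SMDs from smallest (k = 1) to largest (k = n); listing rows in descending order of k/(n + 1), as in the forest plot, only changes the display order. The guideline is taken to be a normal distribution with mean zero and a spread `sigma`, which is a hypothetical input standing in for the variation the authors obtained from a fixed-effect model (that computation is not reproduced here). The function name and the example SMDs are likewise hypothetical.

```python
from statistics import NormalDist


def cumulative_plot_positions(smds, sigma):
    """For each sorted SMD, return (SMD, k/(n+1), guideline SMD, departure).

    sigma is an assumed spread for the zero-mean guideline curve; in the
    text it comes from a fixed-effect model, which is not reproduced here.
    """
    n = len(smds)
    guideline = NormalDist(mu=0.0, sigma=sigma)
    rows = []
    for k, d in enumerate(sorted(smds), start=1):
        p = k / (n + 1)                  # crude cumulative probability of rank k
        expected = guideline.inv_cdf(p)  # SMD at the same percentile on the curve
        rows.append((d, p, expected, d - expected))
    return rows


smds = [0.12, -0.05, 0.30, 0.02, -0.10]  # hypothetical baseline SMDs
for d, p, e, dev in cumulative_plot_positions(smds, sigma=0.15):
    print(f"SMD={d:+.2f}  p={p:.3f}  guideline={e:+.3f}  departure={dev:+.3f}")
```

Departures that are consistently positive or negative over a range of ranks, rather than scattered around zero, correspond to the visual pattern the text treats as a possible sign of heterogeneity or missing values.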
Detecting a deviation from the guideline in the forest plot suggests possible selection bias in randomization, and evaluating the pattern of deviation can help us characterize that bias. We found skewed distributions of baseline differences in four meta-analyses: three were skewed to the right and one to the left. The direction of skewness primarily indicates the direction of selection bias, which should be examined in light of the characteristics of the disease and the intervention in the context of the study objective. A skewed distribution of baseline differences may also indicate a pattern of missing values, which could be identified by further investigation of the funnel plot. In those four meta-analyses, a large proportion of the included trials (20–55%) did not report randomized baseline values, so the observed skewness may be associated with this missingness. Three of the four meta-analyses showed a trend for the SMDs to increase as the SEs increased, implying that most of the unreported values would likely be large SMDs in the opposite direction. Therefore, if all baseline values had been reported, their observed distribution would more likely have a mean closer to zero. In contrast, the remaining meta-analysis showed a different pattern, suggesting that the missing values would lie in the same direction as most of the observed values. In that case, the distribution of baseline differences after randomization would still have a mean shifted from zero even if the missing values were filled in, implying that systematic subversion of randomization might have occurred in those trials.
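The two features judged visually above, the skewness of the baseline SMDs and the funnel-plot trend of SMD against SE, can be given rough numerical companions. The following is a minimal descriptive sketch, not the authors' procedure and not a formal test: `skewness` is the moment coefficient g1 = m3 / m2^1.5, and `smd_se_slope` is an ordinary least-squares slope of SMD on SE. Both helper names and the example data are hypothetical, and with only a handful of trials both numbers are unstable.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def smd_se_slope(smds, ses):
    """OLS slope of SMD on SE; positive means SMDs grow as SEs grow."""
    n = len(smds)
    mx = sum(ses) / n
    my = sum(smds) / n
    sxy = sum((s - mx) * (d - my) for s, d in zip(ses, smds))
    sxx = sum((s - mx) ** 2 for s in ses)
    return sxy / sxx


# Hypothetical meta-analysis: larger SMDs come from less precise trials.
smds = [0.02, 0.05, 0.10, 0.25, 0.60]
ses = [0.08, 0.10, 0.15, 0.20, 0.30]
print(f"skewness g1 = {skewness(smds):+.2f}")        # positive: right-skewed
print(f"SMD-on-SE slope = {smd_se_slope(smds, ses):+.2f}")
```

In the spirit of the interpretation above, a right skew combined with a positive slope is the pattern that would be read as missing large SMDs in the opposite (negative) direction.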
We limited our study sample to Cochrane reviews published in a single year. Our study also had limitations in interpreting the randomized baseline differences in the primary outcome and their reporting status in relation to study characteristics. For example, most of the pain outcomes in our study had no baseline measurement (postoperative pain management in surgical patients), and our study set contained no trial of a procedure for controlling pain in patients who already had pain, which would have been of interest because baseline pain measurements would be relevant in such a trial. Therefore, future research covering a broader range of study characteristics, including not only Cochrane reviews but also studies published in other major medical journals, will allow our results to be elaborated further.
We used the SMD to compare the post-randomization balance in baseline values of the primary outcomes on a common scale across meta-analyses, because those primary outcomes could differ in magnitude by their nature. However, meta-analysts who apply our approach to meta-analyses where the MD would be the more appropriate measure need not choose the SMD over the MD, unless they intend to compare their results with those of other studies that used a different outcome.
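The scale dependence that motivates the SMD can be seen in a short sketch. The section does not state which SMD variant was used; the code assumes Hedges' g (Cohen's d with a pooled SD, multiplied by the usual small-sample correction 1 − 3/(4(n1 + n2) − 9)). The helper names and the baseline pain summaries are hypothetical.

```python
from math import sqrt


def mean_difference(m1, m2):
    """MD: expressed in the outcome's own units."""
    return m1 - m2


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """SMD as Hedges' g: pooled-SD Cohen's d times a small-sample correction."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return correction * d


# Hypothetical baseline pain summaries: the same trial on 0-10 and 0-100 scales.
arms_10 = dict(m1=5.0, sd1=2.0, n1=20, m2=4.0, sd2=2.0, n2=20)
arms_100 = dict(m1=50.0, sd1=20.0, n1=20, m2=40.0, sd2=20.0, n2=20)

for label, a in (("0-10", arms_10), ("0-100", arms_100)):
    md = mean_difference(a["m1"], a["m2"])
    g = hedges_g(**a)
    print(f"{label}: MD = {md:.1f}, SMD = {g:.3f}")
# MD changes tenfold with the scale (1.0 vs 10.0); the SMD (0.490) does not.
```

Within a single meta-analysis whose trials share one scale, the MD keeps the result in interpretable units, which is why the SMD is only needed for comparisons across outcomes measured on different scales.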
Our suggested approach is, by nature, a visual exploratory analysis. Because we did not apply any formal test, its interpretation may be subjective, particularly in judging skewness. Although limited statistical power would remain a problem, it would be useful to devise a formal meta-analytic test for skewness suited to this context. Furthermore, future work should examine in more detail how baseline imbalance and missing baseline values affect pooled results in meta-analyses.
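As one possible starting point for such a test, offered only as an untested sketch and not as the formal method called for above, a Monte Carlo p-value can compare the observed skewness of the baseline SMDs with its distribution under a null of perfect randomization. The sketch assumes each SMD is independently normal with mean zero and a known SE; it ignores missing trials and does nothing to address limited power. The helper names, `n_sim`, and `seed` are hypothetical.

```python
import random


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_mc_pvalue(smds, ses, n_sim=999, seed=1):
    """Two-sided Monte Carlo p-value for |skewness| of baseline SMDs.

    Hypothetical, unvalidated sketch: under the assumed null, each SMD_i is
    drawn independently from N(0, SE_i^2).
    """
    rng = random.Random(seed)
    observed = abs(skewness(smds))
    extreme = 0
    for _ in range(n_sim):
        simulated = [rng.gauss(0.0, se) for se in ses]
        if abs(skewness(simulated)) >= observed:
            extreme += 1
    # The +1 terms count the observed data as one draw from the null.
    return (extreme + 1) / (n_sim + 1)


# Hypothetical example: one extreme trial among ten equally precise trials.
smds = [0.0] * 9 + [3.0]
ses = [0.1] * 10
print(f"p = {skewness_mc_pvalue(smds, ses):.3f}")
```

Because skewness estimated from ten or so trials is highly variable, such a test would reject only for very pronounced asymmetry, which is the power limitation noted in the text.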