Trial-level characteristics associate with treatment effect estimates: a systematic review of meta-epidemiological studies
BMC Medical Research Methodology volume 22, Article number: 171 (2022)
To summarize the up-to-date empirical evidence on trial-level characteristics of randomized controlled trials associated with treatment effect estimates.
A systematic review was conducted, with three databases searched up to August 2020. Meta-epidemiological (ME) studies of randomized controlled trials on intervention effects were eligible. We assessed the methodological quality of ME studies using a self-developed criterion. Associations between treatment effect estimates and trial-level characteristics were presented using forest plots.
Eighty ME studies were included, 25 of which (31%) were published after 2015. Fewer than one-third of the ME studies critically appraised the included studies (26/80, 33%), published a protocol (23/80, 29%), or provided a list of excluded studies with justifications (12/80, 15%). Trials with high or unclear (versus low) risk of bias on sequence generation (3/14 for binary outcomes and 1/6 for continuous outcomes), allocation concealment (11/18 and 1/6) and double blinding (5/15 and 2/4), and trials with smaller sample size (4/5 and 2/2), were significantly associated with larger treatment effect estimates. Associations between larger treatment effect estimates and high or unclear risk of bias on allocation concealment (5/6 for binary outcomes and 1/3 for continuous outcomes) and double blinding (4/5 and 1/3) were more frequently observed for subjective outcomes. The associations between treatment effect estimates and non-blinding of outcome assessors disappeared in trials that used multiple observers to reach consensus, for both binary and continuous outcomes. Some trial characteristics in the Cochrane risk-of-bias (RoB2) tool have not been covered by the included ME studies, including the use of a validated method for outcome measurement and the selection of the reported result from multiple outcome measures or multiple analyses based on the results (e.g., significance of the results).
Consistently significant associations were found between larger treatment effect estimates and high or unclear risk of bias on sequence generation, allocation concealment and double blinding, as well as smaller sample size. The impact of allocation concealment and double blinding was more consistent for subjective outcomes. The methodological and reporting quality of the included ME studies was unsatisfactory. Future ME studies should follow the corresponding reporting guideline. Specific guidelines for conducting and critically appraising ME studies are needed.
The randomized controlled trial (RCT) is regarded as the most reliable study design for evaluating the efficacy or effectiveness of healthcare interventions [1, 2]. The results of RCTs can be the cornerstone for supporting clinical practice and improving public health policy decisions. However, defects in the design, conduct, analysis, interpretation and reporting of RCTs have a substantial impact on their internal validity, further distort the results of systematic reviews based on them, and ultimately lead to inappropriate clinical decisions [3,4,5]. For example, a large body of empirical evidence has indicated that trials with high or unclear risk of bias on allocation concealment [6,7,8], lack of blinding [2, 8, 9], smaller sample size [4, 10, 11], or a single-center design [5, 12] show larger treatment effect estimates. Therefore, it is urgent to identify the factors that could distort treatment effect estimates, so that scientifically rigorous design and methodology can ensure the validity of conclusions drawn from RCTs.
A meta-epidemiological (ME) study uses the results of meta-analyses to explore the influence of specific trial-level characteristics on treatment effect estimates. The Cochrane risk-of-bias (RoB) tool, which is widely used for assessing the risk of bias of RCTs, was developed based on evidence generated from ME studies [13, 14]. Related systematic reviews of ME studies were published in 2016, with literature searches up to May 2015 [15, 16]. However, an increasing number of ME studies have been published since May 2015 and were therefore not included in those systematic reviews [15, 16]. Some of the newly published ME studies showed inconsistent results on the associations between treatment effect estimates and trial-level characteristics such as drop-out [17, 18], Medline indexing [4, 19] and double blinding [described as double blinding or ≥ 2 key parties (participants, personnel, outcome assessors) being blinded] [8, 20]. Other newly published ME studies explored additional trial-level characteristics (e.g., trial protocol registration [3, 21] and patient-reported outcome measures) that had not been investigated by earlier ME studies and were accordingly not covered by the previous systematic reviews [15, 16]. An update of the evidence is therefore needed.
This systematic review aimed to 1) summarize the empirical evidence from ME studies that investigated the associations between trial-level characteristics of RCTs and treatment effect estimates; 2) inform future best practice in RCT design and provide empirical evidence for updating critical appraisal tools for RCTs (e.g., the Cochrane RoB tool); 3) describe the characteristics of ME studies and the methods used for their critical appraisal, which will serve as a foundation for further development.
Protocol and registration
We conducted and reported this systematic review with reference to guidance from the Cochrane Handbook for Systematic Reviews of Interventions and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations. The protocol of this study was registered in PROSPERO (CRD42020200947).
ME studies of RCTs that assessed the efficacy, effectiveness or safety of an intervention were eligible; the intervention could be therapeutic or preventive (e.g., vaccines). We included an ME study only if it examined differences in treatment effect estimates stratified by variation in trial-level characteristics (e.g., method of allocation concealment). There were no restrictions on language or publication date.
We excluded ME studies that compared treatment effect estimates between RCTs and observational studies. ME studies comparing treatment effect estimates according to quantitative methodological quality scores of RCTs (e.g., the Jadad scale, scored from 0 to 5) were excluded because this approach has been abandoned. Conference abstracts, protocols, animal experiments, commentaries, editorials, statistical methodology papers, and ME studies based on a single meta-analysis were also excluded. If the same ME study was published in different journals or was updated, the most up-to-date version was included, and the remaining versions were treated as supplementary sources for data extraction and critical appraisal.
Related systematic reviews [15, 16], published in 2016, conducted comprehensive literature searches and identified eligible ME studies published before 2015. Following the common practice of previous updated systematic reviews [25, 26], we adapted the search strategy of a previous systematic review and searched PubMed, Embase, and Web of Science using "meta-epidemiology", "treatment effect" and related keywords from January 2015 to August 2020. Reference lists of the previously published systematic reviews [15, 16] and of the identified ME studies were screened for additional studies. Although relying on the literature search results of the previous systematic reviews [15, 16] was a post-hoc decision, we believe it was an optimal choice that saved time, manpower and resources with little (if any) compromise to the comprehensiveness of literature identification. Detailed search strategies are shown in Additional file 1: Appendix 1.
Study selection and data extraction
All retrieved citations were first screened by title and abstract, and the full texts of the remaining potentially eligible records were then assessed. Bibliographical characteristics of all eligible ME studies, including both those identified by our search and those referenced in the previously published systematic reviews [15, 16], were extracted using a self-developed form based on the previous systematic review. The data extraction form was piloted and refined on a sample of five ME studies. Study selection and data extraction were conducted in duplicate by pairs of trained researchers (HW, JL, WJ, YY, LQ and YC). Any disagreement was resolved by discussion to reach consensus or by consulting a senior researcher (IXYW). The following information was extracted from each ME study (Additional file 2: Appendix 2):
General characteristics of ME studies: year of publication; type of publication (journal article; agency report); involvement of epidemiologists/statisticians (following the definition reported by Delgado-Rodriguez et al.); funding sources (public; private); type of intervention (pharmacological; non-pharmacological); medical conditions classified according to the International Classification of Diseases 11th version (ICD-11); trial-level characteristics evaluated, comprising pre-specified characteristics included in the Cochrane RoB tool (e.g., sequence generation and allocation concealment) and others such as sample size (larger sample, smaller sample) and number of centers (multicenter, single-center), together with additional trial-level characteristics included post hoc for comprehensiveness [e.g., publication language (English language, language other than English) and study design (parallel group, cross-over)]; type of outcome measure (binary; continuous; time-to-event); data sources for the ME study (collected meta-analyses, trials, or previous ME studies);
Characteristics of the collections of meta-analyses: data sources (Cochrane review; non-Cochrane review); type of meta-analysis (aggregated data; individual participant data; network meta-analysis); management of overlapping meta-analyses; minimum number of trials per meta-analysis; criteria for selecting one meta-analysis from a systematic review that included more than one meta-analysis; data extraction sources (individual trial and/or systematic review);
Characteristics of quantitative analyses: statistical methods; methods used to account for clustering of trials within meta-analyses and to adjust for meta-confounders; information related to heterogeneity; and whether the direction for interpreting the results was reported (e.g., a statement that a ratio of odds ratios (ROR) < 1 indicated larger treatment effect estimates for trials with smaller sample size compared with larger sample size).
Methodological quality assessment
To the best of our knowledge, there was no published tool specifically for evaluating the methodological quality of ME studies. Hence, we used a self-developed criterion consisting of 16 items based on AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews-2) and the criteria used in a related systematic review published by Dechartres and colleagues. Inclusion of these 16 items was based on consensus among all co-authors, with five items derived from AMSTAR 2 and the remaining 11 items from Dechartres and colleagues’ criteria (Additional file 3: Appendix 3). Pairs of trained researchers (HW, JL and YL) independently assessed the methodological quality of the included ME studies [15, 16]. Discrepancies were resolved by discussion or, when they persisted, by consulting a senior researcher (IXYW).
All results were summarized narratively. Binary variables were summarized as frequencies (%) with their corresponding 95% confidence intervals (CIs), and continuous variables as medians with interquartile ranges or ranges. Differences in treatment effect estimates were measured with the ratio of effect sizes (e.g., ROR) for binary outcomes and the difference in standardized mean difference (SMD) for continuous outcomes. Differences in treatment effect estimates were re-calculated where necessary so that a ratio of effect sizes less than 1, or a difference in SMD less than 0, indicated larger treatment effect estimates for trials with high or unclear risk of bias, or for trials with the second element of a comparison (e.g., for larger sample versus smaller sample, smaller sample was regarded as the second element). Associations between treatment effect estimates and trial-level characteristics were presented with forest plots. As in the previous systematic review, we did not pool the results from different ME studies, because of potential overlap among them, and presented them in forest plots instead. Results of subgroup analyses based on trial-level characteristics (e.g., type of outcome) or meta-analysis-level characteristics (e.g., type of review) were presented when available. All data analyses were conducted using R 3.6.1 (http://www.R-project.org, the R Foundation for Statistical Computing, Vienna, Austria).
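As an illustration of the re-calculation described above, the following Python sketch computes an ROR from two pooled odds ratios and reverses the direction of a reported ratio. It is not the analysis code used in this review; the function names and the example numbers are hypothetical, the two pooled estimates are assumed to be independent, an odds ratio below 1 is assumed to favor the intervention, and standard errors are recovered from 95% CIs on the log scale.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def log_or_and_se(or_value, ci_low, ci_high):
    """Recover the log odds ratio and its standard error from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(or_value), se


def ratio_of_odds_ratios(or_high_risk, or_low_risk):
    """ROR comparing high/unclear-risk trials with low-risk trials.

    Each argument is a hypothetical (OR, ci_low, ci_high) tuple. The two
    estimates are assumed to come from disjoint sets of trials.
    """
    log_h, se_h = log_or_and_se(*or_high_risk)
    log_l, se_l = log_or_and_se(*or_low_risk)
    log_ror = log_h - log_l
    se = math.sqrt(se_h ** 2 + se_l ** 2)
    return (math.exp(log_ror),
            math.exp(log_ror - Z_95 * se),
            math.exp(log_ror + Z_95 * se))


def flip_direction(ratio, ci_low, ci_high):
    """Reverse the comparison: take the reciprocal and swap the CI bounds."""
    return 1 / ratio, 1 / ci_high, 1 / ci_low


# Hypothetical pooled ORs; with OR < 1 favoring the intervention, an ROR < 1
# means a larger treatment effect estimate in the high/unclear-risk trials.
ror, ror_low, ror_high = ratio_of_odds_ratios((0.60, 0.45, 0.80), (0.80, 0.65, 0.98))
print(round(ror, 2), round(ror_low, 2), round(ror_high, 2))

# An ME study that reported low-risk versus high-risk instead can be re-oriented:
print(flip_direction(ror, ror_low, ror_high))
```

For continuous outcomes the analogous re-orientation of a difference in SMD would be a sign change with the CI bounds swapped, under the same assumptions.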
Overall, 2705 citations were identified through electronic database searches and reference list checking. After duplicates were excluded, the remaining 1983 records were screened by title and abstract. Of these, 131 underwent full-text assessment; 80 ME studies (Additional file 4: Appendix 4) were included, and the remaining 51 were excluded with reasons (Additional file 5: Appendix 5). Figure 1 describes the results of the literature search and the study selection process.
The 80 ME studies were published between 1995 and 2020 (median: 2013), with 25/80 (31%) being published after 2015 (the time of the last systematic review published) (Additional file 6: Appendix 6). Most ME studies were published as journal articles (76/80, 95%). Among them, 26/80 (32%) were published in general journals and 50/80 (62%) were published in medical specialty journals, including 26/80 (32%) in epidemiology/biostatistics journals. Moreover, 56/77 (73%) ME studies involved at least one epidemiologist/statistician. Among the 64 ME studies that provided funding information, only two (2/64, 3%) received funding from private sources, 48/64 (75%) from public sources while the remaining 14 (14/64, 22%) did not receive any funding support (Table 1).
Most (51/62, 82%) ME studies assessed both pharmacological and non-pharmacological interventions. Binary outcomes were included in 60/80 (75%) ME studies, while time-to-event outcomes were included in only 5/80 (6%). Two-thirds (48/72, 67%) of the ME studies covered various medical areas, followed by diseases of the digestive system (9/72, 12%), pregnancy, childbirth, or the puerperium (5/72, 7%), and diseases of the musculoskeletal system or connective tissue (5/72, 7%) (Table 1, Additional file 7: Appendix 7). The most frequently evaluated trial-level characteristic was allocation concealment (30/80, 38%), followed by sequence generation (24/80, 30%), double blinding (19/80, 24%), blinding of outcome assessors (18/80, 22%), and blinding of participants (13/80, 16%). Additional file 8: Appendix 8 shows the detailed trial-level characteristics evaluated in each ME study.
Details of the collected meta-analyses among the ME studies
Most (63/80, 79%) ME studies were based on data collected from meta-analyses; only 11/80 (14%) used data collected from trials, and 6/80 (8%) collected data directly from previously published ME studies (Table 1). Among the 63 ME studies based on data from meta-analyses, 58 reported data sources: 28/58 (48%) considered only Cochrane reviews, 3/58 (5%) considered only non-Cochrane reviews and 27/58 (47%) considered both. Most (58/63, 92%) ME studies were based on aggregated data meta-analyses; the remaining five considered other types of meta-analyses, including both aggregated data and individual participant data (3/63, 5%), individual participant data only (1/63, 2%) and network meta-analyses of aggregated data only (1/63, 2%). Thirty-five (35/63, 56%) ME studies explicitly managed overlapping meta-analyses, whereas 28/63 (44%) did not report related information. The minimum number of trials included per meta-analysis ranged from one to ten, while 26/63 (41%) ME studies did not provide this information. When an included systematic review had more than one meta-analysis, 44 (44/63, 70%) ME studies selected one meta-analysis from each systematic review, based on multiple criteria (20/44, 45%) or the primary outcome (10/44, 23%). Four ME studies (4/63, 6%) included all meta-analyses reported in the systematic reviews without selection, while the remaining 15/63 (24%) did not mention relevant information (Table 2).
Details of quantitative analyses among the ME studies
Most (68/80, 85%) ME studies quantitatively synthesized the differences in treatment effect estimates (Table 1). The most commonly used method for combining results was the two-step approach (a within-meta-analysis comparison followed by combination across meta-analyses) (43/68, 63%). Clustering of trials within a meta-analysis was accounted for in 53 of the 61 (87%) ME studies based on data from meta-analyses. More than 70% of the ME studies assessed heterogeneity during data synthesis (59/68, 87%), adjusted for meta-confounders (54/68, 79%), and used random-effects models to take into account variability across meta-analyses/trials (43/61, 70%). Sixty (60/68, 88%) ME studies clearly reported the direction for interpreting the results, while the remaining 8/68 (12%) did not provide this information. Forty-eight (48/68, 71%) ME studies conducted subgroup analyses based on either trial-level or meta-analysis-level characteristics (Table 3). Additional file 9: Appendix 9 presents detailed information on the subgroup analyses of the included ME studies.
The included ME studies generally performed well on three items, with compliance rates of at least 90%: giving a clear description of inclusion criteria and reasons for exclusion (74/80, 92%), reporting information related to conflicts of interest and funding support (74/80, 92%), and providing a clear definition of the trial characteristics evaluated (72/80, 90%). On the other hand, fewer than one-third of the ME studies fulfilled the following three methodological criteria: assessing the methodological quality of the included studies (26/80, 33%), publishing a protocol developed prior to the conduct of the ME study (23/80, 29%), and providing a list of excluded studies with justifications (12/80, 15%) (Table 4).
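The two-step approach mentioned above can be sketched in Python as follows. This is a simplified illustration rather than the method of any particular included ME study: step 1 is approximated by the difference between inverse-variance pooled log odds ratios of flagged and unflagged trials within each meta-analysis, step 2 uses DerSimonian-Laird random-effects pooling, and all function names and data are hypothetical.

```python
import math


def pool_fixed(estimates):
    """Inverse-variance fixed-effect pooling of (log_or, variance) pairs."""
    weights = [1.0 / v for _, v in estimates]
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def within_meta_analysis_log_ror(trials):
    """Step 1: log ROR (flagged versus unflagged trials) inside one meta-analysis.

    `trials` is a list of (log_or, variance, flagged) tuples, where `flagged`
    marks trials with the characteristic of interest (e.g., unclear concealment).
    """
    flagged = [(y, v) for y, v, f in trials if f]
    unflagged = [(y, v) for y, v, f in trials if not f]
    if not flagged or not unflagged:
        raise ValueError("meta-analysis needs trials in both groups")
    y1, v1 = pool_fixed(flagged)
    y0, v0 = pool_fixed(unflagged)
    return y1 - y0, v1 + v0


def combine_random_effects(log_rors):
    """Step 2: DerSimonian-Laird random-effects pooling across meta-analyses."""
    k = len(log_rors)
    w = [1.0 / v for _, v in log_rors]
    fixed = sum(wi * y for wi, (y, _) in zip(w, log_rors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, log_rors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for _, v in log_rors]
    pooled = sum(wi * y for wi, (y, _) in zip(w_star, log_rors)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Two hypothetical meta-analyses; each trial is (log OR, variance, flagged).
meta_analyses = [
    [(-0.5, 0.04, True), (-0.3, 0.04, True), (-0.2, 0.04, False), (0.0, 0.04, False)],
    [(-0.6, 0.02, True), (-0.3, 0.02, False)],
]
step_one = [within_meta_analysis_log_ror(ma) for ma in meta_analyses]
pooled_log_ror, pooled_se, tau2 = combine_random_effects(step_one)
print(math.exp(pooled_log_ror), pooled_se, tau2)
```

Because every trial is compared only with trials from the same meta-analysis in step 1, this design accounts for the clustering of trials within meta-analyses; the between-meta-analysis variance estimated in step 2 captures how much the association varies across meta-analyses.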
Impact of trial-level characteristics on treatment effect estimates
Eleven out of 14 (11/14) ME studies indicated that trials with high or unclear risk of bias for sequence generation were associated with larger treatment effect estimates, and three of them found the association statistically significant. Fourteen out of 18 (14/18) ME studies showed that trials with high or unclear risk of bias on allocation concealment were associated with larger treatment effect estimates (11 found statistically significant associations). Ten out of 15 (10/15) ME studies showed that trials with high or unclear risk of bias on double blinding were associated with larger treatment effect estimates, and the association was statistically significant in five of them. These associations were also observed when blinding was considered separately as blinding of participants (5/5 ME studies), blinding of personnel (1/4 ME studies) and blinding of outcome assessors (4/8 ME studies). For blinding of outcome assessors, one out of four ME studies showed a statistically significant association (Fig. 2).
All five (5/5) ME studies showed that trials with smaller sample size were associated with larger treatment effect estimates than trials with larger sample size, and four of them found statistically significant associations. In one ME study, this significant association held regardless of the definition of smaller and larger sample size (e.g., Q1 versus Q4, < 50 versus ≥ 50) (Fig. 2, Additional file 10: Appendix 10). Both (2/2) ME studies showed larger treatment effect estimates for trials stopped early, and the association was statistically significant in one of them. The direction of the point estimates of the ratio of effect sizes was inconsistent among the ME studies for trials with high or unclear risk of bias on incomplete outcome data (4 ME studies) and selective outcome reporting (3 ME studies). All three (3/3) ME studies showed that published trials, compared with grey literature, produced larger treatment effect estimates, with 2/3 ME studies showing a statistically significant association. Four out of five (4/5) ME studies showed larger treatment effect estimates for trials published in languages other than English, and two of them found the association statistically significant. Inconsistent results were also seen for non-Medline-indexed versus Medline-indexed trials, with two (2/4) ME studies showing lower treatment effect estimates for non-Medline-indexed trials and the remaining two (2/4) showing larger ones.
Four out of five (4/5) ME studies revealed that single-center trials were associated with larger treatment effect estimates than multi-center trials, and two out of two (2/2) ME studies showed the same for cross-over trials compared with parallel trials. These associations were statistically significant in 2/4 and 1/2 ME studies, respectively. Two out of four (2/4) ME studies found that trials without an intention-to-treat analysis showed larger treatment effect estimates, and one of them found the association statistically significant. However, no statistically significant associations were found between treatment effect estimates and baseline imbalance (3 ME studies), existence of competing interests (2 ME studies) or industry funding (3 ME studies) (Fig. 2).
One ME study demonstrated that overall trials showed markedly and significantly lower treatment effect estimates than the first trial (ratio of effect size: 2.67, 95% CI: 2.12–3.37), although the other ME study did not find such an association (ratio of effect size: 1.03, 95% CI: 0.98–1.08). Several other trial-level characteristics, including sufficient follow-up, placebo control and statistician involvement, among others, were also investigated, and no significant associations were found (Additional file 10: Appendix 10).
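The two ratios reported above show how statistical significance can be read from a ratio of effect sizes and its 95% CI. The Python sketch below is illustrative only: the function names are hypothetical, and the standard error is back-calculated on the log scale under the assumption that the reported intervals are symmetric Wald-type intervals.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def z_from_ratio_ci(ratio, ci_low, ci_high):
    """Approximate z statistic for H0: ratio = 1, recovered from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return math.log(ratio) / se


def excludes_null(ci_low, ci_high, null=1.0):
    """True when the confidence interval does not contain the null value."""
    return not (ci_low <= null <= ci_high)


# Ratios and CIs as reported in the text above.
first_study = (2.67, 2.12, 3.37)
second_study = (1.03, 0.98, 1.08)

for ratio, low, high in (first_study, second_study):
    print(ratio, excludes_null(low, high), z_from_ratio_ci(ratio, low, high))
```

The first interval lies entirely above 1, whereas the second spans 1, which matches the contrast between the two ME studies described in the text.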
Three out of six (3/6) ME studies reported an association between trials with high or unclear risk of bias on sequence generation and larger treatment effect estimates (statistically significant in 1/3 ME studies). Four out of six (4/6) ME studies showed that trials with high or unclear risk of bias on allocation concealment were associated with larger treatment effect estimates, and one of them found the association statistically significant. The direction of the point estimates of the difference in effect size was inconsistent among the ME studies when blinding was considered separately for the three parties: blinding of participants (8 ME studies), blinding of personnel (4 ME studies) and blinding of outcome assessors (7 ME studies). These inconsistencies disappeared when the parties were considered together as double blinding, with three out of four (3/4) ME studies showing larger treatment effect estimates for trials with high or unclear risk of bias (statistically significant in 1/3 ME studies) (Fig. 3).
Three ME studies consistently found that trials with smaller sample size (or inadequate power) were associated with larger treatment effect estimates. One out of two (1/2) ME studies reported that trials with drop-outs were associated with lower treatment effect estimates (Fig. 3), while the other showed the opposite direction. Additionally, single-center trials (1 ME study), individual RCTs (versus cluster RCTs) (1 ME study) and trials without protocol registration (1 ME study) showed significant associations with larger treatment effect estimates. Most trial characteristics did not show any significant association with treatment effect estimates for continuous outcomes, including early stopping (1 ME study), incomplete outcome reporting (1 ME study), selective outcome reporting (1 ME study), intention-to-treat analysis (2 ME studies), baseline imbalance (4 ME studies) and industry funding (1 ME study), among others (Fig. 3).
For binary outcomes, larger treatment effect estimates were observed in trials with high or unclear risk of bias on allocation concealment (6/6 ME studies for subjective outcomes and 6/10 ME studies for objective outcomes) and double blinding (4/5 ME studies for subjective outcomes and 6/8 ME studies for objective outcomes). Significant associations between high or unclear risk of bias and larger treatment effect estimates were observed much more frequently for subjective outcomes than for objective outcomes [allocation concealment (5/6 versus 1/10 ME studies) and double blinding (4/5 versus 2/8 ME studies)] (Fig. 4-a). For continuous outcomes, trials with high or unclear risk of bias on allocation concealment (2/3 and 1/3 ME studies for subjective and objective outcomes, respectively) and double blinding (3/3 and 2/3 ME studies for subjective and objective outcomes, respectively) were associated with larger treatment effect estimates. However, only 1/3 ME studies found these associations statistically significant, and only for subjective outcomes (Fig. 4-b).
For both binary and continuous outcomes, larger treatment effect estimates for trials with high or unclear risk of bias on blinding of outcome assessors were observed only in trials using a single observer for non-blinded assessment (compared with trials using multiple-observer consensus for non-blinded assessment) and in trials with industry funding (compared with trials with non-commercial funding) (Fig. 4).
For binary outcomes, larger treatment effect estimates for trials published in languages other than English were seen only in trials with a pharmacological intervention, using an inactive control, focusing on complementary medicine and included in non-Cochrane reviews, and not in trials with a non-pharmacological intervention, using an active control, focusing on non-complementary medicine and included in Cochrane reviews (Additional file 11: Appendix 11-B-2, Appendix 11-B-3, Additional file 12: Appendix 12-B-1). For continuous outcomes, larger treatment effect estimates for trials with high or unclear risk of bias on blinding of participants were demonstrated only in non-pharmacological intervention trials (Additional file 11: Appendix 11-C-2), while the associations between treatment effect estimates and risk of bias for both blinding of participants and allocation concealment were seen only in complementary medicine trials (Additional file 11: Appendix 11-C-4). It is worth noting that larger treatment effect estimates in the first trial compared with subsequent trials were consistently observed regardless of the sample size (< 300 and > 300), risk of bias (low, unclear and high) or effect size (≤ 0.5 SMDs and > 0.5 SMDs) of the first trial for continuous outcomes (Additional file 12: Appendix 12-C-1). Such consistency has not been explored for binary outcomes. Details of the subgroup analyses for both binary and continuous outcomes are displayed in Fig. 4, Additional file 11: Appendix 11 and Additional file 12: Appendix 12.
This systematic review identified 80 ME studies of interventions, almost one-third of which were not covered by the previous systematic reviews [15, 16]. The included ME studies covered various medical areas and interventions. A wide range of trial-level characteristics has been evaluated, from risk of bias domains (e.g., blinding) to language (English and non-English) and age of participants (e.g., children and adults), with allocation concealment, sequence generation and blinding being the most commonly evaluated. Overall, consistently significant associations with larger treatment effect estimates were observed for trials with high or unclear (versus low) risk of bias on sequence generation, allocation concealment and double blinding, and for trials with smaller sample size. For allocation concealment and double blinding, the significant associations were more frequently observed for subjective outcomes. The impacts of missing outcome data and intention-to-treat analysis, both included in the Cochrane RoB2 tool, were uncertain. Furthermore, some characteristics in the Cochrane RoB2 tool have not yet been covered by the included ME studies, including the use of a validated method for outcome measurement and the selection of the reported result from multiple outcome measures or multiple analyses based on the results (e.g., significance of the results).
In addition to including a larger number of more recent ME studies than the previous systematic reviews [15, 16], we identified some interesting findings in the subgroup analyses: i) High or unclear risk of bias on blinding of outcome assessors was significantly associated with larger treatment effect estimates in trials using a single observer for non-blinded assessment, for both binary and continuous outcomes. This finding indicates that when blinding of outcome assessors is not possible, reaching consensus among multiple assessors might be an alternative strategy to reduce potential detection bias; ii) larger treatment effect estimates for trials published in languages other than English (binary outcomes), and for trials with high or unclear risk of bias on blinding of participants (continuous outcomes) and allocation concealment (continuous outcomes), were seen only in trials focusing on complementary medicine. A tentative explanation for the differences between these subgroups is that trials on complementary medicine had a higher probability of suffering from methodological flaws; iii) larger treatment effect estimates in the first trial compared with subsequent trials were consistently observed for continuous outcomes, regardless of the trial size, risk of bias or effect size of the first trial, indicating the robustness of the association. However, such explorations are missing for binary outcomes, even though inconsistencies were observed between the two available ME studies [29, 30]. Future ME studies are invited to address this gap.
Several reporting and methodological flaws among the sampled ME studies are worth noting. Over one-fifth of the ME studies failed to report some key information, such as funding sources, the criteria used for selecting one meta-analysis within each systematic review and the management of overlapping meta-analyses. Future ME studies are encouraged to follow the corresponding reporting guideline to improve their reporting and transparency. Common methodological flaws for future ME studies to overcome included not assessing the methodological quality of included studies, not publishing a protocol, and not providing a list of excluded studies with reasons. Furthermore, until a guideline for conducting ME studies becomes available, future ME studies could at least refer to existing publications regarding the statistical methods [33,34,35] and sample size of an ME study.
Several additional key points regarding the conduct of ME studies are worth discussing as well. Some preliminary steps are needed to reduce potential bias [37, 38] before combining differences in treatment effect estimates across meta-analyses or trials in an ME study. First, with regard to the management of overlap, using a study more than once in the same quantitative analysis may overstate its sample size and number of events. Although this may appear to produce greater precision and more robust conclusions, the conclusions would be wrong. However, almost half of the ME studies did not report whether overlapping meta-analyses were managed, which calls for attention from future ME studies. Second, the results from different meta-analyses should have the same direction of interpretation; this can be ensured by checking the experimental and control arms in each trial when two active interventions are compared, and by reclassifying outcomes (e.g., survival re-coded as mortality) if needed. However, only half of the ME studies reported whether the experimental and control arms had been checked.
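The effect of counting a trial twice can be illustrated with a minimal Python sketch. The variances below are hypothetical, and inverse-variance fixed-effect pooling is assumed purely for illustration; the point is only that a duplicated trial shrinks the pooled standard error without adding any new information.

```python
import math


def pooled_se(variances):
    """Standard error of an inverse-variance fixed-effect pooled estimate."""
    return math.sqrt(1.0 / sum(1.0 / v for v in variances))


# Hypothetical variances of the log odds ratios of three distinct trials.
unique_trials = [0.04, 0.09, 0.16]

# The first trial enters a second time through an overlapping meta-analysis.
with_duplicate = unique_trials + [0.04]

se_unique = pooled_se(unique_trials)
se_duplicate = pooled_se(with_duplicate)

# The duplicated analysis reports a narrower CI although no new patients were added.
print(se_unique, se_duplicate)
```

Removing duplicates, or keeping each trial in only one meta-analysis, avoids this spurious gain in precision.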
When data from meta-analyses are used to assess the difference in treatment effect estimates, the results might be distorted by within- and between-meta-analysis heterogeneity if the clustering of trials within meta-analyses is not accounted for. That was observed in more than 10% of the relevant ME studies. Being observational in nature, ME studies are generally at risk of confounding. Despite repeated emphasis [16, 33, 41], ME studies that completely controlled for confounders are rare. About four-fifths of the included ME studies adjusted for meta-confounders, an improvement over the previous systematic reviews [15, 16]. However, 59% adjusted for confounders solely through subgroup analysis, with a very limited number of confounders controlled at one time, indicating incomplete control of confounding. Multivariable analysis could be a better choice. Meanwhile, the selection of potential confounders is challenging; in addition to empirical evidence and theoretical considerations, the directed acyclic graph (DAG) approach proposed by Herbert is recommended. Additionally, ME studies based on a collection of trials could also reduce confounding through comparisons within the same trial (e.g., comparing blinded with non-blinded assessment).
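One common way to respect the clustering of trials within meta-analyses is a two-step analysis: the difference between trial subgroups is first estimated inside each meta-analysis as a log ratio of odds ratios (ROR), and these differences are then combined across meta-analyses. The sketch below illustrates that idea under simplifying assumptions: the data are invented, the helper names are hypothetical, and fixed-effect inverse-variance pooling is used at both steps for brevity, whereas a real ME study would usually allow for heterogeneity with a random-effects model.

```python
import math
from collections import defaultdict

# Hypothetical trials: (meta-analysis id, high/unclear risk flag, log OR, variance).
# A log OR below 0 indicates benefit of the experimental intervention.
trials = [
    ("MA1", True, -0.60, 0.04),
    ("MA1", True, -0.40, 0.04),
    ("MA1", False, -0.20, 0.02),
    ("MA2", True, -0.50, 0.05),
    ("MA2", False, -0.30, 0.05),
    ("MA2", False, -0.10, 0.05),
]


def pool(estimates):
    """Inverse-variance fixed-effect pooling of (estimate, variance) pairs."""
    weights = [1.0 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def log_ror_per_meta_analysis(rows):
    """Step 1: within each meta-analysis, compare flagged with unflagged trials."""
    groups = defaultdict(lambda: {True: [], False: []})
    for ma, flagged, log_or, var in rows:
        groups[ma][flagged].append((log_or, var))
    result = {}
    for ma, group in groups.items():
        # Only meta-analyses containing both kinds of trial are informative.
        if group[True] and group[False]:
            high, high_var = pool(group[True])
            low, low_var = pool(group[False])
            result[ma] = (high - low, high_var + low_var)
    return result


# Step 2: combine the within-meta-analysis differences across meta-analyses.
per_ma = log_ror_per_meta_analysis(trials)
overall_log_ror, overall_var = pool(list(per_ma.values()))
ror = math.exp(overall_log_ror)
print(f"ROR = {ror:.2f}")
```

Because each comparison is made inside a meta-analysis, differences in intervention, population and outcome between meta-analyses do not enter the contrast directly. With beneficial effects coded as odds ratios below 1, an ROR below 1 means that trials at high or unclear risk of bias show larger treatment effect estimates.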
A further issue regarding confounding is that the association between blinding and treatment effect estimates was more consistent when more than one party (participants and assessors, with or without personnel) was considered simultaneously as double blinding, for both binary and continuous outcomes. For trial reporting, the CONSORT statement encourages trial authors to state clearly who was blinded rather than ambiguously stating double blinding. However, in ME studies, the blinding of different parties was generally correlated (e.g., blinding of participants and blinding of personnel); accordingly, analyzing these parties separately without controlling for the remaining ones might introduce confounding bias. Therefore, combining the three key parties (participants, personnel and outcome assessors) as one group might be an optimal choice for reducing confounding bias in ME studies. Similar consideration is needed for allocation concealment. We agree with Moustgaard et al. that, theoretically, the association between allocation concealment and treatment effect estimates should not depend on the type of outcome (subjective or objective), which is at odds with the available ME studies. In theory, confounding by blinding could be a major concern in such a scenario: it is difficult to implement blinding, especially blinding of participants and personnel, when the allocation sequence is unconcealed. Future ME studies need to consider carefully other confounders as well as the relationships among different trial characteristics.
In agreement with the previous systematic reviews [15, 16], this review also found that significant associations between trial-level characteristics and treatment effect estimates were seen much more frequently for binary outcomes than for continuous outcomes, including in subgroup analyses. Larger samples of meta-analyses with more homogeneous data on binary outcomes might contribute to the differences. Although this has been raised by the previous systematic review, more attention to continuous outcomes is still needed in future ME studies, as results based on binary outcomes may not be directly generalizable to continuous outcomes.
Strengths and limitations
This systematic review has several strengths. First, placing no limits on medical areas or types of interventions supports the generalizability of our results. Second, the methodological quality of the included ME studies was assessed to show where improvements are needed in the future. Third, comprehensive information related to subgroup analyses was extracted, and notable subgroup findings were identified, such as that bias introduced by lack of blinding of outcome assessors might be removed by adopting multiple-observer consensus.
Some limitations of our study are worth noting. First, some ME studies are described as “methodological study” or “research on research”. However, we directly adopted the literature search strategies from the previously published systematic review to identify eligible ME studies. These strategies did not include the aforementioned search terms, which probably led to some potentially eligible studies being missed.
Second, there was no specific tool for assessing the methodological quality of ME studies. Therefore, we used a self-developed criterion established through discussion among group members, without consulting external specialists.
Third, we extracted the results of the unadjusted analysis for each ME study, as nearly three-fifths of ME studies adjusted for confounders using subgroup analysis rather than multivariable analysis (32/54, 59%) or did not report adjusted results (13/22, 59%).
Fourth, we did not combine the results quantitatively, either for the main analyses or for the subgroup analyses, because of the potential overlap of meta-analyses and trials. We presented the results by considering both the statistically significant differences and the direction of treatment effect estimates to reduce the impact of relying solely on vote counting; nevertheless, without quantitative combination, the potential influence of Simpson’s paradox might not be completely removed. Furthermore, when conducting an ME study, duplicates should be considered and removed. However, among the 63 included ME studies based on collections of meta-analyses, only 35 (56%) managed the overlap of RCTs. Future ME studies should pay attention to duplicated RCTs, especially when quantitative synthesis is conducted.
Fifth, only ME studies in the field of interventions were considered. Results from this review may not be generalizable to other fields of ME studies, such as diagnostic accuracy [46,47,48], prognostic studies [49, 50], and prediction models.
Sixth, information on methodology and reporting was extracted from publications, which may introduce bias if authors did not conduct their studies as reported or did not report the relevant information.
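The concern about Simpson’s paradox raised in the fourth limitation can be made concrete with a small, entirely hypothetical example. In the sketch below, trials at high or unclear risk of bias show larger benefits than low-risk trials inside each of two invented meta-analyses, yet a crude comparison that ignores the meta-analysis grouping points in the opposite direction. The data, the names and the use of unweighted means are assumptions chosen only to keep the arithmetic transparent.

```python
from statistics import mean

# Hypothetical log odds ratios (below 0 = benefit), grouped by meta-analysis
# and by risk-of-bias status. MA_A studies a strongly effective intervention
# and contains mostly low-risk trials; MA_B studies a weakly effective one
# and contains mostly high/unclear-risk trials.
data = {
    "MA_A": {"high_or_unclear": [-1.2], "low": [-1.0, -1.0, -1.0]},
    "MA_B": {"high_or_unclear": [-0.2, -0.2, -0.2], "low": [0.0]},
}


def within_differences(d):
    """Difference (high/unclear minus low) computed inside each meta-analysis."""
    return {
        ma: mean(group["high_or_unclear"]) - mean(group["low"])
        for ma, group in d.items()
    }


def crude_difference(d):
    """Difference computed after pooling all trials, ignoring the grouping."""
    high = [x for group in d.values() for x in group["high_or_unclear"]]
    low = [x for group in d.values() for x in group["low"]]
    return mean(high) - mean(low)


within = within_differences(data)
crude = crude_difference(data)
print("within each meta-analysis:", within)
print("crude, ignoring grouping:", crude)
```

The within-meta-analysis differences are both negative, whereas the crude difference is positive, because the risk-of-bias mix differs between the two meta-analyses. This is why a summary that does not combine results within meta-analyses cannot fully exclude such reversals.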
Identifying trial-level characteristics that affect treatment effect estimates is critical for both trial design and critical appraisal in the era of evidence-based medicine. In this updated systematic review, we collected additional empirical evidence about the associations between trial-level characteristics and treatment effect estimates. Authors of RCTs are advised to account for trial characteristics that are significantly associated with treatment effect estimates, such as sequence generation, allocation concealment, blinding and sample size, when designing and conducting RCTs. When it is difficult to blind outcome assessors, a multiple-assessor consensus strategy could be an alternative approach to reduce detection bias. When assessing the impact of blinding on treatment effect estimates in ME studies, combining the three key parties (participants, personnel and outcome assessors) of blinding as one group might reduce potential confounding.
We found consistently significant associations between treatment effect estimates and sequence generation, allocation concealment, double blinding and sample size. The associations between treatment effect estimates and allocation concealment and double blinding were more consistent in trials using subjective outcomes. More ME studies are needed to assess the impact of trial characteristics in the Cochrane RoB2 tool that currently lack sufficient empirical evidence, including missing outcome data, intention-to-treat analysis, methods used for outcome measures and selection of the reported results from multiple outcome measures or multiple analyses based on results (e.g., significance of the results). Furthermore, the methodological and reporting quality of the included ME studies is unsatisfactory. Future researchers are recommended to report ME studies following the corresponding guideline. Specific guidelines for conducting ME studies and assessing the methodological quality of ME studies are needed as well.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
RoB2: Cochrane risk-of-bias 2
RCT: Randomized controlled trial
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
ICD-11: International Classification of Diseases 11th version
ROR: Ratio of odds ratio
AMSTAR 2: A MeaSurement Tool to Assess systematic Reviews-2
SMD: Standardized mean difference
DAG: Directed acyclic graph
Moustgaard H, Clayton GL, Jones HE, Boutron I, Jørgensen L, Laursen DRT, Olsen MF, Paludan-Müller A, Ravaud P, Savović J, et al. Impact of blinding on estimated treatment effects in randomised clinical trials: meta-epidemiological study. BMJ (Clinical research ed). 2020;368:l6802.
Hróbjartsson A, Thomsen AS, Emanuelsson F, Tendal B, Hilden J, Boutron I, Ravaud P, Brorson S. Observer bias in randomised clinical trials with binary outcomes: systematic review of trials with both blinded and non-blinded outcome assessors. BMJ (Clinical research ed). 2012;344:e1119.
Dechartres A, Ravaud P, Atal I, Riveros C, Boutron I. Association between trial registration and treatment effect estimates: a meta-epidemiological study. BMC Med. 2016;14(1):1–9.
Papageorgiou SN, Antonoglou GN, Tsiranidou E, Jepsen S, Jäger A. Bias and small-study effects influence treatment effect estimates: a meta-epidemiological study in oral medicine. J Clin Epidemiol. 2014;67(9):984–92.
Dechartres A, Boutron I, Trinquart L, Charles P, Ravaud P. Single-center trials show larger treatment effects than multicenter trials: evidence from a meta-epidemiologic study. Ann Intern Med. 2011;155(1):39–51.
Wood L, Egger M, Gluud LL, Schulz KF, Jüni P, Altman DG, Gluud C, Martin RM, Wood AJ, Sterne JA. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ (Clinical research ed). 2008;336(7644):601–5.
Herbison P, Hay-Smith J, Gillespie WJ. Different methods of allocation to groups in randomized trials are associated with different levels of bias. A meta-epidemiological study. J Clin Epidemiol. 2011;64(10):1070–5.
Dechartres A, Altman DG, Trinquart L, Boutron I, Ravaud P. Association between analytic strategy and estimates of treatment outcomes in meta-analyses. JAMA. 2014;312(6):623–30.
Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med. 2001;135(11):982–9.
Chaimani A, Vasiliadis HS, Pandis N, Schmid CH, Welton NJ, Salanti G. Effects of study precision and risk of bias in networks of interventions: a network meta-epidemiological study. Int J Epidemiol. 2013;42(4):1120–31.
Dechartres A, Trinquart L, Boutron I, Ravaud P. Influence of trial sample size on treatment effect estimates: meta-epidemiological study. BMJ (Clinical research ed). 2013;346:f2304.
Unverzagt S, Prondzinsky R, Peinemann F. Single-center trials tend to provide larger treatment effects than multicenter trials: a systematic review. J Clin Epidemiol. 2013;66(11):1271–80.
Cochrane Handbook for Systematic Reviews of Interventions version 6.2 (updated February 2021) [www.training.cochrane.org/handbook]
Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, Cates CJ, Cheng HY, Corbett MS, Eldridge SM, et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ (Clinical research ed). 2019;366:l4898.
Dechartres A, Trinquart L, Faber T, Ravaud P. Empirical evaluation of which trial characteristics are associated with treatment effect estimates. J Clin Epidemiol. 2016;77:24–37.
Page MJ, Higgins JPT, Clayton G, Sterne JAC, Hróbjartsson A, Savović J. Empirical evidence of study design biases in randomized trials: systematic review of meta-epidemiological studies. PLoS ONE. 2016;11(7):e0159267.
Khan KS, Daya S, Collins JA, Walter SD. Empirical evidence of bias in infertility research: overestimation of treatment effect in crossover trials using pregnancy as the outcome measure. Fertil Steril. 1996;65(5):939–45.
Savović J, Jones H, Altman D, Harris R, Jüni P, Pildal J, Als-Nielsen B, Balk E, Gluud C, Gluud L, et al. Influence of reported study design characteristics on intervention effect estimates from randomised controlled trials: combined analysis of meta-epidemiological studies. Health Technol Assess (Winchester, England). 2012;16(35):1–82.
Dechartres A, Atal I, Riveros C, Meerpohl J, Ravaud P. Association between publication characteristics and treatment effect estimates: a meta-epidemiologic study. Ann Intern Med. 2018;169(6):385–93.
Pildal J, Hróbjartsson A, Jørgensen KJ, Hilden J, Altman DG, Gøtzsche PC. Impact of allocation concealment on conclusions drawn from meta-analyses of randomized trials. Int J Epidemiol. 2007;36(4):847–57.
Haring R, Ghannad M, Bertizzolo L, Page MJ. No evidence found for an association between trial characteristics and treatment effects in randomized trials of testosterone therapy in men: a meta-epidemiological study. J Clin Epidemiol. 2020;122:12–9.
Berthelsen DB, Ginnerup-Nielsen E, Juhl C, Lund H, Henriksen M, Hróbjartsson A, Nielsen SM, Voshaar M, Christensen R. Controversy and debate on meta-epidemiology. Paper 1: Treatment effect sizes vary in randomized trials depending on the type of outcome measure. J Clin Epidemiol. 2020;123:27–38.
Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ (Clinical research ed). 2021;372:n160.
Herbison P, Hay-Smith J, Gillespie WJ. Adjustment of meta-analyses on the basis of quality scores should be abandoned. J Clin Epidemiol. 2006;59(12):1249–56.
Walsh M, Collister D, Zeng L, Merkel PA, Pusey CD, Guyatt G, Au Peh C, Szpirt W, Ito-Hara T, Jayne DRW. The effects of plasma exchange in patients with ANCA-associated vasculitis: an updated systematic review and meta-analysis. BMJ (Clinical research ed). 2022;376:e064604.
Cai T, Abel L, Langford O, Monaghan G, Aronson JK, Stevens RJ, Lay-Flurrie S, Koshiaris C, McManus RJ, Hobbs FDR, et al. Associations between statins and adverse events in primary prevention of cardiovascular disease: systematic review with pairwise, network, and dose-response meta-analyses. BMJ (Clinical research ed). 2021;374:n1537.
Delgado-Rodriguez M, Ruiz-Canela M, De Irala-Estevez J, Llorca J, Martinez-Gonzalez A. Participation of epidemiologists and/or biostatisticians and methodological quality of published controlled clinical trials. J Epidemiol Community Health. 2001;55(8):569–72.
Shea BJ, Reeves BC, Wells G, Thuku M, Hamel C, Moran J, Moher D, Tugwell P, Welch V, Kristjansson E, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ (Clinical research ed). 2017;358:j4008.
Alahdab F, Farah W, Almasri J, Barrionuevo P, Zaiem F, Benkhadra R, Asi N, Alsawas M, Pang Y, Ahmed AT, et al. Treatment effect in earlier trials of patients with chronic medical conditions: a meta-epidemiologic study. Mayo Clin Proc. 2018;93(3):278–83.
Gartlehner G, Dobrescu A, Evans TS, Thaler K, Nussbaumer B, Sommer I, Lohr KN. Average effect estimates remain similar as evidence evolves from single trials to high-quality bodies of evidence: a meta-epidemiologic study. J Clin Epidemiol. 2016;69:16–22.
Kim C-K, Kim D-H, Lee MS, Kim J-I, Wieland LS, Shin B-C. Randomized Controlled Trials on Complementary and Traditional Medicine in the Korean Literature. Evid-Based Complement Alter Med. 2014;2014:194047.
Murad MH, Wang Z. Guidelines for reporting meta-epidemiological methodology research. Evid Based Med. 2017;22(4):139–42.
Sterne JA, Jüni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in “meta-epidemiological” research. Stat Med. 2002;21(11):1513–24.
Siersma V, Als-Nielsen B, Chen W, Hilden J, Gluud LL, Gluud C. Multivariable modelling for meta-epidemiological assessment of the association between trial quality and treatment effects estimated in randomized clinical trials. Stat Med. 2007;26(14):2745–58.
Welton NJ, Ades AE, Carlin JB, Altman DG, Sterne JAC. Models for potentially biased evidence in meta-analysis using empirically based priors. J R Stat Soc. 2009;172(1):119–36.
Giraudeau B, Higgins JP, Tavernier E, Trinquart L. Sample size calculation for meta-epidemiological studies. Stat Med. 2016;35(2):239–50.
Herbert RD. Controversy and debate on meta-epidemiology. Paper 2: meta-epidemiological studies of bias may themselves be biased. J Clin Epidemiol. 2020;123:127–30.
Moustgaard H, Jones HE, Savović J, Clayton GL, Sterne JA, Higgins JP, Hróbjartsson A. Ten questions to consider when interpreting results of a meta-epidemiological study-the MetaBLIND study as a case. Res Synth Methods. 2020;11(2):260–74.
Lunny C, Pieper D, Thabet P, Kanji S. Managing overlap of primary study results across systematic reviews: practical considerations for authors of overviews of reviews. BMC Med Res Methodol. 2021;21(1):140.
Sterne JA, Jüni P, Schulz KF, Altman DG, Bartlett C, Egger M. Statistical methods for assessing the influence of study characteristics on treatment effects in “meta-epidemiological” research. Stat Med. 2002;21(11):1513–24.
Hróbjartsson A, Thomsen AS, Emanuelsson F, Tendal B, Hilden J, Boutron I, Ravaud P, Brorson S. Observer bias in randomized clinical trials with measurement scale outcomes: a systematic review of trials with both blinded and nonblinded assessors. Can Med Assoc J. 2013;185(4):201–11.
Page MJ. Confounding and other concerns in meta-epidemiological studies of bias. J Clin Epidemiol. 2020;123:133–4.
Schulz KF, Altman DG, Moher D. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. BMJ (Clinical research ed). 2010;340:c332.
Alba AC, Alexander PE, Chang J, MacIsaac J, DeFry S, Guyatt GH. High statistical heterogeneity is more frequent in meta-analysis of continuous than binary outcomes. J Clin Epidemiol. 2016;70:129–35.
Puljak L. Research-on-research studies or methodological studies are primary research. J Clin Epidemiol. 2019;112:95.
Takwoingi Y, Leeflang MM, Deeks JJ. Empirical evidence of the importance of comparative studies of diagnostic test accuracy. Ann Intern Med. 2013;158(7):544–54.
van Enst WA, Scholten RJ, Whiting P, Zwinderman AH, Hooft L. Meta-epidemiologic analysis indicates that MEDLINE searches are sufficient for diagnostic test accuracy systematic reviews. J Clin Epidemiol. 2014;67(11):1192–9.
Crowley RJ, Tan YJ, Ioannidis JPA. Empirical assessment of bias in machine learning diagnostic test accuracy studies. J Am Med Inform Assoc: JAMIA. 2020;27(7):1092–101.
Tzoulaki I, Siontis KC, Ioannidis JP. Prognostic effect size of cardiovascular biomarkers in datasets from observational studies versus randomised trials: meta-epidemiology study. BMJ (Clinical research ed). 2011;343:d6829.
Lu VM, Phan K, Yin JXM, McDonald KL. Older studies can underestimate prognosis of glioblastoma biomarker in meta-analyses: a meta-epidemiological study for study-level effect in the current literature. J Neurooncol. 2018;139(2):231–8.
Damen JAAG, Debray TPA, Pajouheshnia R, Reitsma JB, Scholten RJPM, Moons KGM, Hooft L. Empirical evidence of the impact of study characteristics on the performance of prediction models: a meta-epidemiological study. BMJ Open. 2019;9(4):e026160.
This research was supported by the National Natural Science Foundation of China (81973709), Hunan Nature Science Foundation (2019JJ40348) and the High-level Talents Introduction Plan from Central South University (502045003).
Ethics approval and consent to participate
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent for publication
For this type of study, formal consent is not required.
Appendix 1. Search strategy.
Appendix 2. Bibliographical characteristics.
Appendix 3. Methodological quality of meta-epidemiological (ME) studies.
Appendix 4. List of included 80 meta-epidemiological studies.
Appendix 5. List of excluded meta-epidemiological (ME) studies based on full text with reasons.
Appendix 6. Number of meta-epidemiological studies on trial-level characteristics related to treatment effect estimates published by year.
Appendix 7. Main characteristics of 80 meta-epidemiological (ME) studies.
Appendix 8. Trial-level characteristics evaluated in 80 meta-epidemiological (ME) studies by chronological order.
Appendix 9. Details on the subgroup analyses in 48 meta-epidemiological (ME) studies.
Appendix 10. Associations between treatment effect estimates and other trial-level characteristics for binary outcome.
Appendix 11. Associations between treatment effect estimates and trial-level characteristics according to different subgroup analyses.
Appendix 12. Results of additional subgroup analyses.
Cite this article
Wang, H., Song, J., Lin, Y. et al. Trial-level characteristics associate with treatment effect estimates: a systematic review of meta-epidemiological studies. BMC Med Res Methodol 22, 171 (2022). https://doi.org/10.1186/s12874-022-01650-5