Data
The Nordic Cochrane Centre provided the content of the first issue from 2008 of the CDSR. The database includes meta-analyses within reviews which had previously been classified by outcome type, medical specialty and the types of interventions included in the pairwise comparisons [12]. The database did not record whether the data type was time-to-event; however, based on the outcome classification (using words such as “survival”, “death” and “fatality”) we were able to identify three sets of time-to-event meta-analyses:
- “binary”: those with outcome classification “all-cause mortality”, where the information recorded was based only on the number of events and participants per arm;
- “OEV”: those with outcome classifications “overall survival” and “progression/disease-free survival”, where the information recorded was based on “binary” data in addition to the log-rank “O-E” and “V” statistics; these were originally analysed as HRs in the RevMan software;
- those with an estimated log HR and its standard error. These were removed from further analyses since there was no available information on the number of events and participants per arm, and therefore no binary data meta-analysis could be conducted.
Therefore, we identified two subsets of time-to-event meta-analyses: those with binary summaries, and those with binary summaries in addition to OEV data; we analysed each outcome in each dataset separately to assess whether differences arise from the different characteristics of the outcomes. We also examined the individual Cochrane reviews to determine whether the information obtained from the “OEV” data was based on aggregate data or IPD.
Eligibility criteria
RMT (for “binary” data) and TS (for “OEV” data) initially extracted these data and cleaned them, including examination of the outcome classification; TS repeated the “binary” data extraction to confirm that the information obtained was accurate, and RMT confirmed the choice of meta-analyses included from the “OEV” data extraction. Both datasets could contribute more than one meta-analysis per Cochrane review. RMT and TS identified 46 misclassifications, due to disagreement with the original outcome classification as listed in the datasets, conflicting information in the database, or unavailability of the correct version of the Cochrane review. We excluded 1,284 studies with zero events in both arms, since such studies do not contribute to the meta-analysis results [12, 13]. We removed another 359 meta-analyses that included fewer than 3 studies, because some of the models applied below (i.e. generalised linear mixed models) are affected by estimation issues and inevitable failures with small numbers of studies [14]; hence we wanted to make fair comparisons between the models applied. Derivation of the analysis sample is provided in Fig. 1.
Descriptive statistics
We describe the number of studies per meta-analysis, the number of events and the study size using the median and interquartile range. We also report the number of medical specialties and the median number of events (and interquartile range) per medical specialty.
Model description for “binary” data
We used the following meta-analysis models to analyse the data on the OR or HR scale. The first was a model proposed for “binary” data (assuming a binomial likelihood with a logit link), based only on the number of patients and the number of events which occurred; the treatment effect is interpreted as the logarithm of an OR.
In the second approach, we modelled the binary data using a normal approximation to the binomial likelihood with a complementary log–log (clog-log) link, where the treatment effect was interpreted as the logarithm of a HR. This method is also based only on the number of patients and events which occurred, and it ignores censoring and the time element; however, it is closely related to continuous-time models, has a built-in proportional hazards assumption, and therefore has important applications in survival analysis [6].
Fitting two-stage random-effects models for “binary” data
Prior to fitting the two-stage random-effects models, study arms with zero events were identified in the “binary” data. For 771 studies, a “treatment arm” continuity correction was applied as proposed by Sweeting et al. [15], with the corrections constrained to sum to one so that the same amount of information is added to each study.
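As an illustration, the sketch below applies such a correction to a single hypothetical study in base R. The constants k_T and k_C, taken as proportional to the reciprocal of the opposite arm’s size and constrained to sum to one, reflect our reading of the treatment-arm correction in Sweeting et al. [15] and should be checked against that paper before reuse; the function name and counts are ours for illustration only.

```r
# Hedged sketch of a "treatment arm" continuity correction (after Sweeting et al. [15]):
# corrections proportional to the reciprocal of the opposite arm's size, constrained
# so that k_t + k_c = 1. Applied only when a study has a zero cell.
correct_zero_cells <- function(events_t, n_t, events_c, n_c) {
  if (events_t == 0 || events_c == 0 || events_t == n_t || events_c == n_c) {
    k_t <- n_t / (n_t + n_c)   # correction for each cell of the treatment arm
    k_c <- n_c / (n_t + n_c)   # correction for each cell of the control arm
    events_t <- events_t + k_t; n_t <- n_t + 2 * k_t   # add k_t to events and non-events
    events_c <- events_c + k_c; n_c <- n_c + 2 * k_c   # add k_c to events and non-events
  }
  c(events_t = events_t, n_t = n_t, events_c = events_c, n_c = n_c)
}

# Hypothetical study: 0/25 events on the treatment arm, 4/50 on the control arm
correct_zero_cells(events_t = 0, n_t = 25, events_c = 4, n_c = 50)
```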
Let \(i=1,2,\dots ,n\) denote the study. The estimated log odds and log hazard ratios were given by:
$$y_i=\left\{\begin{array}{ll}\log\left(\dfrac{A_i}{B_i}\right)-\log\left(\dfrac{C_i}{D_i}\right) & \text{for ORs}\qquad(1)\\[2ex]\log\left[-\log\left(1-P_{Ti}\right)\right]-\log\left[-\log\left(1-P_{Ci}\right)\right] & \text{for HRs}\qquad(2)\end{array}\right.$$
where \({\mathrm{A}}_{\mathrm{i}},{\mathrm{C}}_{\mathrm{i}}\) represented number of events, \({\mathrm{B}}_{\mathrm{i}},{\mathrm{D}}_{\mathrm{i}}\) represented number of non-events in the treatment and control groups respectively, \({P}_{Ti}=\frac{{\mathrm{A}}_{\mathrm{i}}}{{\mathrm{A}}_{\mathrm{i}}+{\mathrm{B}}_{\mathrm{i}}}\) was the proportion of events on the treatment arm of the \({i}^{th}\) study, and \({P}_{Ci}=\frac{{\mathrm{C}}_{\mathrm{i}}}{{\mathrm{C}}_{\mathrm{i}}+{\mathrm{D}}_{\mathrm{i}}}\) was the proportion of events on the control arm of the \({i}^{th}\) study.
The corresponding variances were given by:
$$s_i^2=\left\{\begin{array}{ll}\dfrac{1}{A_i}+\dfrac{1}{B_i}+\dfrac{1}{C_i}+\dfrac{1}{D_i} & \text{for ORs}\qquad(3)\\[2ex]\left(\dfrac{1}{\log\left(1-P_{Ti}\right)\left(P_{Ti}-1\right)}\right)^{2}\dfrac{P_{Ti}\left(1-P_{Ti}\right)}{A_i+B_i}+\left(\dfrac{1}{\log\left(1-P_{Ci}\right)\left(P_{Ci}-1\right)}\right)^{2}\dfrac{P_{Ci}\left(1-P_{Ci}\right)}{C_i+D_i} & \text{for HRs}\qquad(4)\end{array}\right.$$
Equations 2 and 4 provided a HR estimate via the complementary log–log link, which is considered a useful link function for discrete-time hazards models, as recommended by Hedeker et al. [7] and Singer et al. [6]. We estimated the study-specific log odds ratios or log hazard ratios \({y}_{i}\) and their within-study variances \({s}_{i}^{2}\) as shown above and fitted a standard two-stage random-effects model to these. Additionally, we obtained the \({I}^{2}\) statistic from the fitted models as follows:
$${I}^{2}=\frac{{\widehat{\tau }}^{2}}{{\widehat{\tau }}^{2}+{\widehat{\sigma }}^{2}}$$
where \({\tau }^{2}\) denotes the variance of the underlying true effects across studies and \({\sigma }^{2}\) the typical within-study variance.
To avoid downward bias in the variance component estimates, we used the REML estimator [16]. The models were implemented via the “rma.uni” command from the “metafor” package in R. We also fitted one-stage random-effects models for the “binary” data; the methods and code for the one-stage meta-analysis models are available in Additional file 1.
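To make the two-stage workflow concrete, the following is a minimal R sketch assuming the “metafor” package is installed; the 2x2 counts and the vectors of study-specific estimates and variances are illustrative values only, not data from the CDSR.

```r
library(metafor)

# --- Study-level estimates and variances (Eqs. 1-4) for one hypothetical 2x2 table ---
ai <- 30; bi <- 70   # events / non-events, treatment arm (illustrative counts)
ci <- 45; di <- 55   # events / non-events, control arm   (illustrative counts)

yi_or <- log(ai / bi) - log(ci / di)   # Eq. 1: log odds ratio
vi_or <- 1/ai + 1/bi + 1/ci + 1/di     # Eq. 3: its variance

p_t <- ai / (ai + bi); p_c <- ci / (ci + di)
yi_hr <- log(-log(1 - p_t)) - log(-log(1 - p_c))   # Eq. 2: clog-log scale log hazard ratio
vi_hr <- (1 / (log(1 - p_t) * (p_t - 1)))^2 * p_t * (1 - p_t) / (ai + bi) +
         (1 / (log(1 - p_c) * (p_c - 1)))^2 * p_c * (1 - p_c) / (ci + di)   # Eq. 4

# --- Two-stage random-effects fit across several studies (illustrative yi, vi values) ---
yi <- c(-0.35, -0.10, -0.55, 0.05, -0.25)   # study-specific log ORs (or log HRs)
vi <- c(0.04, 0.09, 0.12, 0.06, 0.08)       # within-study variances
fit <- rma.uni(yi = yi, vi = vi, method = "REML")   # REML estimator for tau^2 [16]

fit$tau2                              # estimated between-study variance
fit$I2                                # I^2 reported by metafor (as a percentage)
exp(c(fit$b, fit$ci.lb, fit$ci.ub))   # pooled OR or HR with its 95% CI
```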
Model description for “OEV” data
For the “OEV” data, the “O-E” and “V” statistics were available in the Cochrane database alongside the number of patients and events. These data came either from published reports or from IPD; TS examined the individual reviews from the Cochrane database and assessed the data origin. Since more information was available for these data, the following three models were applied, using only two-stage meta-analysis models.
Similarly to the “binary” data, we initially analysed the “OEV” data as “binary”, modelling them as described in detail in the preceding section. We also used the log-rank Observed minus Expected events (O-E) and the log-rank Variance (V) statistics, calculated previously from the number of events and the individual times to event in each arm of the trial; we used the log-rank approach [17] to obtain another type of HR estimate. We used random-effects models to analyse the data throughout, including between-study heterogeneity to account for variation across studies.
Fitting two-stage random-effects models for “OEV” data
Similarly to the “binary” data, the estimated log odds and log hazard ratios were given by Eqs. 1 and 2 for the binary summaries, while for the “O-E” and “V” statistics they were given as follows:
$$y_i=\frac{\text{log-rank Observed}-\text{Expected events}\ (O-E)}{\text{log-rank Variance}\ (V)}\quad\text{for HRs}\qquad(5)$$
The corresponding variances were given by Eqs. 3 and 4 for the binary summaries, while for the “O-E” and “V” statistics they were given as follows:
$$s_i^2=\frac{1}{\text{log-rank Variance}\ (V)}\quad\text{for HRs}\qquad(6)$$
where \(V\) denotes the variance of the log-rank statistic. We used the REML estimator for model implementation [16], and the models were implemented via the “rma.uni” command from the “metafor” package in R.
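A minimal sketch of this “O-E”/“V” analysis in R, again assuming the “metafor” package; the per-study log-rank statistics below are hypothetical values used only to show Eqs. 5 and 6 and the rma.uni call.

```r
library(metafor)

oe <- c(-6.2, -1.8, -3.5, 0.9)   # hypothetical log-rank O-E statistics, one per study
v  <- c(12.4, 5.1, 8.0, 3.6)     # hypothetical log-rank variances V

yi <- oe / v   # Eq. 5: study-specific log hazard ratios
vi <- 1 / v    # Eq. 6: corresponding within-study variances

fit_oev <- rma.uni(yi = yi, vi = vi, method = "REML")
exp(c(fit_oev$b, fit_oev$ci.lb, fit_oev$ci.ub))   # pooled HR with its 95% CI
fit_oev$I2                                        # heterogeneity on the O-E/V (HR) scale
```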
Model comparison for “binary” data
The following model comparisons were performed. For the “binary” dataset, we examined whether the results from analysing survival data as binary on an OR scale are similar to the results from analysing them on the HR scale using the clog-log link, under both two-stage and one-stage models. To assess the discrepancies between the model using the logit link and the model using the complementary log–log link, we present only the comparisons under two-stage models in the main paper; comparisons under one-stage models are in Additional file 1.
First, we examined the proportion of significant and non-significant meta-analytic pooled effect estimates under the different scales used (OR vs HR scale); we identified the number of meta-analyses which were significant under one scale and non-significant under the other at a two-sided 5% level of significance.
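As a sketch of this count, assuming a vector of two-sided p-values per meta-analysis under each scale (the values below are hypothetical):

```r
# Hypothetical two-sided p-values for the pooled effects under each scale
p_or <- c(0.030, 0.200, 0.001, 0.040, 0.300)
p_hr <- c(0.060, 0.180, 0.002, 0.030, 0.450)

sig_or <- p_or < 0.05
sig_hr <- p_hr < 0.05

table(OR_significant = sig_or, HR_significant = sig_hr)  # off-diagonal cells are discordant
sum(sig_or != sig_hr)                                    # number of discordant meta-analyses
```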
Bland–Altman plots with associated 95% limits of agreement were constructed, with the aim of facilitating interpretation of results and producing fair comparisons between the two scales [18]. In order to create these plots, results were standardised by dividing the logarithm of the estimate by its standard error. Plots were produced for the standardised treatment effect estimates and for the \({I}^{2}\) statistics. \({I}^{2}\) represents the percentage of variability that is due to between-study heterogeneity rather than chance; \({I}^{2}\) values range from 0 to 100%. This measure was chosen for model comparison as it enables us to compare results directly between the two scales used. The variance of underlying true effects across studies (\({\tau }^{2}\)) was not used as it does not allow direct comparison between different outcome measures.
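One standard way to construct such a plot in base R is sketched below, assuming vectors of pooled log estimates and standard errors under the two scales (all values are hypothetical); the limits of agreement are taken as the mean difference plus or minus 1.96 standard deviations of the differences.

```r
# Hypothetical pooled log estimates and standard errors per meta-analysis under each scale
log_or <- c(-0.40, -0.15, -0.60, 0.10, -0.30); se_or <- c(0.18, 0.25, 0.30, 0.20, 0.22)
log_hr <- c(-0.32, -0.12, -0.48, 0.08, -0.24); se_hr <- c(0.15, 0.22, 0.26, 0.18, 0.20)

# Standardise: logarithm of the estimate divided by its standard error
z_or <- log_or / se_or
z_hr <- log_hr / se_hr

# Bland-Altman quantities: per-meta-analysis mean and difference of the standardised values,
# with 95% limits of agreement = mean difference +/- 1.96 * SD of the differences
m   <- (z_or + z_hr) / 2
d   <- z_or - z_hr
loa <- mean(d) + c(-1.96, 1.96) * sd(d)

plot(m, d,
     xlab = "Mean of standardised estimates (OR and HR scales)",
     ylab = "Difference in standardised estimates (OR - HR)")
abline(h = mean(d), lty = 1)   # mean difference (bias line)
abline(h = loa, lty = 2)       # 95% limits of agreement
```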
We identified “outliers” as meta-analyses outside the 95% limits of agreement and examined the following characteristics:
- between-scale differences in the magnitude of the pooled treatment effect estimate and its 95% confidence interval;
- the levels of within-study standard error, between-study heterogeneity, and study weights in the meta-analysis;
- study-specific event probabilities and baseline risk.
We summarised these differences by meta-analysis and reported the characteristics most strongly associated with substantial differences between the OR pooled effect estimates and the corresponding HR pooled effect estimates.
Model comparison for “OEV” data
For the “OEV” dataset, comparisons were conducted separately for the overall survival and progression/disease-free survival outcomes, because differences between these outcomes might arise from different disease severities, which would in turn be associated with different lengths of follow-up and different risks of the outcome.
For both outcomes, we performed comparisons by examining the differences between analysing the data as binary on an OR scale, analysing the data as binary using the clog-log link on a HR scale, and analysing the data using the “O-E” and “V” statistics on a HR scale. We assessed whether the differences observed when analysing the data as binary on an OR scale could be reduced by use of the clog-log link. We present only comparisons of the results under two-stage models, since no IPD were available to perform comparisons under one-stage models.
Similarly to the “binary” data, we examined the proportion of significant and non-significant meta-analytic pooled effect estimates under the different scales and identified the number of meta-analyses which were significant under one scale and non-significant under the other. We created Bland–Altman plots for the standardised treatment effect estimates and for the \({I}^{2}\) statistics to explore agreement among the methods and to produce fair comparisons between the scales [18]. Meta-analyses outside the 95% limits of agreement were examined for their characteristics.