If a treatment consistently has no effect at all, the scale of the meta-analysis probably matters little. However, if a treatment does have an effect, the scale may substantially influence the interpretation of the findings and the communication about them. Therefore this analysis selected 2 trials that found a significant benefit from zinc lozenges and for which IPD was available. While the goal of this study was not to estimate the overall effect of zinc lozenges on common cold duration, the findings in these 2 trials are consistent with the relative effects (i.e., the % decrease) found in 4 other trials on zinc lozenges [12, 19].
When the effect of zinc lozenges on cold duration was estimated on the absolute scale, i.e., as the number of days by which the colds became shorter, applying the effect estimates to the placebo group cold distribution yielded impossible predicted durations (zero or negative) for some colds. In addition, the predicted distributions remained wide compared with the distributions in the actual zinc lozenge groups. In contrast, when the relative effect estimate was applied to the placebo group cold distributions, no impossible common cold durations were predicted, and the cold distributions became similar to those of the zinc lozenge groups.
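The contrast between the two scales can be illustrated with a minimal sketch. The placebo durations below are hypothetical values chosen only for illustration, not trial data; the 4.0-day and 43% estimates are the ones quoted in this section, and the function names are assumptions introduced for the example.

```python
# Hypothetical placebo-group cold durations in days (illustrative, not trial data).
placebo_days = [2, 3, 5, 7, 9, 12, 19]

ABSOLUTE_EFFECT_DAYS = 4.0  # an absolute-scale estimate quoted in this section
RELATIVE_EFFECT = 0.43      # a relative-scale estimate quoted in this section


def apply_absolute(durations, shift_days):
    """Subtract the same number of days from every duration."""
    return [d - shift_days for d in durations]


def apply_relative(durations, fraction):
    """Shorten every duration by the same fraction."""
    return [d * (1.0 - fraction) for d in durations]


def count_impossible(durations):
    """Count predicted durations that are zero or negative."""
    return sum(1 for d in durations if d <= 0)


def spread(durations):
    """Range of the durations (longest minus shortest)."""
    return max(durations) - min(durations)


absolute_pred = apply_absolute(placebo_days, ABSOLUTE_EFFECT_DAYS)
relative_pred = apply_relative(placebo_days, RELATIVE_EFFECT)

print("absolute:", absolute_pred,
      "impossible:", count_impossible(absolute_pred),
      "spread:", spread(absolute_pred))
print("relative:", [round(d, 2) for d in relative_pred],
      "impossible:", count_impossible(relative_pred),
      "spread:", round(spread(relative_pred), 2))
```

With these hypothetical inputs, the fixed shift pushes the two shortest colds below zero and leaves the spread of the distribution unchanged, whereas the proportional reduction keeps every duration positive and narrows the distribution.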
Calculation of the relative effect on common cold duration is conceptually consistent with the RR calculated by Cox regression. Nevertheless, the estimates should not be expected to be reciprocals since the RR depends on the shapes of the curves, which can differ substantially even though the mean durations remain invariant. Compared with the t-test, survival analysis is a superior way to analyse the recovery rate at the level of individual studies (Figs. 2 and 4), since it allows visual inspection and formal analysis of time-dependent changes in the treatment effect, and is not hampered by censored data or outlier patients who have exceptionally long colds. In contrast, outliers inflate the SD estimates in the t-test and may decrease the power to demonstrate an effect. However, since only the means and SDs of the study groups are usually published, survival analysis is rarely an option in the meta-analysis of disease duration. Therefore, in meta-analyses of time data the usual question is whether the absolute (i.e., MD) scale or the relative scale is more reasonable for the t-test.
The selection of scale is important in 2 respects. First, the scale is important when several studies are pooled in a meta-analysis, since one scale can lead to less heterogeneity than another, indicating better capture of the effect by the former scale [3]. Second, the scale influences the communication about the findings of single trials and of meta-analyses.
There has been substantial variation in the mean duration of untreated (placebo group) colds between trials. In our Cochrane review on vitamin C and the common cold, the shortest mean duration of placebo group colds in trials with children was 2.8 days, whereas the longest was 14 days [20, 21]. Because the untreated colds differ so greatly, a 1-day effect has a very different meaning in those two trials, even though the nominal value of the effect is the same. Furthermore, with an effective treatment, the 14-day colds might be shortened by a week, whereas such an effect is impossible on the 2.8-day scale of the former trial. Thus, the absolute scales of those two trials are incompatible. Therefore, we have used the relative scale in our Cochrane review on vitamin C and the common cold since 2004 [8, 20]. There has also been substantial variation in cold durations in the placebo groups of 11 zinc lozenge studies, from 5 to 10 days [12].
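The arithmetic behind this incompatibility can be sketched briefly. The two placebo means are those quoted above; the dictionary keys and function names are hypothetical and serve only this illustration.

```python
# Mean placebo-group cold durations quoted in the text (days).
placebo_means = {"shortest": 2.8, "longest": 14.0}


def percent_of_baseline(effect_days, baseline_days):
    """Express an absolute effect as a percentage of the untreated duration."""
    return 100.0 * effect_days / baseline_days


def is_feasible(effect_days, baseline_days):
    """A shortening is only possible if some positive duration remains."""
    return baseline_days - effect_days > 0


for label, mean_days in placebo_means.items():
    one_day = percent_of_baseline(1.0, mean_days)
    one_week_possible = is_feasible(7.0, mean_days)
    print(f"{label}: a 1-day effect is {one_day:.1f}% of baseline; "
          f"a 7-day effect is feasible: {one_week_possible}")
```

The same nominal 1-day effect corresponds to roughly a third of the baseline in one trial and well under a tenth in the other, and a 1-week shortening is feasible only in the trial with the long colds.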
There is evidence from studies on vitamin C and the common cold that heterogeneity in treatment effect may be lower on the relative scale. In 14 trials on vitamin C for children, pooling on the relative scale (i.e., the %-scale) led to I² = 27% (P = 0.17) for the heterogeneity in the treatment effects, whereas heterogeneity on the days-scale was I² = 46% (P = 0.03). Less heterogeneity may lead to stronger evidence of treatment effect and, as expected, the pooled treatment effect on the relative scale was stronger (Z = 4.04) than on the absolute scale (Z = 3.11) [21]. An earlier meta-analysis of vitamin C and the common cold studies also pointed out that the relative scale led to stronger evidence of treatment effect (P = 0.001) than the absolute scale (P = 0.01) [22]. These differences indicate that the relative scale may capture the effects of treatments on common cold duration better.
The selection of scale is also important in the communication of the findings. In the placebo groups of the included trials, the duration of the common colds ranged from 2 to 19 days in the Mossad [9] trial, and from 2 to 15 days in the Petrus [10] trial. Such a great variation in the durations of untreated colds should be taken into account in communication. The 43 and 25% effects are applicable over the whole range of the untreated cold durations. In contrast, claims that zinc shortens the duration of colds by 4.0 or 1.77 days according to the two trials analyzed (Table 1), or by 1.65 or 1.03 days according to two meta-analyses [13, 14], have a very different meaning depending on whether the assumed untreated cold episode might last for 2 days or 2 weeks. If only one type of estimate is used in the communication, the relative effect appears to be much more informative since it is applicable to the entire range of potential episode durations. Nevertheless, both measures may be shown in parallel.
The SMD scale is a third approach that has been used to estimate the magnitude of treatment effect on continuous outcomes [4, 5]. This normalizes the observations so that one unit on the scale corresponds to one unit of SD in each trial included in the meta-analysis. Such a scale is confusing for an ordinary reader. For example, the 2011 Cochrane review on zinc and the common cold stated in its abstract that the “intake of zinc is associated with a significant reduction in the duration (standardised mean difference (SMD) -0.97)” [23]. Reporting should always show the unit of measurement, and this sentence should have been written more accurately as: “zinc shortened the duration of colds by 0.97 SD units”. However, such accurate reporting would have revealed the main problem of the SMD scale: what does the SD unit mean in practical terms? Most physicians and patients can consider whether 43 or 25% is a small or a large effect, but few of them can form their own opinion about whether an effect of 0.97 SD units is small or large. In this respect, the relative scale is far superior in the communication of findings to physicians and patients, since they have long-term familiarity with percentage effects. The difficulty of communicating the SMD findings to patients has been pointed out [5, 24].
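The interpretive problem with SD units can be made concrete with a minimal sketch. It assumes the simple form of the SMD (mean difference divided by the pooled SD, without the small-sample correction that some software applies), and the two trials' summary statistics are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Group:
    mean: float  # mean cold duration in days
    sd: float    # standard deviation in days
    n: int       # number of participants


def pooled_sd(a: Group, b: Group) -> float:
    """Pooled standard deviation of two groups."""
    numerator = (a.n - 1) * a.sd ** 2 + (b.n - 1) * b.sd ** 2
    return math.sqrt(numerator / (a.n + b.n - 2))


def smd(treatment: Group, control: Group) -> float:
    """Mean difference expressed in pooled-SD units (no small-sample correction)."""
    return (treatment.mean - control.mean) / pooled_sd(treatment, control)


# Two hypothetical trials with different baseline durations and variability.
trial_a = (Group(mean=5.0, sd=2.0, n=50), Group(mean=7.0, sd=2.0, n=50))
trial_b = (Group(mean=10.0, sd=4.0, n=50), Group(mean=14.0, sd=4.0, n=50))

for name, (treated, control) in {"A": trial_a, "B": trial_b}.items():
    difference_days = treated.mean - control.mean
    print(f"trial {name}: MD = {difference_days:.1f} days, "
          f"SMD = {smd(treated, control):.2f} SD units")
```

In this hypothetical pair, both trials yield the same SMD although one cold is shortened by 2 days and the other by 4 days, so the SD unit alone does not tell a reader how many days were gained.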
The SMD scale is extensively used. However, its usage does not seem to originate from biological or statistical considerations, but from the fact that the SMD and MD scales are the only options available in popular statistical software for meta-analysis, such as RevMan of the Cochrane Collaboration [4]. Thus, the wide availability of the SMD option guides researchers to use it without properly considering the biological issues or the difficulties in communicating the findings on that scale [5, 24].
Based on 143 meta-analyses on continuous outcomes, Friedrich et al. concluded that there was less heterogeneity in the meta-analyses when they were carried out on the relative scale [5]. Although their findings support the preference for the relative scale, they did not consider the diversity of the outcomes that are measured on continuous scales. It seems evident that the relative scale is superior for some outcomes such as the duration of the common cold and other diseases, whereas the absolute scale may be superior for some other outcomes. Therefore, diverse continuous outcomes should not be lumped into a uniform mass in which all of them are analyzed on the relative scale and then compared with all of them analyzed on the absolute scale.
Essential requirements for using the relative scale in a meta-analysis appear to be that the measurements can be transformed so that the control group means are 100%, and that there is a relevant and unambiguous target level of 0%. However, there are many types of continuous outcomes which do not have a relevant relative scale. For example, since blood pressure has neither a reasonable 100% level across trials nor a reasonable 0% target level, pooling studies and reporting effects on blood pressure on the percentage scale might lead to confusion. The mmHg scale (absolute scale) stratified by the pre-treatment mmHg level may be preferable. Another example where the percentage scale would lead to confusion is the measurement of body temperature. If a patient has fever, it is more informative to describe by how many degrees the fever was reduced than to describe the relative decrease in body temperature on the Kelvin scale.
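The body temperature example can be worked through numerically. The temperatures below are hypothetical, and the function name is an assumption made for this sketch; the point is only that a percentage depends on where the zero of the scale lies.

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def percent_decrease(before, after):
    """Relative decrease as a percentage of the starting value."""
    return 100.0 * (before - after) / before


# Hypothetical fever episode (degrees Celsius).
fever_c = 39.5
treated_c = 37.5

absolute_drop = fever_c - treated_c
celsius_pct = percent_decrease(fever_c, treated_c)
kelvin_pct = percent_decrease(fever_c + KELVIN_OFFSET, treated_c + KELVIN_OFFSET)

print(f"absolute decrease: {absolute_drop:.1f} degrees")
print(f"relative decrease on the Celsius scale: {celsius_pct:.2f}%")
print(f"relative decrease on the Kelvin scale: {kelvin_pct:.2f}%")
```

The same 2-degree fall gives quite different percentages on the Celsius and Kelvin scales, and neither percentage has a clinically meaningful 0% target, which is why the absolute scale is more informative for this outcome.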
The duration of the common cold was used as a model of continuous outcomes. The common cold is by itself a clinically relevant topic as reflected, for example, by the existence of 18 Cochrane reviews in which the title includes the term [25]. Furthermore, it seems evident that the findings of this study apply to many other continuous outcomes that measure time, such as the duration of other diseases and the duration of hospital stay and intensive care unit stay.
In this study, the transformation to the relative scale was done by dividing the mean and SD values by the placebo group mean value, which transforms the zinc group mean level to the RoM and keeps the ratios of SDs and means identical with their ratios on the absolute scale. This approach is transparent but more conservative than the Taylor series approach [5] and the Fieller approach [17] (see Additional file 3). These approaches should be compared to find out which is the most useful in meta-analyses.
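The transformation described above can be sketched as follows. The summary statistics are hypothetical values chosen for illustration, and the class and function names are assumptions made for this example rather than part of the study's analysis code.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    mean: float  # group mean
    sd: float    # group standard deviation


def to_relative_scale(group: Summary, placebo_mean: float) -> Summary:
    """Divide both the mean and the SD by the placebo group mean."""
    return Summary(mean=group.mean / placebo_mean, sd=group.sd / placebo_mean)


# Hypothetical summary statistics in days (not taken from the trials).
placebo = Summary(mean=8.0, sd=4.0)
zinc = Summary(mean=4.8, sd=2.4)

placebo_rel = to_relative_scale(placebo, placebo.mean)
zinc_rel = to_relative_scale(zinc, placebo.mean)

print("placebo on the relative scale:", placebo_rel)  # mean becomes 1.0, i.e., 100%
print("zinc on the relative scale:", zinc_rel)        # mean becomes the RoM

# The SD-to-mean ratio of each group is the same on both scales.
print("zinc SD/mean, absolute scale:", zinc.sd / zinc.mean)
print("zinc SD/mean, relative scale:", zinc_rel.sd / zinc_rel.mean)
```

With these hypothetical inputs the placebo mean becomes 1.0 and the zinc mean becomes the ratio of means, while each group's SD-to-mean ratio is unchanged, so a standard MD analysis can be run directly on the transformed values.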