Updating a systematic review – what difference did it make? Case study of nicotine replacement therapy

Aims
To examine the effect of updating a systematic review of nicotine replacement therapy (NRT) on its contents and conclusions.
Methods
We examined the effects of regular updating of a systematic review of nicotine replacement therapy for smoking cessation. We considered two outcomes. First, we assessed the effect of adding new data to meta-analyses, comparing results in 2000 with those in 1994. Second, we assessed qualitatively the ways in which the questions addressed by the review had changed between the two dates. For the first outcome, we compared the number of trials, the pooled estimate of effect using the odds ratio, and the results of pre-specified subgroup analyses, for nicotine gum and patch separately. Using a test for interaction, we assessed whether differences between estimates were statistically significant.
Results
There were ten new trials of nicotine gum between 1994 and 2000, and the meta-analytic effect changed little. For the nicotine patch the number of trials increased from 9 to 30, and the meta-analytic effect fell from 2.07 (95% CI 1.64-2.62) to 1.73 (95% CI 1.56-1.93). Apparent differences in relative effect between subgroups found in 1994 were not found in 2000. The updated systematic review addressed a number of questions not identified in the original version.
Conclusions
Updating the meta-analyses led to a more precise estimate of the likely effect of the nicotine patch, but the clinical message was unchanged. Further placebo-controlled NRT trials are unlikely to add to the evidence base, and it is questionable whether updating the meta-analyses to include them is worthwhile. The content of the systematic review has, however, changed, with the addition of data addressing questions not considered in the original review. There is a tension between the principle of identifying the important questions before conducting a review and keeping the review up to date as primary research identifies new avenues of enquiry.


Background
There have been over 90 trials of nicotine replacement therapy (NRT) for helping people to stop smoking. The most reliable estimate of the effectiveness of a treatment comes from considering all the available evidence. Meta-analysis of clinical trials provides a quantitative estimate of the size of the treatment effect derived from all the patients studied. It also has the potential to explore, through pre-specified subgroup analysis, whether effectiveness varies by other clinical variables, such as setting of care or patient population.
Previous research has shown that updating can affect both the direction and the precision of the estimate of treatment effects [1]. Clinically, whether a treatment is considered effective can depend crucially on updating the meta-analysis as new evidence becomes available [2]. However, some questions about updating remain unanswered. In particular, is there a point in the research process where updating can add nothing further and should be abandoned? Since 1996, the Cochrane Tobacco Addiction Review Group has updated annually a systematic review of nicotine replacement therapy for smoking cessation, first published in 1994 [3,4]. Updating consumes significant resources, and we wished to determine what regular updating had achieved.
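Updating a meta-analysis amounts to a cumulative meta-analysis: whenever new trials appear, all available trials are pooled again, and the confidence interval narrows as evidence accumulates (as it did for the nicotine patch between 1994 and 2000). The Python sketch below illustrates this under stated assumptions. It uses a simple inverse-variance fixed-effect pool of log odds ratios, not the Peto method used in the review, and the `Trial` records, the `cumulative_meta_analysis` function and all the data are hypothetical.

```python
import math
from typing import List, NamedTuple, Tuple


class Trial(NamedTuple):
    year: int       # year the trial became available (hypothetical)
    log_or: float   # log odds ratio for quitting, NRT vs control
    se: float       # standard error of log_or


def cumulative_meta_analysis(
    trials: List[Trial], z: float = 1.96
) -> List[Tuple[int, float, float, float]]:
    """Re-pool every trial available up to each year.

    Uses inverse-variance fixed-effect weights (w = 1 / se**2) as a simple
    stand-in for the Peto method. Returns one tuple per year:
    (year, pooled OR, lower 95% limit, upper 95% limit).
    """
    results = []
    sum_w = 0.0    # running total of weights
    sum_wy = 0.0   # running total of weight * log OR
    for year in sorted({t.year for t in trials}):
        # Add the trials that became available in this year.
        for t in trials:
            if t.year == year:
                w = 1.0 / t.se ** 2
                sum_w += w
                sum_wy += w * t.log_or
        pooled = sum_wy / sum_w
        se_pooled = 1.0 / math.sqrt(sum_w)
        results.append((
            year,
            math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
        ))
    return results


# Hypothetical trials, for illustration only (not data from the review).
trials = [
    Trial(1992, 0.80, 0.30),
    Trial(1993, 0.60, 0.25),
    Trial(1996, 0.40, 0.15),
    Trial(1998, 0.55, 0.12),
]
for year, pooled_or, lower, upper in cumulative_meta_analysis(trials):
    print(f"{year}: OR {pooled_or:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

In this fixed-effect sketch every added trial carries positive weight, so the interval can only narrow; the point estimate, by contrast, can still move up or down as trials are added.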

Our objectives were:
To describe changes in the number and type of trials included in the review.
To determine whether updating led to statistically significant changes in estimates of the effectiveness of NRT.
To determine whether updating led to statistically significant changes in estimates of effectiveness by intensity of behavioural support and by setting of treatment.
To determine whether the relative effects of treatment in different versions of the review were affected by differences in setting of treatment, level of behavioural support or length of follow-up.
To determine how the content of the review had changed, as measured by changes in the questions addressed.

Methods
We describe the methods of the systematic review elsewhere [4]. In brief, we collect data from randomised trials with at least six months of follow-up. We express the effects of treatment as odds ratios, with values greater than 1.0 indicating that the intervention leads to more people stopping smoking. We calculate the odds ratios using the Peto method after testing for heterogeneity [5]. In addition to assessing the effects of NRT compared with placebo, we pre-specify two subgroup analyses. Because the effect of treatment interacts with patient characteristics, particularly motivation, we wished to know whether treatment effects differ by the setting in which participants were recruited. We therefore compared the odds ratios in studies recruiting through advertising in the community, in primary care, in smoking cessation clinics or in hospitals. The second subgroup analysis categorises the trials as high or low intensity, depending on the amount of behavioural support given as an adjunct to the pharmacological therapy. In this review we also explored a further subgroup analysis to address the methodological question of whether the pooled estimates differed for trials with different durations of follow-up.
In this study we compared the results of the meta-analysis in 1994 and 2000. We assessed the difference between estimates in subgroups of trials using a Z-test for interaction. For each subgroup, we calculated the standard error of the log odds ratio from the pooled odds ratio and its 95% confidence interval. We then calculated the standard error of the difference between the two log odds ratios as the square root of the sum of the squared standard errors. The test statistic was the difference in the log odds ratios divided by its standard error.

Results
Figure 1 shows the number of studies comparing each type of NRT with a placebo/no NRT control, and studies addressing dose, type or combination therapy, by two-year period. Most research reported since 1994 has been on the nicotine patch.
Tables 1 and 2 compare the meta-analytic estimates in 1994 with those in 2000 for nicotine gum and nicotine patch, respectively.
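The pooled odds ratios in these tables are Peto estimates [5]. A minimal Python sketch of the Peto one-step odds ratio follows, assuming each trial is summarised as a 2×2 table of quitters and participants in each arm. The function name `peto_pool` and the trial counts are hypothetical, not data from the review. The sketch also returns the Peto heterogeneity statistic Q, the kind of check the Methods describe running before pooling.

```python
import math
from typing import NamedTuple, Sequence, Tuple

# One trial: (quitters on NRT, n on NRT, quitters on control, n on control)
Trial = Tuple[int, int, int, int]


class PetoResult(NamedTuple):
    odds_ratio: float
    ci_low: float
    ci_high: float
    q: float   # heterogeneity chi-square statistic
    df: int    # degrees of freedom for q


def peto_pool(trials: Sequence[Trial], z: float = 1.96) -> PetoResult:
    """Peto one-step pooled odds ratio with its heterogeneity statistic."""
    sum_oe = 0.0          # sum of (observed - expected) quitters on NRT
    sum_v = 0.0           # sum of hypergeometric variances
    sum_oe2_over_v = 0.0  # sum of (O - E)^2 / V, used for Q
    informative = 0
    for a, n1, c, n2 in trials:
        n = n1 + n2   # participants in the trial
        m = a + c     # quitters in the trial
        v = n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
        if v == 0:    # nobody or everybody quit: no information on the OR
            continue
        oe = a - n1 * m / n  # observed minus expected under no treatment effect
        sum_oe += oe
        sum_v += v
        sum_oe2_over_v += oe ** 2 / v
        informative += 1
    log_or = sum_oe / sum_v
    se = 1.0 / math.sqrt(sum_v)
    return PetoResult(
        odds_ratio=math.exp(log_or),
        ci_low=math.exp(log_or - z * se),
        ci_high=math.exp(log_or + z * se),
        q=sum_oe2_over_v - sum_oe ** 2 / sum_v,
        df=informative - 1,
    )


# Hypothetical counts for three trials, for illustration only.
trials = [(30, 100, 15, 100), (45, 200, 25, 200), (20, 150, 12, 150)]
res = peto_pool(trials)
print(f"OR {res.odds_ratio:.2f} (95% CI {res.ci_low:.2f}-{res.ci_high:.2f}), "
      f"Q = {res.q:.2f} on {res.df} df")
```

Q is referred to a chi-square distribution with `df` degrees of freedom (for example with `scipy.stats.chi2.sf(q, df)`) to obtain a p-value for heterogeneity.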

Figure 1
Number of nicotine replacement therapy trials included in review, 1994-2000

Nicotine gum
Overall quit rates were highest in participants who received both high intensity support and nicotine gum. In 1994 the estimate of the relative efficacy of gum compared with control was higher with minimal support than with intensive support, although the difference was not statistically significant. The 2000 results show little difference between the two levels of support (Table 1b).

Nicotine patch
In 1994 the odds ratio for nicotine patch was 2.07 (95% CI 1.64-2.62). In 2000 the odds ratio was 1.73 (95% CI 1.56-1.93) (Table 2a). In 1994 the meta-analysis included data from 2,213 participants in 9 trials; in 2000 it included almost 16,000 participants in 30 trials. In 2000 the largest 6 trials, whether by size or by weight in the meta-analysis, contained almost 60% of the participants. Only one of the trials available in 1994 is amongst them.
We considered characteristics of the trials that might explain some of the change in the overall estimate. Amongst the explanations we considered a priori were that relative efficacy might differ by setting of care (and hence case mix) and level of support [7], or according to length of follow-up. Of the studies available in 1994, all but three recruited community volunteers: two were in primary care, and one recruited hospital patients. Since then, large studies have been done in primary care [8,9] and in a Veterans Administration Medical Centre [10]. In 2000 the estimated effect size for primary care trials alone was 1.47 (95% CI 1.18-1.83), lower than that in community volunteers (1.86, 95% CI 1.62-2.14). This difference is not statistically significant (p = 0.075). Control-group quit rates suggest that motivation to quit was lower in primary care settings, but there is little evidence that the characteristics of the participants have changed: most trials recruit heavier smokers irrespective of setting.
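As a check on the reported P value, the interaction test from the Methods can be worked through with the published figures. This assumes each 95% CI was computed symmetrically on the log-odds-ratio scale, which the text does not state; SE denotes the standard error of a log odds ratio and Φ the standard normal cumulative distribution function.

```latex
\begin{aligned}
\mathrm{SE}_{\text{comm}} &= \frac{\ln 2.14 - \ln 1.62}{2 \times 1.96} \approx 0.0710,
\qquad
\mathrm{SE}_{\text{pc}} = \frac{\ln 1.83 - \ln 1.18}{2 \times 1.96} \approx 0.1119,\\[4pt]
z &= \frac{\ln 1.86 - \ln 1.47}{\sqrt{\mathrm{SE}_{\text{comm}}^{2} + \mathrm{SE}_{\text{pc}}^{2}}}
   \approx \frac{0.2353}{0.1326} \approx 1.775,\\[4pt]
p &= 2\,\bigl(1 - \Phi(|z|)\bigr) \approx 0.076.
\end{aligned}
```

The result agrees with the reported 0.075 to within the rounding of the published odds ratios and confidence limits.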
In 1994 the subgroup analysis distinguishing trials with high and low intensity support showed a non-significant trend towards the patch having a larger relative benefit when used with low intensity support. There were four trials in the low intensity subgroup and five in the high intensity subgroup. The 2000 review, with 12 low intensity and 18 high intensity trials, shows no evidence of such a difference (Table 2b).

Length of follow-up
In the 1994 meta-analysis, three nicotine patch trials, containing over 50% of the total participants, reported cessation rates at 6 months, and six trials reported 12-month follow-up data. The odds ratios for each subgroup were similar and overall heterogeneity was low, so all were pooled. To investigate whether this was still appropriate, we compared subgroups by length of follow-up. For the 19 trials that now report 12-month outcomes (66% of the total), the pooled estimate was lower (OR 1.62, 95% CI 1.42-1.84) than for the 11 with 6-month outcomes (OR 1.96, 95% CI 1.65-2.34). However, the difference between these odds ratios was not statistically significant (p = 0.085). Heterogeneity was low amongst the 6-month trials but was statistically significant in the 12-month subgroup (p = 0.044) (Table 2c).
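The Z-test for interaction described in the Methods can be sketched in Python as follows. It assumes each published 95% CI is symmetric on the log-odds-ratio scale, so the standard error can be recovered as (ln upper - ln lower)/(2 × 1.96). The helper names `se_from_ci` and `interaction_test` are hypothetical; `statistics.NormalDist` (Python 3.8+) supplies the standard normal distribution.

```python
import math
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of a log odds ratio, recovered from its 95% CI.

    Assumes the CI was computed symmetrically on the log scale.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_test(or_a, ci_a, or_b, ci_b):
    """Z-test for a difference between two independent subgroup odds ratios.

    Returns (z statistic, two-sided p-value).
    """
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    diff = math.log(or_a) - math.log(or_b)      # difference in log odds ratios
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # SE of that difference
    z = diff / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Nicotine patch, 2000: 6-month vs 12-month follow-up subgroups (figures above)
z, p = interaction_test(1.96, (1.65, 2.34), 1.62, (1.42, 1.84))
print(f"z = {z:.2f}, p = {p:.3f}")
```

Applied to the follow-up subgroups, it gives z ≈ 1.72 and p ≈ 0.086, close to the reported 0.085; the small gap is consistent with rounding in the published figures. The test treats the two subgroup estimates as independent, which holds here because each trial falls in only one follow-up subgroup.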

Qualitative changes in the systematic review
In 1994 the meta-analysis addressed two central questions:
1. The effectiveness of any type of nicotine replacement therapy compared with placebo.
2. The effect of different doses, estimated by indirect comparison.
In 2000 the meta-analysis also examined the effect of new forms of NRT (nasal spray, inhalator, tablet), the effect of combinations of NRT (for example, patch plus inhaler versus patch alone), direct comparisons of different doses of NRT, and NRT versus non-nicotine pharmacotherapies such as the antidepressant bupropion. Only one trial has so far addressed this last comparison [11].

Discussion
Since 1994, a large number of trials of NRT have been reported. Meta-analytic estimates of the effect of the nicotine patch were lower in 2000 than in 1994, but the difference was not statistically significant. The clinical message, that NRT can help dependent smokers to quit, is unchanged. Subgroup analyses, even when pre-specified, can lead to over-interpretation of findings of borderline significance. The updated meta-analysis shows that this explains the possible differences in relative effect by level of behavioural support, for both gum and patch, suggested by the 1994 subgroup analysis. Absolute quit rates differ substantially according to such factors as the baseline motivation of the population studied and the intensity of adjunctive behavioural support offered. The updated review provides further evidence that the relative effectiveness of treatment does not vary significantly among these subgroups. Our other exploratory analyses similarly failed to detect a significant difference in relative effects due to changing case mix or length of follow-up. This is in keeping with evidence from other conditions that relative effects are often constant across different levels of baseline risk [12].
One reason for lower estimates of effectiveness over time may be a form of publication bias in which trials with negative or less encouraging results take longer to reach publication [13]. We were unable to explore this because few trials specified when they started recruitment. However, chance provides an adequate explanation for the lower estimates of the effectiveness of NRT in 2000, without the need to invoke systematic bias.
The content of the review has changed substantially: it now includes data on issues that were not covered in the 1994 review and, in some cases, were not flagged by the review as areas to be addressed in future. This presents a dilemma for research synthesis. One principle of the systematic review is that it should work to a protocol, recognising the possibility of bias if the review takes its structure from the available evidence rather than vice versa. On the other hand, clinical practice and research move on, sometimes in unanticipated ways. Updated reviews that ignore new questions and data risk being perceived as irrelevant and out of touch. Our compromise solution has been to review the protocol annually before updating the review. Recognising the risk of bias, we aim to be particularly cautious in drawing conclusions about new questions based on emerging and incomplete primary evidence.

Conclusions
What did updating achieve in this example? The main effect of accumulating further evidence from placebo-controlled trials of nicotine replacement was to reduce the risk of drawing spurious conclusions from subgroup analyses. Overall estimates of the effect of nicotine replacement in helping people to give up smoking have not changed significantly. The huge investment in placebo-controlled trials in the 1990s therefore added little to overall knowledge. This raises two separate issues. First, further placebo-controlled trials should not be funded. The relative effects of NRT differ little by patient characteristics, and the need to demonstrate efficacy in particular subgroups (for example, in patients with chronic disease) should be challenged. The second, and more difficult, issue is whether systematic reviews should continue to be updated with studies that will not change the cumulative conclusions.
There are examples of apparently robust pooled estimates that have been challenged by the results of very large randomised trials [14]. A policy of not updating reviews would violate the principle that systematic reviews should consider all the available evidence, and would be a form of bias by date of publication. On the other hand, updating consumes resources, which are wasted if the conclusions do not change. Either way, greater attention to cumulative knowledge when determining the questions to be addressed in primary research is important to ensure that neither primary nor secondary research wastes resources on questions that have already been adequately answered.