Calculation of NNTs in RCTs with time-to-event outcomes: A literature review
 Mandy Hildebrandt^{1, 2},
 Elke Vervölgyi^{1} and
 Ralf Bender^{1, 3}
DOI: 10.1186/1471-2288-9-21
© Hildebrandt et al; licensee BioMed Central Ltd. 2009
Received: 30 January 2008
Accepted: 20 March 2009
Published: 20 March 2009
Abstract
Background
The number needed to treat (NNT) is a well-known effect measure for reporting the results of clinical trials. In the case of time-to-event outcomes, the calculation of NNTs is more difficult than in the case of binary data. The frequency with which NNTs are used to report the results of randomised controlled trials (RCTs) investigating time-to-event outcomes, and the adequacy of the calculation methods applied, are unknown.
Methods
We searched PubMed for RCTs with parallel-group design and individual randomisation published in four frequently cited journals between 2003 and 2005. We evaluated the type of outcome and the frequency of reporting NNTs with corresponding confidence intervals, and assessed the adequacy of the methods used to calculate NNTs in the case of time-to-event outcomes.
Results
The search resulted in 734 eligible RCTs. Of these, 373 RCTs investigated time-to-event outcomes and 361 analysed binary data. In total, 62 articles reported NNTs (34 articles with time-to-event outcomes, 28 articles with binary outcomes). Of the 34 articles reporting NNTs derived from time-to-event outcomes, only 17 applied an appropriate calculation method. Of the 62 articles reporting NNTs, only 21 presented corresponding confidence intervals.
Conclusion
The NNT is used as an effect measure to present the results of RCTs with binary and time-to-event outcomes in the current medical literature. In the case of time-to-event data, incorrect methods were frequently applied. Confidence intervals for NNTs were given in only one third of the NNT-reporting articles. In summary, there is much room for improvement in the application of NNTs to present the results of RCTs, especially where the outcome is time to an event.
Background
The concept of the number needed to treat (NNT) was proposed by Laupacis et al. [1] in 1988 to provide clinicians with a useful measure of treatment benefit. It represents the average number of patients who must be treated to prevent one adverse outcome within a certain duration of follow-up, and is calculated as the inverse of the absolute risk reduction (ARR) [1, 2]. The comprehensibility and usefulness of NNTs are intensively discussed in the scientific literature [3–11]. The main mathematical arguments against the use of NNTs, namely their undesirable distributional properties and the fact that the NNT is undefined if ARR = 0, are justified. However, these arguments lose their importance when the NNT is regarded merely as a way to communicate research results to patients, not as a tool for statistical computation [3, 12]. Several authors have also questioned whether NNTs are intuitively meaningful and helpful for physicians and patients [7–10]. Nevertheless, in recent years the NNT has become a well-known effect measure and is conventionally applied in randomised controlled trials (RCTs) with a binary outcome, where the duration of follow-up is fixed and the time to event plays no role or is ignored [12]. In 2001, the explanatory document of the Consolidated Standards of Reporting Trials (CONSORT) statement noted that NNTs could be helpful in expressing results for both binary and survival time data [13].
In RCTs with a binary outcome, the calculation of NNTs is based on simple proportions referring to the fixed duration of follow-up (i.e. rates from a 2 × 2 table) [1, 2, 12]. In the case of time-to-event outcomes, the calculation of the NNT is more difficult because varying follow-up times and censoring have to be taken into account [12].
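For the binary case with fixed follow-up, the calculation reduces to inverting the difference of two proportions. The following minimal Python sketch computes the NNT from a 2 × 2 table and returns infinity when ARR = 0, where the NNT is undefined. The function name `nnt_from_2x2` and the trial counts are hypothetical, chosen only for illustration.

```python
from math import inf


def nnt_from_2x2(events_control: int, n_control: int,
                 events_treated: int, n_treated: int) -> float:
    """NNT = 1 / ARR for a binary outcome with fixed follow-up.

    ARR = risk in control group - risk in treated group, where each risk is
    the simple proportion of patients with an event (a 2 x 2 table).
    Returns inf if ARR == 0 (NNT undefined); a negative result means the
    treatment is harmful, i.e. its absolute value is a number needed to harm.
    """
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    arr = risk_control - risk_treated
    return inf if arr == 0 else 1.0 / arr


# Hypothetical trial: 20/100 events under control, 10/100 under treatment.
print(nnt_from_2x2(20, 100, 10, 100))
```

With these counts ARR = 0.10 and the NNT is 10. As the text notes, this simple approach is only valid when every patient is followed for the same fixed period.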
Two basic methods have been proposed to calculate the NNT in this situation. Altman & Andersen [14] proposed calculating NNTs for one or several fixed time points based on survival probabilities estimated by the Kaplan-Meier method or the Cox regression model. Because these survival probabilities depend on time, ARRs and NNTs refer to specific time points. A time-specific NNT(t) is interpreted as the average number of patients who need to be treated to observe one more event-free patient in the treatment group than in the control group at time point t.
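A minimal Python sketch of the time-specific NNT(t): each arm's survival probability at time t is estimated with a hand-written Kaplan-Meier estimator, and NNT(t) = 1/(S_T(t) − S_C(t)), where S_T and S_C are the survival probabilities in the treatment and control groups. The function names and the ten-patient data sets are hypothetical; in practice a dedicated survival analysis package would normally be used, and the Cox-model variant is not shown.

```python
from math import inf
from typing import Sequence


def km_survival(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of S(t).

    times: follow-up time of each patient; events: 1 = event, 0 = censored.
    Patients censored at a time with events are counted as still at risk
    at that time (the usual convention).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        current = data[i][0]
        deaths = removed = 0
        # Collect all patients (events and censorings) sharing this time.
        while i < len(data) and data[i][0] == current:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
        n_at_risk -= removed
    return surv


def nnt_at_time(t, control, treated) -> float:
    """NNT(t) = 1 / (S_T(t) - S_C(t)); each arm is a (times, events) pair."""
    diff = km_survival(*treated, t) - km_survival(*control, t)
    return inf if diff == 0 else 1.0 / diff


# Hypothetical data (time in years, 1 = event, 0 = censored).
control = ([2, 3, 3, 5, 6, 7, 8, 9, 10, 10], [1, 1, 0, 1, 1, 0, 1, 0, 0, 0])
treated = ([3, 4, 6, 7, 8, 9, 10, 10, 10, 10], [1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
print(f"NNT(8) = {nnt_at_time(8, control, treated):.1f}")
```

Here S_C(8) ≈ 0.429 and S_T(8) ≈ 0.788, giving NNT(8) ≈ 2.8. Evaluating at another t generally gives a different NNT, which is why the time point must always be stated.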
A second method was proposed independently by Lubsen et al. [15] and Mayne et al. [16]. Both papers proposed using the reciprocal of the hazard difference rather than of the risk difference to estimate NNTs for time-to-event outcomes. One argument for using hazards was that a distinction has to be made between trials of acute conditions with treatments of a short, fixed duration and trials of chronic diseases with continuous treatments [15]. It was argued that for chronic diseases and continuous treatments, calculating NNTs by inverting the hazard difference would be more appropriate because an expression in units of person-time is required [15]. However, the NNT is an effect measure that quantifies the impact of a treatment in terms of the number of patients who have to be treated to avoid one event within a certain length of follow-up. The reciprocal of the hazard difference, by contrast, gives the average number of patient-years (instead of patients) needed to observe one event less in the treatment group than in the control group. Moreover, this interpretation is only valid if the hazard difference is constant over time, i.e. if the survival times follow an exponential distribution [16] or a linear hazard rate distribution [17]. For all other survival time distributions, the hazard difference and its reciprocal are time-dependent. In addition, the hazard difference is only a valid approximation of the risk difference if event rates are low, for instance below 5% [16, 18]. In all other cases, the use of hazards to calculate NNTs is misleading. Therefore, in this paper the NNT is, as usual, considered an effect measure that compares the risks of two groups (treatment versus control) over a specific length of follow-up, expressed as the number of patients who have to be treated to avoid one event.
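The point about hazard differences can be checked numerically. The sketch below assumes exponential survival in both arms, S(t) = exp(−λt), so that the hazard difference λ_C − λ_T is constant. It compares the risk-based NNT(t) with the value obtained by treating the 1/(λ_C − λ_T) patient-years as if they were patients followed for t years, i.e. 1/((λ_C − λ_T)t). The hazard values are hypothetical and the function names are ours.

```python
from math import exp


def nnt_risk_based(lam_c: float, lam_t: float, t: float) -> float:
    """NNT(t) from the risk difference under exponential survival S(t) = exp(-lam * t)."""
    return 1.0 / (exp(-lam_t * t) - exp(-lam_c * t))


def nnt_hazard_based(lam_c: float, lam_t: float, t: float) -> float:
    """1/(lam_c - lam_t) is patient-years per avoided event; dividing by t
    expresses it as patients followed for t years."""
    return 1.0 / ((lam_c - lam_t) * t)


# Hypothetical scenarios: (control hazard, treatment hazard, years of follow-up).
scenarios = {"low event rate": (0.01, 0.007, 1.0), "high event rate": (0.5, 0.3, 2.0)}
for label, (lam_c, lam_t, t) in scenarios.items():
    risk_nnt = nnt_risk_based(lam_c, lam_t, t)
    hazard_nnt = nnt_hazard_based(lam_c, lam_t, t)
    print(f"{label}: risk-based {risk_nnt:.1f}, hazard-based {hazard_nnt:.1f}")
```

In the low-rate scenario (control risk about 1% per year) the two values nearly agree (about 336 versus 333); in the high-rate scenario the hazard-based figure of 2.5 is less than half the risk-based NNT of about 5.5.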
Nuovo et al. [19] investigated the frequency of reporting NNTs in RCTs published in leading medical journals in the years 1989, 1992, 1995, and 1998. They found that only about 2% of eligible articles reported NNTs and concluded that this effect measure was underused in the medical literature.
The main objectives of our review are to investigate the frequency of reporting NNTs in RCTs published in leading medical journals in the years 2003–2005 and to assess whether the methods applied for their calculation were appropriate in the case of time-to-event outcomes. We also assessed whether confidence intervals were reported to describe the uncertainty of the estimated NNTs for both time-to-event and binary outcomes.
Methods
We assessed whether the methods used to calculate NNTs from time-to-event outcomes were appropriate. Following the methodology described in the literature [12, 14–16, 18], we considered a method appropriate if the NNT was calculated either from survival probabilities estimated by means of the Kaplan-Meier method or the Cox regression model [14], or as the inverse of the hazard difference when both assumptions mentioned above (constant hazard difference and low event rates) were met [15, 16, 18].
When the method used to calculate NNTs was not described in the article, we tried to verify the reported NNTs by recalculating them from the presented data. Recalculation by an appropriate method was possible if the corresponding Kaplan-Meier survival or incidence curves were presented. In this case, we recalculated the NNT as follows. First, we identified the time point at which the NNT was estimated; if no time point was given, we used the latest time point of the Kaplan-Meier graph. From this time point, we drew a vertical line to the top of the graph so that it crossed the curves of the treatment arms. From these crossing points, we drew horizontal lines to the y-axis and read off the corresponding survival probabilities for the treatment arms as accurately as possible. These probabilities were then used to calculate the NNT. When it was clear, either from statements in the text or from comparing the reported with the recalculated NNT, that an inappropriate method had been used, the method was classified as "inappropriate"; otherwise, it was classified as "appropriate".
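The reading-off procedure can be mimicked on a digitised curve. In the sketch below, each curve is assumed to be a list of (time, value) points at which the step function changes, starting at time 0; `read_step_curve` plays the role of the vertical line, and `recalculated_nnt` converts the two values into an NNT. The names, the digitised values, and the choice of the latest time at which both curves are still drawn as the default time point are our assumptions.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (time, value) pairs where a digitised step curve changes; first point at time 0.
Curve = List[Tuple[float, float]]


def read_step_curve(curve: Curve, t: float) -> float:
    """Value of a right-continuous step curve at time t (the 'vertical line')."""
    times = [point[0] for point in curve]
    idx = bisect_right(times, t) - 1
    if idx < 0:
        raise ValueError("t lies before the first digitised point")
    return curve[idx][1]


def recalculated_nnt(control: Curve, treated: Curve,
                     t: Optional[float] = None, incidence: bool = False) -> float:
    """NNT at time t from two digitised curves.

    If t is None, use the latest time at which both curves are still drawn.
    If incidence is True, the curves show cumulative incidence (risk) rather
    than survival, so the benefit is risk_control - risk_treated.
    """
    if t is None:
        t = min(control[-1][0], treated[-1][0])
    v_c, v_t = read_step_curve(control, t), read_step_curve(treated, t)
    arr = (v_c - v_t) if incidence else (v_t - v_c)
    return 1.0 / arr


# Hypothetical digitised survival curves (time in years).
control = [(0, 1.0), (1, 0.95), (2, 0.90), (3, 0.86)]
treated = [(0, 1.0), (1, 0.97), (2, 0.94), (3, 0.92)]
print(f"{recalculated_nnt(control, treated):.1f}")         # latest time point (t = 3)
print(f"{recalculated_nnt(control, treated, t=2.5):.1f}")  # between steps: uses the t = 2 values
```

The two calls give about 16.7 and 25.0. Passing `incidence=True` handles graphs that plot cumulative incidence instead of survival, as in the worked example in the Results.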
We also assessed whether confidence intervals for the NNT were provided. If the numbers at risk were given together with the Kaplan-Meier curve or could be inferred from lost-to-follow-up information, or if a hazard ratio with confidence interval was presented, we were also able to calculate a confidence interval for the recalculated NNT by using one of the methods proposed by Altman & Andersen [14]. If numbers at risk were given, but not for exactly the required time point, we used the numbers at risk for the nearest time point.
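One way to obtain such an interval, assuming proportional hazards, uses the relation S_T(t) = S_C(t)^HR, so that NNT(t) = 1/(S_C(t)^HR − S_C(t)); substituting the confidence limits of the hazard ratio gives limits for the NNT. To our understanding this corresponds to the hazard-ratio-based approach of Altman & Andersen [14], but the sketch below is our own Python illustration with hypothetical inputs (S_C(t) = 0.80, HR 0.75 with 95% CI 0.60 to 0.94), and it assumes the point estimate shows benefit (HR < 1).

```python
from math import inf


def nnt_from_hr(s_control: float, hr: float) -> float:
    """NNT(t) under proportional hazards, where S_T(t) = S_C(t) ** hr."""
    arr = s_control ** hr - s_control
    return inf if arr == 0 else 1.0 / arr


def nnt_with_ci(s_control: float, hr: float, hr_lower: float, hr_upper: float) -> str:
    """Point estimate and CI for NNT(t), assuming hr < 1 (benefit).

    The lower HR limit gives the largest benefit, i.e. the smallest NNT.
    If the upper HR limit exceeds 1, the CI runs from benefit through
    infinity to harm and is written in NNTB/NNTH notation.
    """
    nnt = nnt_from_hr(s_control, hr)
    best = nnt_from_hr(s_control, hr_lower)
    worst = nnt_from_hr(s_control, hr_upper)
    if hr_upper > 1:  # worst is negative: an NNT for harm
        return f"NNTB {nnt:.1f} (NNTB {best:.1f} to ∞ to NNTH {-worst:.1f})"
    return f"NNTB {nnt:.1f} ({best:.1f} to {worst:.1f})"


print(nnt_with_ci(0.80, 0.75, 0.60, 0.94))
print(nnt_with_ci(0.80, 0.75, 0.50, 1.10))
```

The first call yields an NNT of about 21.8 (13.4 to 92.7). The second, whose hazard ratio interval includes 1, illustrates the NNTB to ∞ to NNTH form that also appears in Table 3.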
Additionally, we investigated the reporting of the absolute risk reduction with a corresponding confidence interval. To characterise the studies, we further evaluated the median sample size of the studies reporting NNTs and whether the outcome for which the NNT was calculated was a primary or secondary endpoint.
Results
Table 1: Reporting of the number needed to treat (NNT) and corresponding 95% confidence interval (CI) in randomised controlled trials (RCTs) in leading medical journals in the years 2003–2005 (number of articles)

Journal | RCTs | NNT reporting | CI for NNT
---|---|---|---
BMJ | 90 | 13 | 7
JAMA | 199 | 16 | 4
Lancet | 190 | 14 | 4
NEJM | 255 | 19 | 6
Total | 734 | 62 (8.4%) | 21 (33.9%)
Table 2: Reporting of the number needed to treat (NNT) and corresponding 95% confidence interval (CI) in randomised controlled trials (RCTs) with time-to-event outcomes in leading medical journals in the years 2003–2005 (number of articles)

Journal | RCTs with time-to-event data | NNT reporting | Appropriate NNT calculation | CI for NNT
---|---|---|---|---
BMJ | 17 | 2 | 0 | 0
JAMA | 89 | 9 | 4 | 2
Lancet | 111 | 10 | 6 | 1
NEJM | 156 | 13 | 7 | 3
Total | 373 | 34 (9.1%) | 17 (50%) | 6 (17.6%)
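The percentages in the two tables use different denominators. The short Python check below simply re-enters the per-journal counts shown in the tables, recomputes the totals, and makes the denominators explicit: NNT reporting is expressed relative to all RCTs (or all RCTs with time-to-event data), whereas appropriate calculation and CI reporting are expressed relative to the NNT-reporting articles.

```python
# Per-journal counts re-entered from the two tables above.
table1 = {  # journal: (RCTs, NNT reporting, CI for NNT)
    "BMJ": (90, 13, 7), "JAMA": (199, 16, 4),
    "Lancet": (190, 14, 4), "NEJM": (255, 19, 6),
}
table2 = {  # journal: (RCTs with time-to-event data, NNT reporting, appropriate, CI)
    "BMJ": (17, 2, 0, 0), "JAMA": (89, 9, 4, 2),
    "Lancet": (111, 10, 6, 1), "NEJM": (156, 13, 7, 3),
}

# Column totals over journals.
rcts, nnt, ci = (sum(col) for col in zip(*table1.values()))
tte, nnt_tte, appropriate, ci_tte = (sum(col) for col in zip(*table2.values()))

print(f"All RCTs: NNT {nnt}/{rcts} = {nnt / rcts:.1%}, CI {ci}/{nnt} = {ci / nnt:.1%}")
print(f"Time-to-event RCTs: NNT {nnt_tte}/{tte} = {nnt_tte / tte:.1%}, "
      f"appropriate {appropriate}/{nnt_tte} = {appropriate / nnt_tte:.1%}, "
      f"CI {ci_tte}/{nnt_tte} = {ci_tte / nnt_tte:.1%}")
```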
Table 3: Reported and recalculated NNTs with 95% confidence intervals (CIs) from 17 studies using inappropriate methods to calculate NNTs for time-to-event data

No. | Reported NNT | Reported 95% CI | Recalculated NNT | Recalculated 95% CI | Absolute difference (reported NNT − recalculated NNT)
---|---|---|---|---|---
1 | 14 | | 17.5 | 9.2 – 171.9 | −3.5
2 | 23 | | 18.2 | 11.1 – 49.6 | +4.8
3 | 10 | | 14.7 | 7.8 – 117.6 | −4.7
4 | 40 | | 57.1 | NNTB 24.2 to ∞ to NNTH 156.6 | −17.1
5 | O1: 2.2; O2: 6.1 | | O1: 2.0; O2: 4.5 | O1: 1.7 – 2.7; O2: 2.8 – 15 | O1: +0.2; O2: +1.6
6 | TP1: 5–6; TP2: 12 | TP1: 3.6 – 11.1; TP2: 6.3 – 74.6 | * | |
7 | 138 | 77 – 641 | * | |
8 | 38 | | * | |
9 | 9 | 6 – 14 | * | |
10 | O1: 40; O2: 118 | | O1: 33.3; O2: 100.0 | O1: 19.3 – 123; O2: NNTB 46.8 to ∞ to NNTH 741.3 | O1: +6.7; O2: +18.0
11 | 7.5 | 4.8 – 14.7 | 7.1 | 4.5 – 16.9 | +0.4
12 | 39 | | 38.2 | 21.7 – 158.1 | +0.8
13 | 4.3 | | 4.5 | 2.6 – 17.9 | −0.2
14 | 5 | | 4.9 | 3.7 – 7.1 | +0.1
15 | 30 | | * | |
16 | "Slightly more than 6" | | * | |
17 | NNS = 352 | | 325.7 | 185.4 – 1337.5 | +26.3

* Recalculation not possible.
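The last column of the table, and the relative errors quoted in the Discussion, follow from simple subtraction and division. The Python snippet below re-enters the single-outcome studies of the table for which a recalculated NNT is available and computes the absolute difference (reported − recalculated) and the relative difference with the recalculated NNT as reference; negative values mean the reported NNT was too small. The variable names are ours.

```python
# (reported NNT, recalculated NNT) for single-outcome studies with a recalculation.
studies = {
    1: (14, 17.5), 2: (23, 18.2), 3: (10, 14.7), 4: (40, 57.1),
    11: (7.5, 7.1), 12: (39, 38.2), 13: (4.3, 4.5), 14: (5, 4.9),
    17: (352, 325.7),  # study 17 reports a number needed to screen
}

# Absolute difference (rounded as in the table) and relative difference.
abs_diff = {k: round(rep - rec, 1) for k, (rep, rec) in studies.items()}
rel_diff = {k: (rep - rec) / rec for k, (rep, rec) in studies.items()}

for k in studies:
    print(f"study {k:>2}: {abs_diff[k]:+.1f} ({rel_diff[k]:+.1%})")
```

For example, study 2 gives +4.8 (+26.4%) and study 3 gives −4.7 (−32.0%).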
To illustrate our recalculation, we present one typical example. One study provided the information "The number needed to treat to prevent 1 cardiovascular event would be 40 patients with IGT over 3.3 years". Additionally, the naive proportions of patients experiencing an event were given as 32/686 in the placebo group and 15/682 in the intervention group. The reported NNT of 40 is evidently based on these naive proportions, because 1/(32/686 − 15/682) ≈ 1/0.025 = 40. However, because of varying follow-up times and censoring, the naive proportions are not valid estimates of the risks at 3.3 years, which is only the mean follow-up time. An adequate approach to estimating the required risks for a specified time point is the Kaplan-Meier method.
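Why naive proportions fail under censoring can be seen in a small example. In the hypothetical ten-patient arm below, each patient is followed until an event or censoring, and the study ends at year 6. The naive proportion divides the number of events by all patients, as if censored patients were known to be event-free, whereas the Kaplan-Meier estimate of the risk 1 − S(6) uses only the patients still under observation at each event time. The data and the function name `km_risk` are invented for illustration.

```python
from typing import Sequence


def km_risk(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of the cumulative risk 1 - S(t).

    times: follow-up time per patient; events: 1 = event, 0 = censored.
    Patients censored at an event time are counted as at risk at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk, surv, i = len(data), 1.0, 0
    while i < len(data) and data[i][0] <= t:
        current, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == current:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
        n_at_risk -= removed
    return 1.0 - surv


# Hypothetical arm: 3 events, 7 censorings (four of them before year 6).
times = [1, 2, 2, 3, 4, 4, 5, 6, 6, 6]
events = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]

naive_risk = sum(events) / len(events)
print(f"naive proportion: {naive_risk:.3f}")                           # 0.300
print(f"Kaplan-Meier risk at t = 6: {km_risk(times, events, 6):.3f}")  # 0.421
```

The naive proportion underestimates the risk because four patients were censored before year 6 without being observed for the full period. If the two arms are censored to different extents, such errors no longer cancel in the risk difference.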
We enlarged the Kaplan-Meier incidence curve given in the paper and visually read off the risk estimates at 1200 days (about 3.3 years) as accurately as possible. We found risks of 0.0410 and 0.0235 for the placebo and intervention groups, respectively. Thus, the recalculated NNT is 1/(0.0410 − 0.0235) = 1/0.0175 = 57.1, and the reported NNT of 40 is about 30% too low.
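The arithmetic of this example can be reproduced directly from the numbers given above; the variable names are ours.

```python
# Naive NNT from the crude proportions reported in the paper.
naive_nnt = 1 / (32 / 686 - 15 / 682)
# NNT from the Kaplan-Meier risks read off at 1200 days.
km_nnt = 1 / (0.0410 - 0.0235)
reported_nnt = 40

print(f"1200 days = {1200 / 365.25:.1f} years")  # 3.3
print(f"naive NNT = {naive_nnt:.1f}")  # 40.6, reported as 40
print(f"Kaplan-Meier-based NNT = {km_nnt:.1f}")  # 57.1
print(f"relative error of reported NNT = {(reported_nnt - km_nnt) / km_nnt:.0%}")  # -30%
```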
Of the 62 NNT-reporting articles, 21 presented corresponding confidence intervals (6 of the 34 studies with time-to-event outcomes and 15 of the 28 studies with binary outcomes). Among the 62 NNT-reporting articles, 1 article used the term "number needed to screen" (NNS), 2 articles used the terminology "number needed to treat for one patient to benefit" (NNTB) and "number needed to treat for one patient to be harmed" (NNTH), and 1 article used the term "number needed to harm" (NNH).
The absolute risk reduction was given in 33 (53.2%) of the 62 NNT-reporting articles (17 with time-to-event data and 16 with binary data); a corresponding confidence interval for the absolute risk reduction was given in 21 (63.6%) of these 33 articles (7 with time-to-event data and 14 with binary data).
Discussion
The NNT is used as an effect measure to present the results of randomised controlled trials with binary and time-to-event outcomes. We found that incorrect methods were frequently applied in the case of survival time data. Because the explanatory document of the CONSORT statement [13] describes the NNT, in addition to other effect measures (risk ratio or risk reduction), as helpful for expressing the results of both binary and survival time data, appropriate methods for calculating NNTs are required for time-to-event data as well. Our finding that 50% of the NNT-reporting articles with survival time data used inadequate calculation methods underlines the need to emphasise that methods based on survival time techniques have to be used to calculate NNTs in this situation. This observed proportion probably underestimates the true proportion, because we classified the calculation method as "appropriate" if the method used was unclear and the reported NNT equalled the NNT recalculated from survival probabilities. It is possible that naive proportions were in fact used (i.e. an inappropriate method) but that the result coincidentally matched the correct result based on survival probabilities. Thus, the true proportion of NNT-reporting articles with survival time data and inadequate calculation methods may be even higher than the observed 50%. As the journals considered are leading journals in medical research, a broader review that also included lower-ranked medical journals would probably find an even higher proportion of papers with inadequate NNT calculation.
In this paper, we did not judge whether the application of NNTs was helpful or useful in the specific situation. For example, it has been argued that in the case of chronic diseases and continuous treatments, calculating NNTs by inverting the risk difference is not useful because the duration of treatment is not taken into account [15]. We agree that in the case of continuous treatments one should be careful if a cost-effectiveness analysis is to be based on NNTs. The treatment costs depend on the duration of treatment, which is shorter than the follow-up time for patients who have an event before the end of the study. Thus, simple NNTs are insufficient for cost-effectiveness analyses in the case of chronic diseases and continuous treatments. If the duration of treatment is important, more complex methods are required, e.g. survival techniques with time-dependent covariates. These methods are not considered in this paper because the problem of treatment duration is independent of the type of outcome (binary or time-to-event data). If treatment duration plays a role in the analysis, it has to be considered in addition to the effect measure used, regardless of whether that measure is the NNT or any other measure (risk difference, odds ratio, hazard ratio). In general, whether NNTs are useful is a highly subjective judgement. Therefore, we did not judge the usefulness of the reported NNTs in the specific situations, but considered the frequency of NNT applications in RCTs published in major medical journals in the years 2003 to 2005 and verified whether the calculation methods applied were technically appropriate in the case of time-to-event outcomes.
The error produced by using an inadequate method to calculate NNTs is unpredictable. In a number of cases, there was no substantial difference between adequately and inadequately calculated NNTs. For example, one trial with inappropriate NNT calculation presented an NNT of 39, which is nearly the same as the correct result of 38.2 obtained by the method proposed by Altman & Andersen [14]. However, in another trial the published NNT of 23 is 26.4% too large (absolute difference: +4.8) compared with the correct result of 18.2. In a third example, the published NNT of 10 is 32% too small (absolute difference: −4.7) compared with the correct result of 14.7 (Table 3). It has been argued that clinicians should not be overly concerned about inaccuracies that may arise from estimating NNTs inadequately from naive proportions, especially when using data from large RCTs with high rates of follow-up [20]. We agree that if censoring is equal in the two groups, the difference between adequately and inadequately calculated NNTs is negligible in practice. However, if the amount of censoring differs markedly between the experimental and control groups, relevant differences between adequately and inadequately calculated NNTs can occur. Moreover, confidence intervals for NNTs will be too narrow if censoring is not taken into account, because the values used as effective sample sizes are too large. This is illustrated in Table 3 (study 11), where the recalculated confidence interval completely covers the reported one. Unfortunately, there was only one study in which a confidence interval for the NNT was reported and a recalculation of the confidence interval was possible. As survival techniques are standard in the analysis of RCTs with varying follow-up times to account for censoring, there is no reason to accept inaccurate point or interval estimates for NNTs caused by neglecting censoring.
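The mechanism behind intervals that are too narrow can be illustrated for a single arm. The sketch below compares the naive binomial standard error sqrt(p(1 − p)/n), which treats all n patients as fully observed, with Greenwood's standard error of the Kaplan-Meier estimate, which reflects the shrinking number of patients at risk. The data and the function name are hypothetical; a confidence interval for the NNT would combine such standard errors for both arms.

```python
from math import sqrt
from typing import Sequence, Tuple


def km_with_greenwood(times: Sequence[float], events: Sequence[int],
                      t: float) -> Tuple[float, float]:
    """Kaplan-Meier S(t) and its Greenwood standard error.

    Greenwood: Var[S(t)] = S(t)^2 * sum over event times of d / (n * (n - d)),
    where n is the number at risk and d the number of events at that time.
    """
    data = sorted(zip(times, events))
    n_at_risk, surv, gw_sum, i = len(data), 1.0, 0.0, 0
    while i < len(data) and data[i][0] <= t:
        current, d, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == current:
            d += data[i][1]
            removed += 1
            i += 1
        if d:
            surv *= 1.0 - d / n_at_risk
            if d < n_at_risk:  # term undefined when all at risk have an event (S becomes 0)
                gw_sum += d / (n_at_risk * (n_at_risk - d))
        n_at_risk -= removed
    return surv, surv * sqrt(gw_sum)


# Hypothetical arm of 10 patients, followed for up to 6 years.
times = [1, 2, 2, 3, 4, 4, 5, 6, 6, 6]
events = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]

surv, se_km = km_with_greenwood(times, events, 6)
p_naive = sum(events) / len(events)
se_naive = sqrt(p_naive * (1 - p_naive) / len(events))
print(f"Kaplan-Meier risk {1 - surv:.3f}, SE {se_km:.3f}")
print(f"naive risk {p_naive:.3f}, SE {se_naive:.3f}")
```

Here the Greenwood standard error (about 0.199) is clearly larger than the naive one (about 0.145), because only 4 of the 10 patients were still at risk at the last event time.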
According to the CONSORT statement [21], confidence intervals should be reported for estimated effect measures to indicate the precision of the estimates. Owing to the unusual scale of NNTs, their confidence intervals are difficult to describe if the effect is not statistically significant [22]. This may be one reason why confidence intervals for the NNT were given in only one third of the NNT-reporting articles (time-to-event and binary data). Nevertheless, the methodology for calculating confidence intervals for NNTs is described and explained in both the statistical and the medical literature [12, 22–27], so the unusual scale of NNTs is no reason to disregard the CONSORT statement.
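The "unusual scale" problem arises when the confidence interval for the ARR includes zero: inverting the limits then gives one benefit limit and one harm limit, and the interval passes through infinity. The Python sketch below produces the notation used in Table 3 (NNTB ... to ∞ to NNTH ...), following Altman's proposal [22] as we understand it. The function name and the ARR values are hypothetical, and the function assumes a positive ARR point estimate (benefit).

```python
def nnt_ci_from_arr(arr: float, lower: float, upper: float) -> str:
    """Express an ARR (risk control - risk treatment) and its CI as NNTs.

    Assumes a beneficial point estimate, arr > 0, and lower < arr < upper.
    """
    if not (lower < arr < upper and arr > 0):
        raise ValueError("expects lower < arr < upper and arr > 0")
    nnt = 1 / arr
    if lower > 0:
        # CI excludes zero: both limits are NNTBs; larger ARR -> smaller NNT.
        return f"NNTB {nnt:.1f} ({1 / upper:.1f} to {1 / lower:.1f})"
    if lower == 0:
        return f"NNTB {nnt:.1f} (NNTB {1 / upper:.1f} to ∞)"
    # CI includes zero: from benefit through infinity to harm.
    return f"NNTB {nnt:.1f} (NNTB {1 / upper:.1f} to ∞ to NNTH {1 / -lower:.1f})"


print(nnt_ci_from_arr(0.05, 0.01, 0.09))   # statistically significant benefit
print(nnt_ci_from_arr(0.05, -0.02, 0.12))  # not statistically significant
```

The first call gives "NNTB 20.0 (11.1 to 100.0)"; the second gives "NNTB 20.0 (NNTB 8.3 to ∞ to NNTH 50.0)", which is awkward to read but still conveys the full uncertainty.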
Conclusion
In summary, there is much room for improvement in the application of the NNT to present the results of randomised controlled trials, especially where the outcome is time to an event. To account for censoring, survival time techniques have to be used to calculate the NNT. The common standard of providing confidence intervals to indicate the uncertainty of estimated effect measures should also be applied to the NNT. In general, it should be carefully considered whether the use of the NNT is sensible in the specific context. If the NNT is used, correct calculation methods are required, and both point and interval estimates should be presented.
Abbreviations
ARR: absolute risk reduction
BMJ: British Medical Journal
CONSORT: Consolidated Standards of Reporting Trials Statement
JAMA: Journal of the American Medical Association
NEJM: New England Journal of Medicine
NNH: number needed to harm
NNS: number needed to screen
NNT: number needed to treat
NNTB: number needed to treat for one person to benefit
NNTH: number needed to treat for one person to be harmed
RCT: randomised controlled trial
Declarations
Acknowledgements
We thank Natalie McGauran and Ulrich Grouven for editorial support and Christoph Schürmann for technical support. Furthermore, we thank Thomas Kaiser for his help in identifying the basic pool of RCTs, especially during the search process and abstract review.
This work is supported in part by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG), grant BL 443/52.
References
1. Laupacis A, Sackett DL, Roberts RS: An assessment of clinically useful measures of the consequences of treatment. N Engl J Med. 1988, 318: 1728-1733.
2. Cook RJ, Sackett DL: The number needed to treat: A clinically useful measure of treatment effect. BMJ. 1995, 310: 452-454.
3. Walter SD: Choice of effect measure for epidemiological data. J Clin Epidemiol. 2000, 53: 931-939. 10.1016/S0895-4356(00)00210-9.
4. Hutton JL: Number needed to treat: Properties and problems. J R Stat Soc A. 2000, 163: 403-415.
5. Altman DG, Deeks JJ: Comments on the paper by Hutton. J R Stat Soc A. 2000, 163: 415-416.
6. Lesaffre E, Pledger G: Comments on the paper by Hutton. J R Stat Soc A. 2000, 163: 417.
7. Kristiansen IS, Gyrd-Hansen D, Nexøe J, Nielsen JB: Number needed to treat: Easily understood and intuitively meaningful? Theoretical considerations and a randomized trial. J Clin Epidemiol. 2002, 55: 888-892. 10.1016/S0895-4356(02)00432-8.
8. Grieve AP: The number needed to treat: A useful clinical measure or a case of the Emperor's new clothes? Pharmaceut Stat. 2003, 2: 87-102. 10.1002/pst.33.
9. Halvorsen PA, Kristiansen IS: Decisions on drug therapies by numbers needed to treat: A randomized trial. Arch Intern Med. 2005, 165: 1140-1146. 10.1001/archinte.165.10.1140.
10. Nexøe J, Kristiansen IS, Gyrd-Hansen D, Nielsen JB: Influence of number needed to treat, costs and outcome on preferences for a preventive drug. Fam Pract. 2005, 22: 126-131. 10.1093/fampra/cmh706.
11. Halvorsen PA, Selmer R, Kristiansen IS: Different ways to describe the benefits of risk-reducing treatments: A randomized trial. Ann Intern Med. 2007, 146: 848-856.
12. Bender R: Number needed to treat (NNT). Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2005, Chichester: Wiley, 3752-3761.
13. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne DR, Gøtzsche PC, Lang T, for the CONSORT Group: The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Ann Intern Med. 2001, 134: 663-694.
14. Altman DG, Andersen PK: Calculating the number needed to treat where the outcome is time to an event. BMJ. 1999, 319: 1492-1495.
15. Lubsen J, Hoes A, Grobbee D: Implications of trial results: The potentially misleading notations of number needed to treat and average duration life gained. Lancet. 2000, 356: 1757-1759. 10.1016/S0140-6736(00)03215-3.
16. Mayne TJ, Whalen E, Vu A: Annualized was found better than absolute risk reduction in the calculation of number needed to treat in chronic conditions. J Clin Epidemiol. 2006, 59: 217-223. 10.1016/j.jclinepi.2005.07.006.
17. Lin CT, Wu SJS, Balakrishnan N: Parameter estimation for the linear hazard rate distribution based on records and inter-record times. Commun Stat A. 2003, 32: 729-748.
18. Liu GF, Wang J, Liu K, Snavely DB: Confidence intervals for an exposure adjusted incidence rate difference with applications to clinical trials. Stat Med. 2006, 25: 1275-1286. 10.1002/sim.2335.
19. Nuovo J, Melnikow J, Chang D: Reporting number needed to treat and absolute risk reduction in randomized controlled trials. JAMA. 2002, 287: 2813-2814. 10.1001/jama.287.21.2813.
20. de Lemos ML: NNT for studies with long-term follow-up (Letter). CMAJ. 2005, 172: 613; author reply 613-615.
21. Moher D, Schulz KF, Altman DG, for the CONSORT Group: The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. Ann Intern Med. 2001, 134: 657-662.
22. Altman DG: Confidence intervals for the number needed to treat. BMJ. 1998, 317: 1309-1312.
23. Lesaffre E, Pledger G: A note on the number needed to treat. Control Clin Trials. 1999, 20: 439-447. 10.1016/S0197-2456(99)00018-5.
24. Bender R: Calculating confidence intervals for the number needed to treat. Control Clin Trials. 2001, 22: 102-110. 10.1016/S0197-2456(00)00134-3.
25. Barrowman NJ: Missing the point (estimate)? Confidence intervals for the number needed to treat. CMAJ. 2002, 166: 1676-1677.
26. Hildebrandt M, Bender R, Gehrmann U, Blettner M: Calculating confidence intervals for impact numbers. BMC Med Res Methodol. 2006, 6: 32. 10.1186/1471-2288-6-32.
27. Bender R, Kuss O, Hildebrandt M, Gehrmann U: Estimating adjusted NNT measures in logistic regression analysis. Stat Med. 2007, 26: 5586-5595. 10.1002/sim.3061.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/9/21/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.