Reduction in number to treat versus number needed to treat

Background We propose a new measure of treatment effect based on the expected reduction in the number of patients to treat (RNT) which is defined as the difference of the reciprocals of clinical measures of interest between two arms. Compared with the conventional number needed to treat (NNT), RNT shows superiority with both binary and time-to-event endpoints in randomized controlled trials (RCTs). Methods Five real RCTs, two with binary endpoints and three with survival endpoints, are used to illustrate the concept of RNT and compare the performances between RNT and NNT. For survival endpoints, we propose two versions of RNT: one is based on the survival rate and the other is based on the restricted mean survival time (RMST). Hypothetical scenarios are also constructed to explore the advantages and disadvantages of RNT and NNT. Results Because there is no baseline for computation of NNT, it fails to differentiate treatment effect in the absolute scale. In contrast, RNT conveys more information than NNT due to its reversed order of differencing and inverting. For survival endpoints, two versions of RNT calculated as the difference of the reciprocals of survival rates and RMSTs are complementary to each other. The RMST-based RNT can capture the entire follow-up profile and thus is clinically more intuitive and meaningful, as it inherits the time-to-event characteristics for survival endpoints instead of using truncated binary endpoints at a specific time point. Conclusions The RNT can serve as an alternative measure for quantifying treatment effect in RCTs, which complements NNT to help patients and clinicians better understand the magnitude of treatment benefit. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01246-5.


Background
Randomized controlled trials (RCTs) are the gold standard to evaluate treatment effect of a new intervention in comparison with a control (e.g., the standard of care) [1]. However, it is often difficult to convey the findings in RCTs to patients and clinicians due to the complexity of statistical analysis and lack of interpretability of measurements of treatment effect. For example, let p E and p C denote the response rates of the experimental treatment and control, respectively. The relative risk RR = p E / p C is the ratio of two response rates, absolute risk reduction ARR = p E − p C corresponds to the difference, and relative risk reduction RRR = (p E − p C )/p C evaluates the difference in two response rates relative to a reference group. For survival endpoints, the commonly used hazard ratio (HR) is the ratio of the hazard functions for the treatment versus control groups, and ARR characterizes the difference of survival probabilities at a particular time point between two groups. The definitions of RR, ARR, RRR and HR may not be transparent to patients and clinicians. For better understanding of treatment effect, the number of patients needed to treat (NNT), defined as the reciprocal of the difference between p E and p C , i.e., NNT = 1/(p E − p C ), has been widely advocated for reporting the results of RCTs [2][3][4][5]. It can be interpreted as the expected number of patients needed to treat in order to gain one extra response or save one extra life (if mortality is the endpoint) using the treatment vs control. The NNT can be further classified as NNTB (benefit) or NNTH (harm) depending on the beneficial or harmful effects of the treatment [3,4].
A similar NNT definition has been established for the survival endpoint using ARR. Let S E (τ) and S C (τ) be the survival probabilities at time τ for the experimental and control arms, respectively. Then ARR = S E (τ) − S C (τ), which can be estimated as the difference in the Kaplan-Meier (KM) survival rates at time τ [4]. As a result, NNT surv is defined as

$$\mathrm{NNT}_{\mathrm{surv}}(\tau) = \frac{1}{S_E(\tau) - S_C(\tau)},$$

which represents the number of patients needed to treat to prevent one event of interest (e.g., death or disease progression) up to follow-up time τ. When the two survival rates are very close at some time points but different at others, NNT surv varies dramatically over time and can sometimes take very large values.
Instead of using truncated information such as survival rates at a particular time point, one can compute the mean survival (or event-free) time during a prespecified follow-up period, which is known as the restricted mean survival time (RMST) [6][7][8][9][10][11]. The RMST has been advocated broadly in the medical literature as a robust quantification of the treatment effect [8][9][10]. Due to censoring, the mean survival time is not estimable, while the RMST can be estimated by the area under the KM curve up to a specific time point. Along a similar line, the RMST-based NNT, namely NNT RMST , may serve as an alternative to NNT surv for survival endpoints; it inherits the transparency and unambiguity of RMST in quantifying time-to-event data [12].
The NNT is calculated by first obtaining the difference of clinical measures of interest for two treatments, e.g., response rates, survival rates, or RMSTs, and then taking the reciprocal of the difference. However, without a baseline as the reference, NNT (similar to HR) ignores the absolute scale of the clinical measures and thus may cause ambiguity in certain cases [13][14][15]. For example, the following two scenarios cannot be distinguished by NNT: (1) p E = 0.2, p C = 0.1 and thus NNT = 1/(p E − p C ) = 10; (2) p E = 0.5 and p C = 0.4, also leading to NNT = 10. Nevertheless, the two situations are clearly different: the treatment doubles the response rate of the control in the former case while the increment is only 25% in the latter case. The NNT only depends on the difference but not the response rates themselves. Moreover, because ARR is the difference of two probabilities with a range from − 1 to 1, the range of NNT is (−∞, −1] ∪ [1, +∞) rather than the whole real line. If the two response rates are close, NNT takes a very large value and even becomes infinity if the two response rates are exactly the same. When the difference of the two response rates is insignificant, i.e., the corresponding confidence interval (CI) [ARR lower , ARR upper ] of ARR covers zero, the CI of NNT would have a strange form of [NNTB 1/ARR upper to ∞ to NNTH -1/ARR lower ] [3], which contains infinity in the middle of two numbers. Such an irregular form of CI often causes confusion for clinicians and patients.
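The two points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `nnt_with_ci`, the plain Wald interval for ARR, and the sample size of 100 patients per arm are all hypothetical choices and are not taken from the trials discussed in this paper. The sketch shows that NNT depends only on the difference of the response rates, and that inverting an ARR interval that covers zero yields the irregular "NNTB to ∞ to NNTH" form.

```python
from math import inf, sqrt
from statistics import NormalDist


def _inverse(x):
    """Reciprocal of |x|, with infinity when x is exactly zero."""
    return inf if x == 0 else 1 / abs(x)


def nnt_with_ci(p_e, n_e, p_c, n_c, alpha=0.05):
    """Return (NNT, (ARR_lower, ARR_upper), description of the NNT CI).

    Assumption: a plain Wald interval is used for ARR = p_e - p_c and its
    limits are inverted to obtain the interval for NNT.
    """
    arr = p_e - p_c
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_e * (1 - p_e) / n_e + p_c * (1 - p_c) / n_c)
    lo, hi = arr - z * se, arr + z * se
    nnt = inf if arr == 0 else 1 / arr
    if lo > 0:    # significant benefit: regular interval
        ci = f"NNTB {_inverse(hi):.1f} to NNTB {_inverse(lo):.1f}"
    elif hi < 0:  # significant harm: regular interval
        ci = f"NNTH {_inverse(lo):.1f} to NNTH {_inverse(hi):.1f}"
    else:         # ARR interval covers zero: infinity sits in the middle
        ci = f"NNTB {_inverse(hi):.1f} to ∞ to NNTH {_inverse(lo):.1f}"
    return nnt, (lo, hi), ci


# The two scenarios from the text, with a hypothetical 100 patients per arm.
for p_e, p_c in [(0.2, 0.1), (0.5, 0.4)]:
    nnt, arr_ci, nnt_ci = nnt_with_ci(p_e, 100, p_c, 100)
    print(f"p_E={p_e}, p_C={p_c}: NNT={nnt:.1f}, 95% CI [{nnt_ci}]")
```

Both scenarios return the same point estimate of NNT, while only the interval description reveals that the second ARR interval crosses zero.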
To resolve the limitations of NNT with binary and time-to-event endpoints, we propose an alternative quantity, the reduction in the number of patients to treat (RNT), which is computed as the difference of the reciprocals of clinical measures between two arms, i.e., first taking reciprocals of response rates and then obtaining the difference of reciprocals. Unlike NNT, which is infinite when the two response rates are equal, RNT takes values on the entire real line and its CI always has a regular form, whereas the CI of NNT becomes the union of two separate intervals when the response rate difference is insignificant.

Methods
For binary endpoints, we propose a new quantity,

$$\mathrm{RNT} = \frac{1}{p_C} - \frac{1}{p_E},$$

where 1/p C and 1/p E are the expected numbers of patients needed to treat in order to observe one response in the control and experimental groups, respectively. By definition, RNT is computed by first taking reciprocals of the response rates and then obtaining the difference. The RNT can be interpreted as the expected reduction in the number of patients to treat for the treatment compared with the control to induce one response. For the two scenarios considered earlier, RNT equals 5 for the low response case (p E = 0.2, p C = 0.1) and 0.5 for the high response case (p E = 0.5, p C = 0.4), which clearly distinguishes the two situations. This contrast illustrates the difference between RNT and NNT: the former depends on the individual response rates, whereas the latter does not. Compared with NNT, RNT has a baseline and thus delivers more information on treatment effect.

To see the connections between different quantities, we can rewrite RNT as

$$\mathrm{RNT} = \frac{p_E - p_C}{p_E\, p_C} = \frac{1}{\mathrm{NNT} \times p_E\, p_C} = \frac{\mathrm{RRR}}{p_E}.$$

Not only does RNT involve NNT, but it also includes the product of the response rates of the experimental and control arms. Furthermore, it can be rewritten as RRR divided by the response rate of the experimental arm.

Through the normal approximation and delta method [16], the two-sided Wald-type 100(1 − α)% CI of RNT has the form of

$$\widehat{\mathrm{RNT}} \pm z_{1-\alpha/2} \sqrt{\frac{1-\hat{p}_C}{n_C\, \hat{p}_C^{3}} + \frac{1-\hat{p}_E}{n_E\, \hat{p}_E^{3}}},$$

where p̂ i is the observed response rate and n i is the number of patients in arm i (i = E, C), and z 1 − α/2 is the (1 − α/2)-th quantile of the standard normal distribution. When the sample size is small, the Wald CIs may not be accurate, and the bootstrap percentile CIs or exact CIs can be used instead. Detailed procedures for constructing bootstrap percentile CIs [17] and exact CIs for RNT are presented in Supplementary Materials, and numerical studies were conducted to examine the performances of various CIs (see Supplementary Table S1).

For survival endpoints, similar to NNT surv , we can define RNT surv based on the survival rates of the two arms,

$$\mathrm{RNT}_{\mathrm{surv}}(\tau) = \frac{1}{S_C(\tau)} - \frac{1}{S_E(\tau)},$$

where τ is a pre-specified follow-up time.
The RNT surv can be interpreted as the expected reduction in the number of patients to treat in the experimental arm compared with the control arm to prevent one adverse event (e.g., death or disease progression) up to time τ.
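For the binary-endpoint case, the definition of RNT as the difference of the reciprocals of the response rates, together with a delta-method Wald interval, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `rnt_wald_ci` and `rnt_via_rrr` are hypothetical, the variance of each reciprocal is approximated by (1 − p)/(n p³) as the delta method gives for a binomial proportion, and the counts of 100 patients per arm in the usage lines are invented to reproduce the two scenarios from the Background.

```python
from math import sqrt
from statistics import NormalDist


def rnt_wald_ci(x_e, n_e, x_c, n_c, alpha=0.05):
    """RNT = 1/p_C - 1/p_E with a delta-method Wald CI.

    x_e, x_c: numbers of responders; n_e, n_c: arm sizes.
    Var(1/p_hat) is approximated by (1 - p) / (n * p**3).
    Both response rates must be strictly positive.
    """
    p_e, p_c = x_e / n_e, x_c / n_c
    rnt = 1 / p_c - 1 / p_e
    se = sqrt((1 - p_c) / (n_c * p_c**3) + (1 - p_e) / (n_e * p_e**3))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rnt, rnt - z * se, rnt + z * se


def rnt_via_rrr(p_e, p_c):
    """Same point estimate through the identity RNT = RRR / p_E."""
    rrr = (p_e - p_c) / p_c
    return rrr / p_e


# Low-response and high-response scenarios (hypothetical counts per arm).
for x_e, x_c in [(20, 10), (50, 40)]:
    est, lower, upper = rnt_wald_ci(x_e, 100, x_c, 100)
    print(f"RNT={est:.2f}, 95% CI [{lower:.2f}, {upper:.2f}], "
          f"via RRR={rnt_via_rrr(x_e / 100, x_c / 100):.2f}")
```

Unlike the NNT interval, the returned limits are always two finite numbers on the real line as long as both observed response rates are positive.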
On the other hand, the difference in RMSTs represents the average gain in survival time for patients receiving the treatment in comparison with the control during the τ-period follow-up. The definition of RMST-based RNT is

$$\mathrm{RNT}_{\mathrm{RMST}}(\tau) = \frac{\tau}{\mathrm{RMST}_C(\tau)} - \frac{\tau}{\mathrm{RMST}_E(\tau)},$$

where RMST E (τ) and RMST C (τ) are the RMSTs up to time τ in the experimental and control arms, respectively. The RNT RMST quantifies the reduction in the number of patients to treat in the experimental arm compared with the control arm in order to obtain one survival case by time τ, which is equivalent to obtaining a total of τ event-free survival time. In Sections 3 and 4, we compare RNT RMST with an RMST-based NNT [12], which can be interpreted as the number needed to treat in the experimental arm compared with the control arm in order to obtain one extra survival case by time τ or gain a total of τ event-free survival time.
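To quantify the uncertainty of RNT surv and RNT RMST , we can compute standard errors by the delta method and construct Wald CIs by normal approximation [16]. The corresponding 100(1 − α)% CIs for RNT surv and RNT RMST can be calculated as

$$\widehat{\mathrm{RNT}}_{\mathrm{surv}}(\tau) \pm z_{1-\alpha/2} \sqrt{\frac{\mathrm{Var}\{\hat{S}_C(\tau)\}}{\hat{S}_C(\tau)^{4}} + \frac{\mathrm{Var}\{\hat{S}_E(\tau)\}}{\hat{S}_E(\tau)^{4}}}$$

and

$$\widehat{\mathrm{RNT}}_{\mathrm{RMST}}(\tau) \pm z_{1-\alpha/2}\, \tau \sqrt{\frac{\mathrm{Var}\{\widehat{\mathrm{RMST}}_C(\tau)\}}{\widehat{\mathrm{RMST}}_C(\tau)^{4}} + \frac{\mathrm{Var}\{\widehat{\mathrm{RMST}}_E(\tau)\}}{\widehat{\mathrm{RMST}}_E(\tau)^{4}}},$$

where Var(.) represents the variance of the survival rate or RMST [11,18]. Similar to the binary case, the Wald CI may not be accurate with a small sample size, and a percentile CI obtained from a perturbation-resampling approach [11] can be used as an alternative. Supplementary Materials contain detailed steps to construct the CIs of RNT based on survival rates and RMSTs via the perturbation-resampling method, as well as simulation studies to compare their performances (see Supplementary Table S2).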
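The point estimates of the two survival versions of RNT can be sketched from raw follow-up data. The snippet below is a simplified illustration, not the authors' code: the helper names (`km_steps`, `surv_at`, `rmst`, `rnt_surv`, `rnt_rmst`) and the toy data are hypothetical, CIs are omitted, and the RMST version assumes the scaling τ/RMST, read from the interpretation that one survival case by time τ corresponds to a total of τ event-free survival time.

```python
def km_steps(times, events):
    """Kaplan-Meier estimate as a list of (event_time, S(event_time))."""
    steps, s = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        died = sum(1 for u, e in zip(times, events) if u == t and e)
        s *= 1 - died / at_risk
        steps.append((t, s))
    return steps


def surv_at(steps, tau):
    """Value of the KM step function at time tau."""
    s = 1.0
    for t, value in steps:
        if t <= tau:
            s = value
    return s


def rmst(steps, tau):
    """Area under the KM step function on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, value in steps:
        if t >= tau:
            break
        area += (t - prev_t) * prev_s
        prev_t, prev_s = t, value
    return area + (tau - prev_t) * prev_s


def rnt_surv(steps_e, steps_c, tau):
    """Difference of reciprocal survival rates (undefined if a rate is 0)."""
    return 1 / surv_at(steps_c, tau) - 1 / surv_at(steps_e, tau)


def rnt_rmst(steps_e, steps_c, tau):
    """Assumed scaling: tau / RMST patients accumulate tau event-free time."""
    return tau / rmst(steps_c, tau) - tau / rmst(steps_e, tau)


# Toy data (hypothetical): follow-up times and event indicators (1 = event).
control = km_steps([2, 4, 6, 8], [1, 1, 1, 0])
treated = km_steps([4, 8, 8, 8], [1, 0, 0, 0])
tau = 8
print(f"S_C={surv_at(control, tau):.2f}, S_E={surv_at(treated, tau):.2f}")
print(f"RMST_C={rmst(control, tau):.2f}, RMST_E={rmst(treated, tau):.2f}")
print(f"RNT_surv={rnt_surv(treated, control, tau):.3f}, "
      f"RNT_RMST={rnt_rmst(treated, control, tau):.3f}")
```

The survival-rate version uses only the height of each curve at τ, whereas the RMST version uses the whole area under each curve up to τ, which is why the latter changes more smoothly as τ varies.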

Results
We demonstrate the advantages of our proposed RNT over NNT in five real trials, including two trials with binary endpoints and three trials with time-to-event endpoints. Wald CIs are used for binary data and perturbation-resampling CIs are used for survival data as suggested by the simulations in Supplementary Materials.

Example 1: KCSG-LU05-04 trial and GILT trial
In cancer research, the commonly used overall response rate (ORR) is defined as the proportion of patients whose tumours are no longer detectable (complete response) or the tumour size has significantly decreased (partial response) after treatment. For inoperable stage III non-small-cell lung cancer (NSCLC), two clinical trials [19,20] were conducted to examine the efficacy of concurrent chemotherapy alone (CRT) versus concurrent chemotherapy plus consolidation (CRT-C).
In the KCSG-LU05-04 trial [19], 420 patients were randomized with 211 in the CRT arm and 209 in the CRT-C arm. Responses to therapy were observed on 81 patients treated with CRT and 90 with CRT-C. The ORR was 38.4% for CRT and 43.1% for CRT-C, leading to NNT 21.4 (95% CI [NNTB 7.1 to ∞ to NNTH 21.2]). Thus, the average number of patients needed to treat using CRT-C compared with CRT in order to obtain one extra response was 21.4.
Flentje et al. [20] conducted a similar trial, named GILT, to compare the CRT alone versus CRT plus consolidation, with 105 patients enrolled in the CRT arm and 96 in the CRT-C arm. The ORR was 24.8 and 29.1% for CRT and CRT-C respectively, which led to NNT 22.7 (95% CI [NNTB 6.0 to ∞ to NNTH 12.7]), similar to the NNT in the previous trial.
Although the two NNTs of the aforementioned NSCLC trials were close and thus represented similar benefit of the additional consolidation therapy, there was a substantial difference in the ORR of the CRT arm between the two trials (38.4% versus 24.8%). The NNT is calculated as the reciprocal of the absolute difference and thus fails to convey the information on the response rates themselves. In contrast, RNT involves a baseline when calculating the difference of the reciprocals of the response rates of the two arms. The estimated RNT for the KCSG-LU05-04 trial was 0.28 (95% CI [− 0.29, 0.86]), while that for the GILT trial was 0.61 (95% CI [− 1.11, 2.33]). The two RNTs are very different, and the latter is more than double the former. Compared with CRT, on average 0.28 fewer patients would be needed with CRT-C to obtain one response in the KCSG-LU05-04 trial, versus 0.61 fewer patients in the GILT trial. Moreover, the CIs of RNT have the standard form, rather than the irregular form with infinity lying between the NNTB and NNTH bounds under the NNT formulation.
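As a worked check of the KCSG-LU05-04 figures, the reported counts (81 of 211 responders for CRT and 90 of 209 for CRT-C) can be plugged into the difference of reciprocals. The interval below assumes a delta-method Wald form with the variance of each reciprocal approximated by (1 − p)/(n p³) and z = 1.96; it is a sketch of the arithmetic rather than the authors' exact computation.

```latex
\begin{aligned}
\hat{p}_C &= 81/211 \approx 0.384, \qquad \hat{p}_E = 90/209 \approx 0.431,\\
\widehat{\mathrm{RNT}} &= \frac{211}{81} - \frac{209}{90} \approx 2.605 - 2.322 \approx 0.28,\\
\widehat{\mathrm{SE}} &= \sqrt{\frac{1-\hat{p}_C}{n_C\,\hat{p}_C^{3}} + \frac{1-\hat{p}_E}{n_E\,\hat{p}_E^{3}}}
  \approx \sqrt{0.0516 + 0.0341} \approx 0.293,\\
95\%\ \mathrm{CI} &\approx 0.283 \pm 1.96 \times 0.293 \approx [-0.29,\ 0.86].
\end{aligned}
```

Under these assumptions the arithmetic reproduces the reported estimate and interval for this trial.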

Example 2: S0226 trial
The S0226 trial [21] was a multi-center, randomized, open-label study in patients with metastatic breast cancer to evaluate the potential benefit of adding fulvestrant to anastrozole therapy versus anastrozole alone. A total of 694 patients were enrolled, with 345 assigned to anastrozole alone and 349 to fulvestrant plus anastrozole therapy. The primary endpoint was progression-free survival (PFS), and the corresponding Kaplan-Meier curves are shown in Fig. 1a, where the two survival curves nearly overlap during the first year, separate afterwards, and almost converge again toward the end of the study. We reconstructed the data from the PFS curves for all eligible patients [22]. The estimates of NNT surv and RNT surv together with their CIs at different time points during the 10-year follow-up period are shown in Fig. 1b and Supplementary Table S3.
As shown in Fig. 1b (noting the difference in the y-axis for NNT and RNT), NNT surv takes extremely large values (e.g., infinity) during the first year of follow-up because the two survival curves are almost indistinguishable; it then decreases until year 3 and starts to increase after year 4. In contrast, the values of RNT surv remain quite stable during the entire follow-up except at year 10, where the survival rates of both arms are low and RNT surv becomes sensitive due to the direct inversion of survival rates. The values of NNT surv at years 3 and 4 are close, and thus fail to convey that the survival rate of the anastrozole-alone arm at year 3 was about twice that at year 4. Such findings, however, are revealed by the significant gap between the values of RNT surv at years 3 and 4, as shown in Fig. 1b.

Example 3: Urgent endoscopy vs early endoscopy for acute upper gastrointestinal bleeding
A recent randomized clinical trial was conducted by Lau et al. [23] to evaluate the clinical performance of urgent endoscopy versus early endoscopy in high-risk patients with upper gastrointestinal bleeding. A total of 516 patients were enrolled and equally randomized to the urgent and early endoscopy groups. The primary endpoint was death from any cause during the 30-day follow-up period, and the overall survival (OS) curves are shown in Fig. 2a. The two OS curves cross once at day 10. When two survival curves cross, the survival probabilities of the two arms are equal at the crossing point. As a consequence, the NNT estimate is infinity, for which the clinical meaning is obscure, while the RNT estimate equals 0, indicating no difference in the treatment benefit. The values of NNT surv and RNT surv calculated from the OS probabilities during the 30-day follow-up period are plotted in Fig. 2b. Compared with the irregular y-axis of NNTs (i.e., the left-hand side y-axis of Fig. 2b) ranging from NNTH to ∞ to NNTB, the commonly used axis of RNTs (i.e., the right-hand side y-axis of Fig. 2b) has the zero point at the center. When NNT surv at day 10 is ∞, the corresponding RNT surv is zero.
Urgent endoscopy performed better with a lower death rate up to day 10, while early endoscopy showed more benefit during the rest of the follow-up. At day 11, the value of RNT surv is − 0.004 (95% CI [− 0.048, 0.039]), indicating that the early endoscopy (control) arm performed slightly better on reducing all-cause deaths.
However, such an interpretation ignores the fact that the OS rate of the urgent endoscopy group was higher during the first 10 days. The result at a specific time point includes only local information rather than the global treatment effect and can therefore convey misleading findings. As an alternative, RMST can be used to assess the entire profile of treatment effect over time, which can serve as the basis for the calculation of the RMST-based NNT and RNT. Fig. 2c displays NNT RMST and RNT RMST from day 1 to day 30. The value of RNT surv depends on the survival probability at each time point, and thus RNT surv fluctuates more drastically over time. In contrast, RNT RMST represents a cumulative summary of survival information up to a specified time point, which changes more smoothly over time. The RNT RMST at day 11 was 0.008 (95% CI [− 0.020, 0.038]), indicating slight superiority for urgent endoscopy. Note that the RNTs based on survival rates and RMSTs have opposite signs, although both are statistically insignificant at the 5% significance level. As shown in Supplementary Table S4, RNT surv at day 30 is − 0.027 (95% CI [− 0.085, 0.027]), i.e., to obtain 100 survival cases by day 30, urgent endoscopy (experimental) needs to treat on average 2.7 more patients compared with early endoscopy (control). The RNT RMST at day 30 is − 0.009 (95% CI [− 0.048, 0.031]), which means that during the 30-day follow-up, on average 0.9 fewer patients would be needed for early endoscopy to obtain 100 survival cases at day 30 (or 30 × 100 = 3000 patient-days), compared with urgent endoscopy.

Example 4: Prophylactic cranial irradiation trial
The RTOG 0214 trial was a phase 3 randomized study to determine whether prophylactic cranial irradiation (PCI) could improve survival in patients with locally advanced NSCLC compared with the observation group after effective locoregional/systemic therapy [24]. The trial enrolled 340 patients, with 163 randomized to the PCI group and 177 to the observation group. The disease-free survival (DFS) curves of the PCI and observation groups in Fig. 3a are intertwined during the first half year and then diverge and converge several times during the remaining follow-up period. As a result, the estimates of NNT surv fluctuate more dramatically during [...] fewer patient to obtain one disease-free patient during the 10-year follow-up period. The NNT RMST exhibits a similar but smoother trend compared with NNT surv , decreasing in the first 3 years, then increasing, and finally dropping again during the later follow-up of the study. At years 7 and 10, the corresponding estimates of NNT RMST are very close (23.19 and 23.21) and thus cannot discriminate the treatment benefit. In contrast, the RNT RMST at year 10 is about 1.5 times that at year 7, which conveys not only the absolute RMST difference but also information on the values of the RMSTs for the experimental and control arms.

Fig. 2 (trial of Lau et al. [23]). a Kaplan-Meier estimates of overall survival curves for the urgent endoscopy and early endoscopy groups; b NNTs and RNTs calculated from the survival rates with the 95% CIs; c NNTs and RNTs calculated from RMSTs with the 95% CIs

Hypothetical examples
For better illustration, we further use hypothetical examples to discuss the advantages and disadvantages of RNT in comparison with NNT under binary and survival endpoints. Table 1 shows the values of NNT and RNT under various baseline response rates and response rate differences. With a fixed response rate difference, the NNT remains the same, while there is an obvious reduction in RNT as the baseline rate increases. For example, when ARR = 0.1, NNT is 10 regardless of the value of the baseline response rate; however, RNT ranges from 0.14 to 90.9 when the baseline response rate decreases from 0.8 to 0.01. More importantly, NNT would be infinity when the response rate difference is zero, which is difficult to interpret in comparison with the corresponding value of zero for RNT. Because it is defined as the difference of the reciprocals of response rates, RNT is sensitive to changes in the response rate difference when the baseline response rate is low. When the response rates are high (e.g., when the baseline response rate is greater than 0.6), the value of RNT tends to be small and can sometimes be less than one.

Fig. 3 (RTOG 0214 trial [24]). a Kaplan-Meier estimates of disease-free survival curves for the prophylactic cranial irradiation (PCI) and observation groups; b NNTs and RNTs calculated from the survival rates with the 95% CIs; c NNTs and RNTs calculated from RMSTs with the 95% CIs

Four hypothetical scenarios are constructed to compare NNT and RNT based on survival rates and RMSTs, respectively. Scenario 1 (Fig. 4a) reflects the proportional hazards case where the experimental arm is consistently better than the control arm. The decreasing trend of NNT surv and NNT RMST and the increasing trend of RNT surv and RNT RMST at four time points demonstrate an increasing treatment difference over the follow-up period. Compared with NNT surv and NNT RMST , relatively larger changes can be observed for RNT surv and RNT RMST from time points 1 to 2. In Scenario 2 (Fig. 4b), the two survival curves diverge during the first half of the follow-up and then converge in the second half. The value of NNT surv is infinity at the end of the study, for which the clinical interpretation is not easy. In contrast, RNT surv takes a value of zero at time point 2, clearly indicating no treatment difference because the same number of patients is needed to treat in order to obtain one survival case at time point 2 for the two arms. However, since the survival rate at a particular time point can only reflect the local survival information, NNT surv and RNT surv fail to capture the divergence and convergence pattern of the survival curves. In such cases, NNT RMST and RNT RMST at the end of the follow-up can quantify the entire profile of the two survival curves.

Discussion
As an essential component of RCTs, interpreting the evidence of the treatment effect to practitioners plays a vital role in their decision making under the risk-benefit consideration. In binary data cases, the popularity of ARR in medical research makes NNT a primary tool for quantifying and presenting treatment effect. However, as the reciprocal of ARR, NNT fails to convey information on the absolute scale of the response rates and its irregular form of CI containing the infinity between the lower and upper bounds often causes confusion. Similar issues also arise for survival endpoints when using survival rates at a particular time point.
As an alternative, the proposed RNT reflects both the difference and the absolute values of the clinical measurement of interest, and the corresponding CI always has the regular form, centered around 0 when the two clinical measurements are close or equal, leading to a more transparent presentation of the variation of the treatment difference. Moreover, when conducting a meta-analysis by pooling information from multiple RCTs, the pooled NNT could be misleading and the irregular CIs would be difficult to use in conjunction with regular CIs [25,26]. In contrast, the pooled RNT using the regular form of CI can still maintain its statistical properties and clinically meaningful interpretation.
Although the proposed RNT has attractive features for the quantification of clinical benefit, it has several limitations. First, when the two clinical measurements are close to each other or when both take large values, RNT would have a small value. For example, if the response rates are 85% and 80% for the experimental and control arms respectively, RNT is equal to 0.074, i.e., on average 0.074 fewer patients would be needed by the experimental treatment to obtain one response compared with the control. In such cases, one can change the unit of the response from one to 100, i.e., on average 7.4 fewer patients are needed by the experimental treatment to obtain 100 responses compared with the control. In addition, similar to NNT, RNT is directly computed from the clinical quantities (e.g., response rates, survival rates and RMSTs) and thus all versions of RNT share the limitations of NNT [13][14][15]. The values of RNT may not be comparable when the evaluated clinical endpoints are different, e.g., one cannot aggregate RNTs obtained from overall survival and progression-free survival. Moreover, RNT works for binary and time-to-event endpoints, but not for continuous endpoints. Because it focuses on summary data rather than individual-level patient data, RNT evaluates the expectation for all patients in a clinical trial rather than characterizing individual distinctions.
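The first limitation and the suggested change of unit can be illustrated numerically. The short sketch below uses hypothetical helper names (`nnt`, `rnt`) and a hypothetical `per` argument for the number of responses used as the unit; the response rates are the ones quoted in the text, with ARR fixed at 0.1 for the tabulated rows.

```python
def nnt(p_e, p_c):
    """Number needed to treat; infinite when the two rates coincide."""
    return float("inf") if p_e == p_c else 1 / (p_e - p_c)


def rnt(p_e, p_c, per=1):
    """Reduction in the number of patients to treat per `per` responses."""
    return per * (1 / p_c - 1 / p_e)


# Fixed ARR = 0.1: NNT stays at 10 while RNT shrinks as the baseline grows.
arr = 0.1
for p_c in (0.01, 0.1, 0.4, 0.8):
    p_e = p_c + arr
    print(f"p_C={p_c:.2f}  NNT={nnt(p_e, p_c):5.1f}  RNT={rnt(p_e, p_c):6.2f}")

# High response rates: rescale from one response to 100 responses.
print(f"85% vs 80%: {rnt(0.85, 0.80):.3f} per response, "
      f"{rnt(0.85, 0.80, per=100):.1f} per 100 responses")
```

Rescaling only changes the unit of reporting; it does not alter the sign of RNT or the relative width of its CI.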

Conclusion
Despite the limitations, RNT is a metric of great value and has advantages over the commonly used NNT. It can help clinicians and patients understand treatment benefits and their variations from a clinically clear and intuitive perspective.