Power and sample size calculation for incremental net benefit in cost effectiveness analyses with applications to trials conducted by the Canadian Cancer Trials Group
BMC Medical Research Methodology volume 23, Article number: 179 (2023)
Abstract
Background
Historically, a priori power and sample size calculations have not been routinely performed in cost-effectiveness analyses (CEAs), partly because of the absence of published cost and effectiveness correlation and variance data, which are essential for power and sample size calculations. Importantly, the empirical correlation between cost and effectiveness has not been examined with respect to the estimation of value-for-money in the clinical literature. Therefore, it is not well established whether cost-effectiveness studies embedded within randomized controlled trials (RCTs) are under- or over-powered to detect changes in value-for-money. However, recent guidelines (such as those from ISPOR) and funding agencies have suggested that sample size and power calculations should be considered in CEAs embedded in clinical trials.
Methods
We examined all RCTs conducted by the Canadian Cancer Trials Group with an embedded cost-effectiveness analysis. Variance and correlation of effectiveness and costs were derived from original-trial data. The incremental net benefit method was used to calculate the power of the cost-effectiveness analysis, with exploration of alternative correlation and willingness-to-pay values.
Results
We identified four trials for inclusion. We observed that a hypothetical scenario of a correlation coefficient of zero between cost and effectiveness led to a conservative estimate of sample size. The cost-effectiveness analysis was underpowered to detect changes in value-for-money in two trials, at a willingness-to-pay of $100,000. Based on our observations, we present six considerations for future economic evaluations, and an online program to help analysts include a priori sample size and power calculations in future clinical trials.
Conclusion
The correlation between cost and effectiveness had a potentially meaningful impact on the power and variance of value-for-money estimates in the examined cost-effectiveness analyses. Therefore, the six considerations and online program may facilitate a priori power calculations in embedded cost-effectiveness analyses in future clinical trials.
Highlights
• Analysts may use the online program presented in the present study to examine the a priori power of cost-effectiveness analyses.
• Analysts may potentially apply the considerations presented in this paper in the planning stage of future cost-effectiveness analyses.
Background
The increasing cost of anticancer agents over the past two decades has generated discussion in the literature regarding the value of novel anticancer therapies [1]. Specifically, concerns have been raised in the literature about the disproportionately modest survival benefits of novel cancer therapeutic agents, compared to the substantial increases in cost [2]. However, phase III anticancer trials are conventionally designed to detect improvements in efficacy and not necessarily changes in value-for-money [3, 4].
Currently, CEAs embedded in cancer trials are commonly completed without formal sample size and power calculations [5]. However, a recent ISPOR Good Research Practices Task Force report has suggested the inclusion of sample size calculations for CEAs embedded in clinical trials [5]. Additionally, research grant funding agencies may commonly request statistical sample size justifications even for secondary economic evaluation endpoints in cancer clinical trials [personal communication: Matthew Cheung, Co-chair of Committee on Economic Analysis, Canadian Cancer Trials Group, Nov 11, 2022]. However, there is currently a paucity of information in the published literature with respect to empirical estimates of sample size parameters [5, 6]. Therefore, analysts may not be able to examine the power of CEAs embedded in clinical trials, despite its recognised importance in the literature [5]. Additionally, the likelihood that a new treatment is cost-effective, based on Bayesian methods, may be challenging to estimate because prior variance and covariance distributions are also not well established in the literature [6, 7].
Importantly, because information regarding variances of cost and covariances of cost and effectiveness is often not well established in the literature, a priori sample size or power calculations for cost-effectiveness analysis endpoints are typically not done, or are only partly done by making some assumptions regarding cost differences between the experimental and control groups and the corresponding variances [6, 8]. Additionally, because most new cancer treatments improve survival and increase costs compared to the standard, there is reasonable theoretical justification that covariance may be non-ignorable in these trials [9]. Therefore, in the absence of a priori power calculations, cost-effectiveness analyses may be under- or over-powered to detect changes in value [9]. Additionally, because many new cancer agents enter the market with incremental cost-effectiveness ratios near willingness-to-pay thresholds, cost negotiations resulting in lower drug prices may reduce the associated power of cost-effectiveness analyses [10].
An understanding of the relationship between the variances of, and correlation between, incremental cost and incremental effectiveness and their influence on power and sample size is not currently available for cancer trials with embedded economic evaluations [11]. To our knowledge, the Canadian Cancer Trials Group (CCTG) is currently the only oncology trial group with a standing committee, the Committee on Economic Analysis, dedicated to designing cost-effectiveness analyses to be embedded within randomized controlled trials with prospective collection of cost and resource utilization data. Therefore, we aimed to examine power calculations in cost-effectiveness analyses embedded within RCTs conducted by the CCTG, in order to better understand and facilitate future health technology assessments.
In this paper, we demonstrate the calculation of power and sample size of cost-effectiveness analyses based on the paradigm of incremental net benefit developed by Willan and Lin [11], using original trial cost and effectiveness individual patient data from phase III trials conducted by the CCTG [12,13,14,15,16,17,18,19]. The statistical model is summarized in the Methods section, followed by an application to cost-effectiveness analyses conducted by CCTG with respect to correlation coefficients and willingness-to-pay values in the Results section. We then discuss the practical implications of our demonstration in the Discussion section, providing guidance on the design of trial-based economic analyses and an online resource to enable sample size and power calculations.
Methods
Selection of Studies and Parameter Calculation
The present study examined all RCTs conducted by CCTG with an embedded cost-effectiveness analysis. Variance and correlation of effectiveness and costs were derived from original-trial data. The primary analysis examined the power of the cost-effectiveness analysis based on the incremental net benefit method, with exploration of alternative correlation and willingness-to-pay values.
Statistical Methods of Calculating the Sample Size and Power based on the Paradigm of Incremental Net Benefit
Conventionally, incremental cost-effectiveness ratios (ICERs) and their associated confidence intervals are a common method of quantifying value and uncertainty, respectively, in cost-effectiveness analyses of anticancer agents [4, 6]. In a two-arm randomized controlled trial, let the mean effectiveness of the treatment and standard arms be \({E}_{1}\) and \({E}_{0}\), respectively, and let the mean costs in the treatment and standard arms be \({C}_{1}\) and \({C}_{0}\), respectively. The ratio between the change in mean cost and the change in mean effectiveness is the ICER, defined as:
\[ICER=\frac{\Delta C}{\Delta E}\]
where \(\Delta C={C}_{1}-{C}_{0}\) and \(\Delta E={E}_{1}-{E}_{0}\) are the cost and effectiveness differences between the treatment and the control groups, respectively.
The present analysis quantifies cost-effectiveness through the incremental net benefit (INB) method. This method was selected because sample size calculations based on the INB method and the willingness-to-pay value (or threshold) are well established in the statistical literature [4], and are among the commonly used methods when sample size calculations are conducted in cost-effectiveness analyses [20]. In cost-effectiveness analyses, the willingness-to-pay value is typically defined as the amount of money that a decision maker or society is willing to pay for a one-unit improvement in efficacy [21, 22]. For a two-arm randomized controlled trial, the INB is defined as
\[b\left(\lambda \right)=\lambda \cdot \Delta E-\Delta C\]
Importantly, the ICER may be calculated as a special case of the INB. Specifically, the ICER may be calculated as the horizontal intercept of the plot of b(λ) (y axis) against λ (x axis). Therefore, when the INB is calculated as \(\lambda \cdot \left(\Delta E\right)-\left(\Delta C\right)=0\), the willingness-to-pay value λ is equivalent to the ICER. Furthermore, the confidence limits for \(b\left(ICER\right)\) cross the horizontal axis at the Fieller limits for a specified ICER, allowing for further inferences [6, 11, 23]. Therefore, without loss of generality, the INB is used in the present analysis; however, the practical problems encountered when implementing the ICER method are similarly applicable [6, 11]. However, because the ICER is a ratio of cost and effectiveness, statistical quantities such as the corresponding standard error may be difficult to obtain [6, 11]. Additionally, standard errors may be unreasonable in some cases (e.g., when the change in efficacy is close to zero, resulting in a standard error that is close to infinity).
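The relationship between the ICER and the INB described above can be sketched in a few lines of Python. The function names and the input values are illustrative assumptions, not data from any CCTG trial; the sketch only shows that the INB is zero when the willingness-to-pay equals the ICER.

```python
def icer(delta_c: float, delta_e: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_e == 0:
        raise ZeroDivisionError("ICER is undefined when delta_e is zero")
    return delta_c / delta_e


def inb(lam: float, delta_c: float, delta_e: float) -> float:
    """Incremental net benefit b(lambda) = lambda * delta_e - delta_c."""
    return lam * delta_e - delta_c


# Hypothetical inputs (not taken from any trial in this paper).
delta_c = 20_000.0  # extra cost per patient, in dollars
delta_e = 0.25      # extra life-years per patient

ratio = icer(delta_c, delta_e)
print(ratio)                              # 80000.0
print(inb(ratio, delta_c, delta_e))       # 0.0, the INB crosses zero at the ICER
print(inb(100_000.0, delta_c, delta_e))   # 5000.0, positive INB at a higher threshold
```

Because the INB is linear in λ, it avoids the division by a near-zero \(\Delta E\) that makes the ICER unstable.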
The hypothesis test of cost-effectiveness for a given willingness-to-pay value may be expressed as:
\[{H}_{0}:b\left(\lambda \right)\le 0\quad \text{versus}\quad {H}_{1}:b\left(\lambda \right)>0\]
where the alternative hypothesis may be interpreted as suggesting the treatment is demonstrated to be cost-effective. Additionally, failing to reject the null hypothesis implies that the cost-effectiveness of the treatment, when compared to the control, is at or below the willingness-to-pay threshold, barring the lack of statistical power to reject the null hypothesis. Further, in the present study, the hypothesis test \({H}_{0}:b\left(\lambda \right)\ge 0\) versus \({H}_{1}:b\left(\lambda \right)<0\) is also examined in the CO.17 and CO.17 KRAS trials. This hypothesis test was included to examine the impact of the correlation coefficient on non-cost-effective trials. Importantly, the latter hypothesis test is unrealistic in CEAs; however, the trends are generalizable to the former hypothesis test with respect to the impact of the correlation coefficient.
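As a minimal sketch of the one-sided test of \({H}_{0}:b\left(\lambda \right)\le 0\) against \({H}_{1}:b\left(\lambda \right)>0\), the following Python code assumes the estimated INB is approximately normally distributed with a known variance. The function name and the numeric inputs are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def inb_one_sided_test(b_hat: float, var_b: float, alpha: float = 0.05):
    """Test H0: b(lambda) <= 0 against H1: b(lambda) > 0 with a z statistic."""
    if var_b <= 0:
        raise ValueError("var_b must be positive")
    z_stat = b_hat / var_b ** 0.5
    # Upper-tail p-value under the standard normal distribution.
    p_value = 1.0 - NormalDist().cdf(z_stat)
    return z_stat, p_value, p_value < alpha


# Hypothetical estimate: INB of $5,000 with a standard error of $2,500.
z_stat, p_value, reject = inb_one_sided_test(b_hat=5_000.0, var_b=2_500.0 ** 2)
print(f"z = {z_stat:.2f}, one-sided p = {p_value:.4f}, reject H0: {reject}")
```

Rejecting the null hypothesis here corresponds to concluding that the treatment is cost-effective at the chosen willingness-to-pay value.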
Variance and Sample Size Calculations
For a study with sample size \(n\), where \({n}_{0}\) and \({n}_{1}\) represent the sample sizes of the control and treatment groups, respectively, the variance of the INB \(b\left(\lambda \right)\) can be estimated as:
\[{\sigma }_{b\left(\lambda \right)}^{2}={\lambda }^{2}\left(\frac{{\sigma }_{{E}_{0}}^{2}}{{n}_{0}}+\frac{{\sigma }_{{E}_{1}}^{2}}{{n}_{1}}\right)+\left(\frac{{\sigma }_{{C}_{0}}^{2}}{{n}_{0}}+\frac{{\sigma }_{{C}_{1}}^{2}}{{n}_{1}}\right)-2\lambda \left(\frac{{\rho }_{0}{\sigma }_{{E}_{0}}{\sigma }_{{C}_{0}}}{{n}_{0}}+\frac{{\rho }_{1}{\sigma }_{{E}_{1}}{\sigma }_{{C}_{1}}}{{n}_{1}}\right)\tag{2}\]
where \({\sigma }_{{E}_{j}}^{2}\) and \({\sigma }_{{C}_{j}}^{2},j=0,1\) are the variances of effectiveness and costs in the control and experimental groups, respectively. Additionally, \({\rho }_{j}\) is the correlation coefficient between effectiveness and cost in group \(j,j=0,\hspace{0.25em}1\). Conventionally, the variance of the ICER is calculated through bootstrap methods or Fieller’s theorem in some cases. A derivation of Fieller’s confidence limits when \(\widehat{b}\left(\lambda \right)\) crosses the horizontal axis is presented in Willan (2006) [24] as well as Zethraeus and Löthgren [23]. Variance and correlation of effectiveness and costs were derived from original-trial data using intent-to-treat analyses, over the entire trial time horizon. Detailed derivations and formulas of these parameters are presented by Willan [9]. Additionally, details with respect to the parameter estimates are presented in the associated CEAs of the examined trials [16,17,18,19]. The smallest important difference in incremental net benefit is defined as \({b\left(\lambda \right)}_{\delta }=\lambda \times \left(\Delta E\right)-\left(\Delta C\right)\). Therefore, as examined by Willan and Lin (2001), to test the one-sided hypotheses at the \(\alpha\) level and \(\left(1-\beta \right)\times 100\mathrm{\%}\) power, the total sample size is given by:
where the type I error, \(\alpha\) in Eq. (3), is the probability of claiming that the treatment is cost-effective if the null hypothesis is true, which is usually set to be \(\alpha =0.05\). The type II error, \(\beta\) in Eq. (3), is the probability of failing to reject the null hypothesis, when the true INB is equal to or less than \({b\left(\lambda \right)}_{\delta }\). Therefore, \({z}_{1-\alpha }\) and \({z}_{1-\beta }\) are quantiles of a standard normal distribution with respect to \(\left(1-\alpha \right)\) and the power, respectively.
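The variance of the INB can be sketched from arm-level inputs. The code below assumes the usual variance of a difference in means, with \(b\left(\lambda \right)=\lambda \cdot \Delta E-\Delta C\) and an arm-specific correlation between effectiveness and cost; the function name and all input values are hypothetical.

```python
def inb_variance(lam, sd_e, sd_c, rho, n):
    """Variance of the estimated INB from arm-level inputs.

    sd_e, sd_c, rho and n are (control, treatment) pairs: standard deviations
    of effectiveness and cost, cost-effectiveness correlation, and arm size.
    """
    total = 0.0
    for j in (0, 1):
        # Per-patient variance of lam * E - C in arm j.
        per_patient = (
            lam ** 2 * sd_e[j] ** 2
            + sd_c[j] ** 2
            - 2.0 * lam * rho[j] * sd_e[j] * sd_c[j]
        )
        total += per_patient / n[j]
    return total


# Hypothetical design: 100 patients per arm, identical spread in both arms.
common = dict(lam=100_000.0, sd_e=(0.5, 0.5), sd_c=(20_000.0, 20_000.0), n=(100, 100))
var_rho_zero = inb_variance(rho=(0.0, 0.0), **common)
var_rho_pos = inb_variance(rho=(0.4, 0.4), **common)
print(f"rho = 0.0: variance = {var_rho_zero:.3e}")
print(f"rho = 0.4: variance = {var_rho_pos:.3e}")
```

With these inputs a positive correlation lowers the variance, because the cross term is subtracted.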
In general, a priori assumptions based on existing literature regarding \(\Delta E\) and variance of \(\Delta E\), are used to inform the primary endpoint of the efficacy sample size calculations of the clinical trial. Specifically, in phase III clinical trials, efficacy sample size calculations are typically based on the target hazard ratio and the assumption of an exponential survival distribution [3]. Therefore, in the present analysis, the sample size calculation is based on the assumption that survival time follows an exponential distribution. In comparison, the treatment effectiveness \(\Delta E\) and variance \({\sigma }_{\Delta E}^{2}\) were calculated analytically, based on individual patient data.
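To illustrate how an exponential survival assumption can turn efficacy design inputs into effectiveness parameters, here is a simplified Python sketch. It assumes uncensored, untruncated exponential survival (so the standard deviation equals the mean) and is not a description of the calculation used in this study or in the online program; the function name and inputs are hypothetical.

```python
from math import log


def exponential_effectiveness(surv_prob_control, time_point, hazard_ratio):
    """Mean survival, its SD, and delta_e under exponential survival.

    Assumes no censoring and no restriction to a trial time horizon.
    """
    if not 0.0 < surv_prob_control < 1.0:
        raise ValueError("surv_prob_control must be strictly between 0 and 1")
    if time_point <= 0 or hazard_ratio <= 0:
        raise ValueError("time_point and hazard_ratio must be positive")
    hazard_control = -log(surv_prob_control) / time_point
    hazard_treatment = hazard_ratio * hazard_control
    mean_control = 1.0 / hazard_control
    mean_treatment = 1.0 / hazard_treatment
    return {
        "delta_e": mean_treatment - mean_control,
        "sd_e_control": mean_control,      # SD equals the mean for an exponential
        "sd_e_treatment": mean_treatment,
    }


# Hypothetical design: 50% control-arm survival at 1 year, target hazard ratio 0.75.
design = exponential_effectiveness(0.5, 1.0, 0.75)
for name, value in design.items():
    print(f"{name}: {value:.3f} years")
```

In a real trial, censoring and a finite time horizon would reduce both the mean difference and the standard deviations relative to this sketch.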
To better understand the role that the correlation coefficients play in sample size and power determination, we simplify the formulas under the following assumptions:

1.
The study will be balanced between the control and experimental arms: \({n}_{0}={n}_{1}=\frac{n}{2}\).

2.
The costs in the control and experimental arms have the same variance: \({\sigma }_{{C}_{0}}^{2}={\sigma }_{{C}_{1}}^{2}={\sigma }_{C}^{2}\).

3.
The effectiveness in the control and experimental arms has the same variance: \({\sigma }_{{E}_{0}}^{2}={\sigma }_{{E}_{1}}^{2}={\sigma }_{E}^{2}\).

4.
The correlation coefficients in the control and experimental arms are the same: \({\rho }_{0}={\rho }_{1}=\rho\).
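These assumptions have been made to facilitate illustrative comparisons with respect to the impact of the correlation coefficient on sample size calculation and should be applied to external cost-effectiveness analyses cautiously. Based on these assumptions, we can rewrite variance formula Eq. 2 as
\[{\sigma }_{b\left(\lambda \right)}^{2}=\frac{4}{n}\left({\lambda }^{2}{\sigma }_{E}^{2}+{\sigma }_{C}^{2}-2\lambda \rho {\sigma }_{E}{\sigma }_{C}\right)\tag{4}\]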
Since \(b{\left(\lambda \right)}_{\delta }=\lambda \cdot \left(\Delta E\right)-\left(\Delta C\right)\), let \(z={\left({z}_{1-\alpha }+{z}_{1-\beta }\right)}^{2}\); for example, when the one-sided alpha = 0.05 and power = 80%, we have z = 6.18 and \(4\times z\) = 24.72. Therefore, we can replace the quantities in the sample size Eq. 3 with the corresponding terms for cost differences and effectiveness differences in incremental net benefit analysis and obtain the following sample size formula
where \(\pi =\frac{{n}_{e}}{{d}_{e}}\) denotes the ratio of the total expected sample size \(\left({n}_{e}\right)\) to the expected number of events \(\left({d}_{e}\right)\). The parameter \(\pi\) will be fixed once the design for the primary effectiveness endpoint is finalized. For example, in a trial with sample size \({n}_{e}=500\) in which the final analysis is triggered when \({d}_{e}=400\) events are observed, \(\pi =\frac{500}{400}=1.25\). Furthermore, as examined by Willan & Lin [11], the corresponding power function is given by:
\[1-\beta =\Phi \left(\frac{b{\left(\lambda \right)}_{\delta }}{{\sigma }_{b\left(\lambda \right)}}-{z}_{1-\alpha }\right)\tag{6}\]
where \(\Phi \left(\cdot \right)\) is defined as the cumulative distribution function for a standard normal random variable. The power curve gives the probability of rejecting the hypothesis \({H}_{0}:\hspace{0.25em}b\left(\lambda \right)\le 0\) in favour of the hypothesis \({H}_{1}:\hspace{0.25em}b\left(\lambda \right)>0\), at the level \(\alpha\), for a given smallest important difference in incremental net benefit value, \(b{\left(\lambda \right)}_{\delta }\) [11, 25]. In the present study, the smallest important difference was conservatively calculated as a function of the observed \(\Delta E\) and \(\Delta C\) values, because there is ongoing discussion in the literature regarding the best method of defining \(b{\left(\lambda \right)}_{\delta }\) [24]. However, because the intent of the present analysis was to examine the practical implications of the correlation coefficient between effectiveness and cost, with respect to a priori sample size or power calculations for cost-effectiveness analysis, the results may be generalizable to other methods of defining \(b{\left(\lambda \right)}_{\delta }\). Additionally, as a sensitivity analysis, the present study also examined a frequentist method of defining \(b{\left(\lambda \right)}_{\delta }\). Briefly, this frequentist method is characterized as the minimum \(b{\left(\lambda \right)}_{\delta }\) value that satisfies Eq. 6 [6]. Further information on this frequentist method of defining \(b{\left(\lambda \right)}_{\delta }\) is available in Lachin [26].
Additionally, the methods examined in the present study may be extended to non-censored data based on the mean parameter estimates, as examined in Willan [6, 11].
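The sample size and power calculations described in this section can be sketched in Python under the four simplifying assumptions (balanced arms, common variances, common correlation). The sketch assumes the INB variance takes the form \(\frac{4}{n}\left({\lambda }^{2}{\sigma }_{E}^{2}+{\sigma }_{C}^{2}-2\lambda \rho {\sigma }_{E}{\sigma }_{C}\right)\) and uses a normal approximation; it does not include the event-ratio parameter \(\pi\). Function names and design values are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_multiplier(alpha=0.05, power=0.80):
    """(z_{1-alpha} + z_{1-beta})^2 for a one-sided test."""
    return (_STD_NORMAL.inv_cdf(1.0 - alpha) + _STD_NORMAL.inv_cdf(power)) ** 2


def per_patient_variance(lam, sd_e, sd_c, rho):
    """Variance of lam * E - C for a single patient."""
    return lam ** 2 * sd_e ** 2 + sd_c ** 2 - 2.0 * lam * rho * sd_e * sd_c


def total_sample_size(lam, delta_e, delta_c, sd_e, sd_c, rho, alpha=0.05, power=0.80):
    """Smallest total n (both arms) giving the requested power."""
    b_delta = lam * delta_e - delta_c
    if b_delta <= 0:
        raise ValueError("the smallest important INB difference must be positive")
    variance = per_patient_variance(lam, sd_e, sd_c, rho)
    return ceil(4.0 * z_multiplier(alpha, power) * variance / b_delta ** 2)


def power_at(n, lam, delta_e, delta_c, sd_e, sd_c, rho, alpha=0.05):
    """Power of the one-sided test with total sample size n."""
    b_delta = lam * delta_e - delta_c
    std_error = sqrt(4.0 * per_patient_variance(lam, sd_e, sd_c, rho) / n)
    return _STD_NORMAL.cdf(b_delta / std_error - _STD_NORMAL.inv_cdf(1.0 - alpha))


# Hypothetical design inputs (not taken from any CCTG trial).
design = dict(lam=100_000.0, delta_e=0.25, delta_c=20_000.0, sd_e=0.5, sd_c=20_000.0)
print(round(z_multiplier(), 2))  # 6.18, matching the value quoted in the text
for rho in (0.0, 0.4):
    n = total_sample_size(rho=rho, **design)
    print(f"rho = {rho}: n = {n}, power = {power_at(n, rho=rho, **design):.3f}")
```

Setting \(\rho =0\) gives the larger of the two sample sizes, which is why it acts as a conservative planning choice when the true correlation is positive.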
Results
Summary of included trials
In the present analysis, we included all the trials for which CCTG had completed a cost-effectiveness analysis, including empirically calculated variances (of means of cost and effectiveness), correlation coefficients, as well as means of \(\Delta C\) and \(\Delta E\). In total, we identified four trials, and one retrospective subgroup analysis from one of the four trials, for inclusion. The standard deviation for the cost of the experimental arm ranged from $10,000 to $35,000. The standard deviation for the effectiveness of the experimental arm ranged from 0.1 years to 2.72 years. The correlation coefficients between \(\Delta C\) and \(\Delta E\) were low to moderate in all the included trials (0.042 to 0.44) (Table 1).
Correlation coefficient analysis
The correlation coefficient analysis of the present study examined how value-for-money estimates changed as a function of original-trial cost and effectiveness correlation. We applied the proposed health economic variance formula to four different CCTG studies: BR.10, BR.21, CO.17 (all patients and a substudy of CO.17 KRAS wild type) and LY.12. First, we examined the impact of the correlation between \(\Delta C\) and \(\Delta E\) on the variance of \(b\left(\lambda \right)\). The results of the correlation coefficient analysis are summarized in Table 2. Additionally, contour plots examining the impact of \(\rho\) and \(b\left(\lambda \right)\) with respect to the variance of \(b\left(\lambda \right)\) are presented in Fig. 4 of Additional file 1: Appendix. Below we use the CO.17 trial (all patients) to demonstrate how the variance of \(b\left(\lambda \right)\) changes as a function of the correlation coefficient \(\rho\), which can vary from −1 (minimum) to 1 (maximum).
The CO.17 trial is a randomized controlled trial comparing the efficacy of cetuximab plus best supportive care (n = 287) versus best supportive care alone (n = 285) in patients with refractory advanced colorectal cancer. Assuming that the willingness-to-pay is $100,000 per life-year gained, the variance of \(b\left(\lambda \right)\) is given by
\[{\sigma }_{b\left(\lambda \right)}^{2}\approx 8{,}571-7{,}032\rho\]
When \(\rho =0\), the variance \({\sigma }_{b\left(\lambda \right)}^{2}=8,571\). When \(\rho =0.44\), the variance \({\sigma }_{b\left(\lambda \right)}^{2}=5,477\), which is a decrease of 36.1%. When \(\rho =-1\) or 1, the variance will be increased or decreased by 82.0%, respectively. Assuming that the willingness-to-pay is $200,000 per life-year gained, the variance of \(b\left(\lambda \right)\) is given by
\[{\sigma }_{b\left(\lambda \right)}^{2}\approx 28{,}778-14{,}064\rho\]
which is a linear function of the correlation coefficient \(\rho\). When \(\rho =0\), the variance is 28,778. When \(\rho =0.44\), the estimated correlation coefficient from the trial data, the variance \({\sigma }_{b\left(\lambda \right)}^{2}\) is approximately 22,590, which is a decrease of 21.5%. When \(\rho =-1\) or 1, the variance will be increased or decreased by 48.8%, respectively. The relative changes of the variance between \(\rho =0\) and \(\rho =observed\), at a willingness-to-pay value of $100,000, across all the examined trials are presented in Table 2. In the examined trials, the correlation coefficient accounted for a reduction of 2% to 36% in the INB variance. Further, based on the CO.17 example, we appreciate that the larger the correlation coefficient, the greater the reduction of INB variance.
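The percentage changes quoted for CO.17 at a willingness-to-pay of $100,000 follow from the linearity of the variance in \(\rho\). As a rough check, the Python sketch below fits a straight line through the two reported values (8,571 at \(\rho =0\) and 5,477 at \(\rho =0.44\)); the function name is illustrative, and the slope is inferred from those two rounded values rather than taken from the trial data.

```python
def variance_line(var_at_zero, var_at_rho, rho_obs):
    """Return variance(rho) as the straight line through two reported points."""
    slope = (var_at_zero - var_at_rho) / rho_obs

    def variance(rho):
        return var_at_zero - slope * rho

    return variance


# Reported CO.17 values at a willingness-to-pay of $100,000.
co17 = variance_line(var_at_zero=8_571.0, var_at_rho=5_477.0, rho_obs=0.44)

for rho in (-1.0, 0.0, 0.44, 1.0):
    change = 100.0 * (co17(rho) / co17(0.0) - 1.0)
    print(f"rho = {rho:+.2f}: variance = {co17(rho):,.0f} ({change:+.1f}% vs rho = 0)")
```

The line reproduces the 36.1% reduction at the observed correlation and the 82.0% change at the extremes \(\rho =\pm 1\).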
The relationship between the correlation coefficient and confidence intervals of the INB estimate, across a range of willingness-to-pay \((\lambda )\) values ($0 to $250,000), is examined in Fig. 1. In all the included trials, when the willingness-to-pay increased from $100,000 to $200,000, the width of the confidence intervals of the INB estimate also increased. Additionally, in all the included trials, when the correlation coefficient was assumed to be \(\rho =0\), the confidence intervals of the INB estimate were slightly more conservative (i.e., wider) compared to those based on the observed correlation coefficient. Further, in all the included trials, when the correlation coefficient was assumed to be \(\rho =1\), the confidence intervals of the INB estimate underestimated the observed confidence interval (i.e., were too narrow). Inferences with respect to the ICERs of the examined trials are also possible, as the ICER and the Fieller limits are represented as the intersections with the horizontal axis of the INB estimate and confidence intervals, respectively [11, 23]. Additionally, the \(\Delta E\) and \(\Delta C\) values used to derive the INB estimates based on the cost-effectiveness analysis in all the examined trials are presented in Additional file 1: Appendix Table 1.
Sample size and power calculations
In order to examine the relationship between the correlation coefficient and the power and sample size of the examined trials, we applied the power formula presented in Eq. 6 to the four CCTG trials (and one subgroup analysis), when \(\rho =0\), \(\rho =observed\), and \(\rho =1\). This analysis examines the probability of rejecting the null hypothesis \({H}_{0}:b\left(\lambda \right)\le 0\) versus \({H}_{1}:b\left(\lambda \right)>0\) (and \({H}_{0}:b\left(\lambda \right)\ge 0\) versus \({H}_{1}:b\left(\lambda \right)<0\) in the CO.17 and CO.17 KRAS trials), at a willingness-to-pay threshold of $100,000 (Fig. 2).
In all the examined trials, when the correlation coefficient was assumed to equal the correlation coefficient observed in the associated cost-effectiveness analysis, the sample size needed to detect a change in the INB value decreased compared to \(\rho =0\). In trials with correlation coefficients close to zero (LY.12: \(\rho =0.042\)), the sample size required to have 80% power to reject the null hypothesis changed minimally. However, in trials with relatively higher correlation coefficients (CO.17 (all patients): \(\rho =0.44\)), the reduction in the sample size required to have 80% power to reject the null hypothesis was also relatively larger. Additionally, this trend was also observed in all trials when the \(b{\left(\lambda \right)}_{\delta }\) value was defined using a frequentist method (Additional file 1: Appendix Fig. 1). This observation was expected because, based on formula 4, it is straightforward to appreciate the relationship between the sample size and the correlation coefficient \(\rho\): as the correlation coefficient increases, the required sample size decreases. The relationships between the correlation coefficient and confidence intervals of the INB estimate, across a range of willingness-to-pay \((\lambda )\) values ($0 to $250,000), are examined in Fig. 1.
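The dependence of the required sample size on \(\rho\) can be illustrated with a short Python sketch. It assumes the balanced, common-variance simplification, under which the sample size is proportional to \({\lambda }^{2}{\sigma }_{E}^{2}+{\sigma }_{C}^{2}-2\lambda \rho {\sigma }_{E}{\sigma }_{C}\) for a fixed smallest important difference. The standard deviations are hypothetical; only the two correlation values (0.042 and 0.44) come from the text.

```python
def relative_sample_size(lam, sd_e, sd_c, rho):
    """Required sample size at correlation rho, relative to rho = 0."""
    baseline = lam ** 2 * sd_e ** 2 + sd_c ** 2
    return (baseline - 2.0 * lam * rho * sd_e * sd_c) / baseline


# Hypothetical spread of effectiveness (years) and cost (dollars).
lam, sd_e, sd_c = 100_000.0, 0.5, 20_000.0

for label, rho in (("rho = 0", 0.0), ("LY.12-like rho", 0.042), ("CO.17-like rho", 0.44)):
    ratio = relative_sample_size(lam, sd_e, sd_c, rho)
    print(f"{label}: {100.0 * (1.0 - ratio):.1f}% fewer patients than with rho = 0")
```

A correlation near zero barely changes the requirement, while a moderate positive correlation reduces it noticeably.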
In order to examine the relationship between the power and sample size of the included trials across willingness-to-pay thresholds, we also applied the power function presented in Eq. 6 to the four CCTG trials (and one subgroup analysis) at willingness-to-pay thresholds of $50,000, $100,000, and $150,000. Figure 3 examines the probability of rejecting the null hypothesis \({H}_{0}:\hspace{0.25em}b\left(\lambda \right)\le 0\) versus \({H}_{1}:b\left(\lambda \right)>0\) (and \({H}_{0}:b\left(\lambda \right)\ge 0\) versus \({H}_{1}:b\left(\lambda \right)<0\) in the CO.17 and CO.17 KRAS trials), assuming the correlation coefficient observed in the cost-effectiveness analysis. The observed cost-effectiveness analyses were underpowered (< 80%) to reject the null hypothesis at a willingness-to-pay value of $100,000 in two of the examined trials. Further, when the INB value was close to 0 at the examined willingness-to-pay threshold, as in BR.10 at a willingness-to-pay value of $100,000, the sample size needed to reject the null hypothesis increased considerably compared to the other examined trials.
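The sharp growth in sample size when the INB is close to zero can be seen from the fact that, under a normal approximation with balanced arms, the required sample size is inversely proportional to the square of the smallest important difference. The Python sketch below uses hypothetical inputs and an illustrative function name.

```python
from statistics import NormalDist


def required_total_n(b_delta, per_patient_var, alpha=0.05, power=0.80):
    """Unrounded total sample size for a one-sided test with balanced arms."""
    if b_delta <= 0:
        raise ValueError("b_delta must be positive")
    std_normal = NormalDist()
    z = (std_normal.inv_cdf(1.0 - alpha) + std_normal.inv_cdf(power)) ** 2
    return 4.0 * z * per_patient_var / b_delta ** 2


# Hypothetical per-patient variance of the net benefit (dollars squared).
per_patient_var = 2.9e9

for b_delta in (10_000.0, 5_000.0, 1_000.0, 500.0):
    n = required_total_n(b_delta, per_patient_var)
    print(f"smallest important INB = ${b_delta:>8,.0f}: n is about {n:,.0f}")
```

Halving the smallest important difference quadruples the required sample size, so an INB near the threshold quickly becomes impractical to test.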
Furthermore, in order to examine the relationship between power and sample size of the examined trials, as a secondary analysis we also examined two-sided hypothesis testing in Additional file 1: Appendix Figs. 2 and 3. Specifically, the null and alternative hypotheses are modelled as \({H}_{0}:b\left(\lambda \right)=0\) and \({H}_{1}:b\left(\lambda \right)\ne 0\). In Appendix Fig. 2, the correlation coefficient was modelled as \(\rho =0\), \(\rho =1\), and \(\rho =\mathrm{observed}\), and the willingness-to-pay threshold was examined as $100,000. In Appendix Fig. 3, the correlation coefficient was examined as \(\rho =observed\) and the willingness-to-pay threshold was examined at $50,000, $100,000, and $150,000. In general, we observed similar trends to those examined in the primary analysis. Predictably, the sample sizes observed in the two-sided testing analysis were larger compared to the primary analysis.
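The larger sample sizes under two-sided testing follow from the larger critical value. As a small illustration, assuming a normal approximation, the Python sketch below compares the multiplier \({\left({z}_{1-\alpha }+{z}_{1-\beta }\right)}^{2}\) for a one-sided test with its two-sided counterpart, which uses \({z}_{1-\alpha /2}\); the function name is illustrative.

```python
from statistics import NormalDist


def sample_size_multiplier(alpha=0.05, power=0.80, two_sided=False):
    """(z_{1 - tail} + z_{power})^2, where tail is alpha or alpha / 2."""
    std_normal = NormalDist()
    tail = alpha / 2.0 if two_sided else alpha
    return (std_normal.inv_cdf(1.0 - tail) + std_normal.inv_cdf(power)) ** 2


one_sided = sample_size_multiplier(two_sided=False)
two_sided = sample_size_multiplier(two_sided=True)
print(f"one-sided multiplier: {one_sided:.2f}")
print(f"two-sided multiplier: {two_sided:.2f}")
print(f"relative increase in sample size: {100.0 * (two_sided / one_sided - 1.0):.0f}%")
```

With \(\alpha =0.05\) and 80% power, the two-sided design needs roughly a quarter more patients than the one-sided design, all else being equal.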
Discussion
In this paper we examine the application of the INB method for calculating sample size and power in economic evaluations, using original trial data from the CCTG. In general, the present analysis found that the correlation coefficient between \(\Delta E\) and \(\Delta C\) had a potentially meaningful impact on the power and sample size calculations in the examined economic analyses. Specifically, when the correlation coefficient increased from \(\rho =0\) to \(\rho =observed\), the variance of the INB decreased. Additionally, when the correlation coefficient increased from \(\rho =0\) to \(\rho =observed\), the sample size needed to detect the smallest important differences in value also decreased. The present analysis examined the INB; however, by varying the willingness-to-pay value, inferences with respect to the ICER are also possible.
Because of the historical absence of a priori cost-effectiveness analysis power calculations in pivotal cancer trials [5, 11], we have also developed an easy-to-use online program (http://statapps.tk/icer_samplesize) for analysts to calculate power and sample size at the design stage of cost-effectiveness analyses. Additionally, as an illustrative example of the program, possible input parameters for the BR.21 trial are presented in Fig. 5 of Additional file 1: Appendix. Importantly, the application of our proposed program may be informed by the empirical observations presented in the current study. As examined by Willan (2001), the proposed methods may be used to identify a subset of the trial population for cost-effectiveness analysis, in contrast to historical methods, which required larger sample sizes compared to the corresponding effectiveness analysis to detect changes in cost-effectiveness [4, 11]. This program will also determine the standard deviation of the effectiveness outcome based on the design of the original clinical trial, including type I error rate, power, survival probability of the control arm at a given time point and the hazard ratio.
In the present analysis, we observed a low to modest correlation between \(\Delta E\) and \(\Delta C\) in all the examined trials. Based on the sample size analysis, assuming the correlation coefficient \(\rho =0\) may be a conservative estimate to ensure sufficient power of cost-effectiveness analyses. Additionally, when sufficient evidence of indication-specific correlation coefficients exists in the literature, power calculations in cost-effectiveness analyses may consider correlation coefficients, in order to design more efficient economic evaluations when survival and costs are positively correlated. Further, as it is uncommon for cost and survival to be negatively correlated, negative correlation coefficients may only be applicable in unique clinical scenarios where a novel intervention would substantially reduce the utilization of a downstream expensive intervention while increasing survival. Assuming a negative correlation coefficient without sufficient biological justification will lead to an overly conservative large sample size, just as assuming a positive correlation coefficient without sufficient biological justification will lead to an insufficiently small sample size.
In the BR.21 trial at a willingness-to-pay threshold of $100,000, the sample size needed to reject the null hypothesis was considerably larger compared to the sample size of the actual cost-effectiveness analysis. The interpretation of failing to reject the null hypothesis is that the cost-effectiveness of the treatment, when compared to the control, is not different from the willingness-to-pay threshold. Therefore, we cannot draw a conclusion about whether a therapy is cost-effective at the examined willingness-to-pay threshold. Additionally, this sample size ballooning occurred when the INB estimate was near zero at the examined willingness-to-pay threshold. This observation may be relevant because novel first-in-class anticancer agents with no a priori pricing information conventionally enter the market with their prices based on the willingness-to-pay threshold [10]. In these scenarios, cost-effectiveness analyses may be underpowered to reject the null hypothesis. Therefore, in the absence of a priori sample size or power calculations, cost-effectiveness analyses may not be able to identify whether novel treatments are more or less cost-effective compared to the control treatments.
In two of the examined trials (and one subgroup population), the sample size required to reject the null hypothesis was smaller compared to the corresponding efficacy analysis, at \(\rho =observed\) and a willingness-to-pay threshold of $100,000 (i.e., the cost-effectiveness analyses were overpowered). In practice, it may not be feasible for analysts to embed cost-effectiveness analyses within clinical trials that require sample sizes larger than the corresponding efficacy analyses. However, a priori power calculations in cost-effectiveness analyses embedded within pivotal clinical trials may prioritize the development of cost-reduction therapies, and agents that preserve durable long-term response and survival. Additionally, when economic evaluations are overpowered, a priori power calculations may facilitate the identification of a population subset for subgroup cost-effectiveness analysis, in order to minimize wasting resources. Further, within a net benefit regression framework, power may also be derived based on a t-test or bootstrapping methods [27, 28]. The net benefit regression framework may also benefit similarly from the observations and proposed considerations presented in the current study.
Importantly, because the empirical parameter values observed in the present study could not be systematically compared to external studies, the validity of these estimates based on trial characteristics is not well established (e.g., the impact of sample size, dropout, or type of outcomes). This limitation highlights the need for future CEAs to report detailed cost and efficacy variance and covariance data, in order to iteratively refine a range of possible values and formal guidelines for CEA analysts. Additionally, changes in price during cost negotiations may result in changes to the associated sample size and power estimates. Therefore, analysts may consider examining a range of possible parameter estimates.
The proposed a priori cost-effectiveness power and sample size formula uses the predicted variances of cost and effectiveness, the correlation between cost and effectiveness, and the expected event rate. The present study addresses the gap in the literature with respect to original trial data on cost and effectiveness variances, as well as the correlation between cost and effectiveness.
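For orientation, one commonly used normal-approximation form of an INB sample size calculation is sketched below. It is written under assumed notation and is not necessarily the exact formula used in the present study: \(\lambda\) denotes the willingness-to-pay threshold, \(\sigma_E\) and \(\sigma_C\) are per-patient standard deviations of effectiveness and cost (assumed equal across arms), \(\rho\) is their correlation, allocation is 1:1, the test is two-sided, and the expected event rate is omitted.

\[
\mathrm{INB}(\lambda) = \lambda\,\Delta E - \Delta C, \qquad
\sigma^2_{\mathrm{NB}} = \lambda^2\sigma_E^2 + \sigma_C^2 - 2\lambda\rho\,\sigma_E\sigma_C
\]

\[
n_{\text{per arm}} \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2\,\sigma^2_{\mathrm{NB}}}{\mathrm{INB}(\lambda)^2}
\]

In this form, a positive \(\rho\) reduces \(\sigma^2_{\mathrm{NB}}\), so setting \(\rho = 0\) yields a larger (more conservative) sample size, and the required sample size grows without bound as \(\mathrm{INB}(\lambda)\) approaches zero.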
Considerations
Based on the empirical exploration of original trial data, we present six potential considerations for future individual patient-based economic evaluations embedded in clinical trials:

1. At the design stage of clinical trials, embedded CEAs may calculate \(\Delta E\) and the standard error of \(\Delta E\) based on the efficacy assumptions used in the statistical design for the primary efficacy endpoint.

2. At the design stage of clinical trials, embedded health care perspective CEAs may calculate \(\Delta C\) based on the expected difference in the duration and costs of the experimental drug/regimen versus the control drug/regimen. If the price of the drug is not yet known at the time of the design, reviewers could identify a marketed drug from a comparable class (e.g., "me-too" drugs may base costs on the associated first-in-class drug) or a range of possible estimates, before adjusting for cost add-ons or offsets.

3. At the design stage of future comparable cancer CEAs embedded in clinical trials, the standard error of the control and experimental costs could be based on the range of values observed in the CCTG trials reported in the present study ($500 to $4,000).

4. At the design stage of future comparable CEAs embedded in clinical trials, correlation coefficients between \(\Delta C\) and \(\Delta E\) could be based on the range of values we observed in the present study (\(\rho =0\) to 0.5), where \(\rho =0\) will lead to a conservative estimate of sample size when the true \(\rho >0\). These estimates may be examined as a range of possible values and should be iteratively refined as additional estimates become available in the literature.

5. When the sample size of an examined trial is defined by the efficacy analysis, the corresponding a priori cost-effectiveness power calculations, and the identification of a potential trial population subset, may be completed using the INB approach, for example with our online calculator application.

6. Future cost-effectiveness analyses based on individual patient data should consistently report their observed variances and correlation coefficients of \(\Delta E\) and \(\Delta C\) to facilitate power calculations in future studies in related cancer settings.
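Considerations 4 and 5 can be illustrated with a short sketch that fixes the sample size (as if set by the efficacy analysis) and examines power across a range of correlation and willingness-to-pay values. It assumes a generic two-sided normal approximation with equal allocation and equal per-patient standard deviations in both arms; it is not the online calculator described in this paper, and the function name `inb_power` and all inputs are hypothetical.

```python
import math
from statistics import NormalDist


def inb_power(n_per_arm, delta_e, delta_c, sd_e, sd_c, rho, wtp, alpha=0.05):
    """Approximate power to reject INB = 0 with n_per_arm patients per arm.

    Assumed (hypothetical) model: INB = wtp * delta_e - delta_c, with
    per-patient SDs sd_e and sd_c shared by both arms and a patient-level
    cost-effectiveness correlation rho.
    """
    inb = wtp * delta_e - delta_c
    # Per-patient variance of net benefit at this willingness-to-pay.
    var_nb = (wtp * sd_e) ** 2 + sd_c ** 2 - 2 * wtp * rho * sd_e * sd_c
    # Standard error of the between-arm difference in mean net benefit.
    se = math.sqrt(2 * var_nb / n_per_arm)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible probability of rejecting in the wrong tail is ignored.
    return NormalDist().cdf(abs(inb) / se - z_crit)


# Hypothetical design inputs; n per arm is taken as fixed by the efficacy analysis.
N_PER_ARM = 300
for wtp in (50_000, 100_000):
    for rho in (0.0, 0.25, 0.5):
        p = inb_power(N_PER_ARM, delta_e=0.10, delta_c=4_000,
                      sd_e=0.30, sd_c=10_000, rho=rho, wtp=wtp)
        print(f"WTP=${wtp:,}  rho={rho:.2f}  power={p:.2f}")
```

Under these assumptions, power at \(\rho = 0\) is never higher than power at a positive \(\rho\), which is why \(\rho = 0\) serves as a conservative planning value when the true correlation is positive.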
Conclusion
Based on our empirical observations of original trial data, we present six potential considerations for future economic evaluations of clinical trials. Additionally, the online program presented in this paper may facilitate a priori calculations of power and sample size in future cost-effectiveness analyses.
Availability of data and materials
The data that support the findings of this study are available from CCTG but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available upon reasonable request to the corresponding author (kelvin.chan@sunnybrook.ca) and with permission of CCTG as outlined in CTGPOL0043 Data Sharing and Access Policy.
References
Cheung M, Chan KKW. Measuring value and benefit—a matter of perspective. The Lancet Oncology. 2017;18:839–40.
Saluja R, et al. Examining trends in cost and clinical benefit of novel anticancer drugs over time. J Oncol Pract. 2018;14(5):e280–94.
Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Control Clin Trials. 1981;2(2):93–113.
Willan AR. Analysis, sample size, and power for estimating incremental net health benefit from clinical trial data. Control Clin Trials. 2001;22(3):228–37.
Ramsey SD, Willke RJ, Glick H, Reed SD, Augustovski F, Jonsson B, Briggs A, Sullivan SD. Cost-effectiveness analysis alongside clinical trials II: an ISPOR Good Research Practices Task Force report. Value in Health. 2015;18(2):161–72.
Willan AR. Sample size determination for cost-effectiveness trials. Pharmacoeconomics. 2011;11:933–49.
O’Hagan A, Stevens JW. Bayesian assessment of sample size for clinical trials of cost-effectiveness. Med Decis Making. 2001;21(3):219–30.
Cheung MC, et al. Minimization of resource utilization data collected within cost-effectiveness analyses conducted alongside Canadian Cancer Trials Group phase III trials. Clin Trials. 2021;18(4):500–4.
Willan AR. Incremental net benefit in the analysis of economic data from clinical trials, with application to the CADET-Hp trial. Eur J Gastroenterol Hepatol. 2004;16(6):543–9.
Briggs A. Statistical methods for cost-effectiveness research: a guide to current issues and future developments. London: OHE; 2003.
Willan AR, Lin D. Incremental net benefit in randomized clinical trials. Stat Med. 2001;20(11):1563–74.
Winton T, et al. Vinorelbine plus cisplatin vs. observation in resected non–small-cell lung cancer. N Engl J Med. 2005;352(25):2589–97.
Shepherd FA, et al. Erlotinib in previously treated non–small-cell lung cancer. N Engl J Med. 2005;353(2):123–32.
Jonker DJ, et al. Cetuximab for the treatment of colorectal cancer. N Engl J Med. 2007;357(20):2040–8.
Crump M, et al. Randomized comparison of gemcitabine, dexamethasone, and cisplatin versus dexamethasone, cytarabine, and cisplatin chemotherapy before autologous stem-cell transplantation for relapsed and refractory aggressive lymphomas: NCIC-CTG LY.12. J Clin Oncol. 2014;32(31):3490–6.
Ng R, Hasan B, Mittmann N, Florescu M, Shepherd FA, Ding K, Butts CA, Cormier Y, Darling G, Goss GD, Inculet R. Economic analysis of NCIC CTG JBR.10: a randomized trial of adjuvant vinorelbine plus cisplatin compared with observation in early stage non–small-cell lung cancer: a report of the Working Group on Economic Analysis, and the Lung Disease Site Group, National Cancer Institute of Canada Clinical Trials Group. J Clin Oncol. 2007;25(16):2256–61.
Bradbury PA, Tu D, Seymour L, Isogai PK, Zhu L, Ng R, Mittmann N, Tsao MS, Evans WK, Shepherd FA, Leighl NB. Economic analysis: randomized placebo-controlled clinical trial of erlotinib in advanced non–small cell lung cancer. J Natl Cancer Inst. 2010;102(5):298–306.
Cheung MC, Hay AE, Crump M, Imrie KR, Song Y, Hassan S, Risebrough N, Sussman J, Couban S, MacDonald D, Kukreti V. Gemcitabine/dexamethasone/cisplatin vs cytarabine/dexamethasone/cisplatin for relapsed or refractory aggressive-histology lymphoma: cost-utility analysis of NCIC CTG LY.12. J Natl Cancer Inst. 2015;107(7).
Mittmann N, Au HJ, Tu D, O’Callaghan CJ, Isogai PK, Karapetis CS, Zalcberg JR, Evans WK, Moore MJ, Siddiqui J, Findlay B. Prospective cost-effectiveness analysis of cetuximab in metastatic colorectal cancer: evaluation of National Cancer Institute of Canada Clinical Trials Group CO.17 trial. J Natl Cancer Inst. 2009;101(17):1182–92.
Glick HA. Sample size and power for cost-effectiveness analysis (part 1). Pharmacoeconomics. 2011;29:189–98.
Briggs AH, Gray AM. Power and sample size calculations for stochastic cost-effectiveness analysis. Med Decis Making. 1998;18(2 Suppl):S81–92.
Lee KM, McCarron CE, Bryan S, Coyle D, Krahn M, McCabe C. Guidelines for the Economic Evaluation of Health Technologies: Canada—4th Edition. Ottawa: CADTH; 2019.
Zethraeus N, Löthgren M. On the equivalence of the net benefit and the Fieller's methods for statistical inference in cost-effectiveness analysis. Working Paper Series in Economics and Finance, No. 379. Stockholm School of Economics; 2000.
Willan AR, Briggs AH. Statistical analysis of cost-effectiveness data. Chichester: Wiley; 2006.
Laska EM, Meisner M, Siegel C. Power and sample size in cost-effectiveness analysis. Med Decis Making. 1999;19(3):339–43.
Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Control Clin Trials. 1981;2(2):93–113.
Hoch JS, Rockx MA, Krahn AD. Using the net benefit regression framework to construct cost-effectiveness acceptability curves: an example using data from a trial of external loop recorders versus Holter monitoring for ambulatory monitoring of "community acquired" syncope. BMC Health Serv Res. 2006;6(1):1–8.
Hoch JS, Hay A, Isaranuwatchai W, Thavorn K, Leighl NB, Tu D, Trenaman L, Dewa CS, O’Callaghan C, Pater J, Jonker D. Advantages of the net benefit regression framework for trial-based economic evaluations of cancer treatments: an example from the Canadian Cancer Trials Group CO.17 trial. BMC Cancer. 2019;19(1):1–9.
Acknowledgements
K.K.W.C. and M.C.C. contributed to the manuscript equally. We affirm that all individuals who contributed significantly to this work are listed as authors.
Funding
The Canadian Centre for Applied Research in Cancer Control (ARCC) is funded by the Canadian Cancer Society Research Institute, grant 2015–703549. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.
Author information
Contributions
Substantial contributions to the conception: MCC, KKWC. Substantial contributions to the design: BEC, MCC, KKWC. Substantial contributions to the acquisition and analysis: BEC, LE, AEH, MCC, KKWC. Substantial contributions to the interpretation of data: LE, BEC, AEH, MCC, KKWC. Substantial contributions to the drafting and revising: LE, BEC, AEH, MCC, KKWC. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Because the present study used exclusively secondary non-identifiable information, ethical approval and informed consent were waived by the Sunnybrook Research Institute Research Ethics Board. The study was carried out in accordance with the ethical guidelines of the Sunnybrook Research Institute Research Ethics Board.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Everest, L., Chen, B.E., Hay, A.E. et al. Power and sample size calculation for incremental net benefit in cost effectiveness analyses with applications to trials conducted by the Canadian Cancer Trials Group. BMC Med Res Methodol 23, 179 (2023). https://doi.org/10.1186/s12874-023-01956-y