 Research article
 Open access
Calculating confidence intervals for impact numbers
BMC Medical Research Methodology volume 6, Article number: 32 (2006)
Abstract
Background
Standard effect measures such as risk difference and attributable risk are frequently used in epidemiological studies and public health research to describe the effect of exposures. Recently, so-called impact numbers have been proposed, which express the population impact of exposures in the form of specific person or case numbers. To describe estimation uncertainty, it is necessary to calculate confidence intervals for these new effect measures. In this paper, we present methods to calculate confidence intervals for the new impact numbers in the situation of cohort studies.
Methods
Besides the exposure impact number (EIN), which is equivalent to the well-known number needed to treat (NNT), two other impact numbers are considered: the case impact number (CIN) and the exposed cases impact number (ECIN), which describe the number of cases (CIN) and the number of exposed cases (ECIN) with an outcome among whom one case is attributable to the exposure. The CIN and ECIN represent reciprocals of the population attributable risk (PAR) and the attributable fraction among the exposed (AF_{e}), respectively. Thus, confidence intervals for these impact numbers can be calculated by inverting and exchanging the confidence limits of the PAR and AF_{e}.
Examples
We considered a British and a Japanese cohort study that investigated the association between smoking and death from coronary heart disease (CHD) and between smoking and stroke, respectively. We used the reported death and disease rates and calculated impact numbers with corresponding 95% confidence intervals. In the British study, the CIN was 6.46, i.e. on average, of any 6 to 7 persons who died of CHD, one case was attributable to smoking with corresponding 95% confidence interval of [3.84, 20.36]. For the exposed cases, the results of ECIN = 2.64 with 95% confidence interval [1.76, 5.29] were obtained. In the Japanese study, the CIN was 6.67, i.e. on average, of the 6 to 7 persons who had a stroke, one case was attributable to smoking with corresponding 95% confidence interval of [3.80, 27.27]. For the exposed cases, the results of ECIN = 4.89 with 95% confidence interval of [2.86, 16.67] were obtained.
Conclusion
The consideration of impact numbers in epidemiological analyses provides additional information and helps the interpretation of study results, e.g. in public health research. In practical applications, it is necessary to describe estimation uncertainty. We have shown that the calculation of confidence intervals for the new impact numbers is possible by means of known methods for attributable risk measures. Therefore, estimated impact numbers should always be complemented by appropriate confidence intervals.
Background
Epidemiological effect measures, such as risk differences, risk ratios, or attributable risks, are useful tools for presenting the results of epidemiological studies. Since the attributable risk can account for both the strength of the association between exposure to a risk factor and the underlying disease of interest and the prevalence of the risk factor, it is probably the most commonly used epidemiological measure for public health administrators to locate important risk factors [1]. The population attributable risk (PAR) of disease proposed by Levin [2] is a specific attributable risk, which describes the proportion of cases that is preventable in a population if this particular risk factor is completely eliminated [3]. If we consider persons with an exposure to a risk factor and the presence of a disease, the attributable fraction among the exposed (AF_{e}) defines the proportion of exposed cases that are attributable to this risk factor [4].
In addition to these widely used effect measures, Heller et al. [4] proposed new effect measures, so-called impact numbers. In this paper, we consider three of these numbers, namely, the exposure impact number (EIN), the case impact number (CIN), and the exposed cases impact number (ECIN). The EIN is equivalent to the number needed to treat (NNT) used in clinical trials as well as to the number needed to be exposed (NNE) previously proposed for use in epidemiological studies [5, 6]. The NNT is the average number of patients needed to be treated to prevent an adverse outcome in one additional patient compared with a control or standard treatment group [5]. The EIN or NNE defines the average number of persons needed to be exposed to the risk factor for one additional case of disease or death compared with the unexposed persons [6]. The EIN (NNE, NNT) represents the reciprocal of the difference between the risks of exposed and unexposed persons. Thus, the EIN describes the average number of exposed persons among whom one case is attributable to the risk factor [4]. The CIN is the reciprocal of the PAR. Thus, the CIN defines the average number of persons with the outcome among whom one case is attributable to the risk factor [4]. The ECIN is the reciprocal of the AF_{e} and can therefore be described as the average number of exposed cases among whom one case is attributable to the risk factor [4]. In summary, these three impact measures relate the impact of an exposure to all those exposed (EIN), all persons with the outcome (CIN), and all those who are both exposed and have the outcome (ECIN) in a population [4]. In practical applications, it is always necessary to describe the uncertainty of estimated parameters. For the EIN, methods already developed for the NNT can be used [5]. However, no methods to calculate confidence intervals for the new effect measures CIN and ECIN have been proposed so far.
In this paper, we present simple methods to calculate the corresponding confidence intervals based on known methods for interval estimation of standard epidemiological effect measures.
Methods
Probabilities
In the simplest case, data from a cohort study can be presented by means of a 2 × 2 table that relates the two binary variables "exposure" and "outcome" (disease or death). The theoretical table containing the true probabilities is shown in Table 1 (assuming a fixed follow-up time, no persons lost to follow-up, and no censoring).
Let 0 < π_{ij} < 1 denote the cell probability for the four combinations of the two categories for disease and exposure with the maximum likelihood estimator of π_{ij}
\hat{\pi}_{ij} = \frac{n_{ij}}{N}, \qquad (1.1)
where n_{ij} is the random frequency falling into the cell (i, j), π_{i·} = π_{i1} + π_{i0}, π_{·j} = π_{0j} + π_{1j}, N is the total number of subjects (N = N_{1} + N_{0}), and N_{1} and N_{0} are the numbers of exposed (N_{1}) and unexposed (N_{0}) persons in the cohort. Then we define the following probabilities [3]:
P(D) = \pi = \pi_{\cdot 1} = \pi_{01} + \pi_{11}, \qquad (1.2)
P(D \mid E) = \pi_{1} = \frac{\pi_{11}}{\pi_{1\cdot}} = \frac{\pi_{11}}{\pi_{10} + \pi_{11}}, \qquad (1.3)
P(D \mid \bar{E}) = \pi_{0} = \frac{\pi_{01}}{\pi_{0\cdot}} = \frac{\pi_{01}}{\pi_{00} + \pi_{01}}. \qquad (1.4)
The estimators for the different probabilities are given by
\hat{\pi} = \hat{\pi}_{\cdot 1} = \hat{\pi}_{01} + \hat{\pi}_{11}, \qquad (1.5)
\hat{\pi}_{1} = \frac{\hat{\pi}_{11}}{\hat{\pi}_{1\cdot}} = \frac{\hat{\pi}_{11}}{\hat{\pi}_{10} + \hat{\pi}_{11}}, \qquad (1.6)
\hat{\pi}_{0} = \frac{\hat{\pi}_{01}}{\hat{\pi}_{0\cdot}} = \frac{\hat{\pi}_{01}}{\hat{\pi}_{00} + \hat{\pi}_{01}}, \qquad (1.7)
where \hat{\pi} is the estimator for the probability (or risk) of a disease, \hat{\pi}_{1} is the estimator for the probability (or risk) of a disease for an exposed person, and \hat{\pi}_{0} is the estimator for the probability (or risk) of a disease for an unexposed person.
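The estimators (1.5) to (1.7) can be illustrated with a minimal Python sketch. The `CohortTable` container and the `risk_estimates` function are hypothetical names introduced only for this illustration; the index convention (first index exposure, second index disease) follows the definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortTable:
    """Cell counts n_ij of a 2 x 2 cohort table (hypothetical helper).

    First index: exposure (1 = exposed), second index: disease (1 = diseased).
    """
    n11: int  # exposed, diseased
    n10: int  # exposed, not diseased
    n01: int  # unexposed, diseased
    n00: int  # unexposed, not diseased

    @property
    def total(self) -> int:
        """Total number of subjects N."""
        return self.n11 + self.n10 + self.n01 + self.n00


def risk_estimates(t: CohortTable) -> tuple[float, float, float]:
    """Return (pi_hat, pi1_hat, pi0_hat) following (1.5), (1.6), (1.7)."""
    pi = (t.n01 + t.n11) / t.total      # overall risk of disease
    pi1 = t.n11 / (t.n10 + t.n11)       # risk among the exposed
    pi0 = t.n01 / (t.n00 + t.n01)       # risk among the unexposed
    return pi, pi1, pi0
```

Because every cell estimator shares the denominator N, the conditional risks reduce to simple ratios of counts, which is why N does not appear in the last two lines of the function.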
Standard epidemiological effect measures
The risk difference (RD) can be positive or negative and ranges between −1 and 1. Here, we consider the situation that the risk for a disease in the exposed group is higher than in the unexposed group. In this case, we determine the absolute risk increase (ARI = π_{1} − π_{0}). In situations where the exposure has a protective effect, the absolute risk increase can be replaced by the absolute risk reduction (ARR = π_{0} − π_{1}) to have a positive risk difference in all calculations. Nevertheless, a negative risk reduction is equivalent to a positive risk increase.
For point and interval estimation of the population attributable risk (PAR) and the attributable fraction among the exposed (AF_{e}), it is helpful to consider two other commonly used relative effect measures, namely the risk ratio or relative risk (RR) and the relative risk reduction (RRR). The RR is the ratio of the probabilities of developing the disease of interest between the exposed and unexposed persons, i.e.
RR = \frac{\pi_{1}}{\pi_{0}}. \qquad (2.1)
The RRR is given by
RRR = 1 - RR = 1 - \frac{\pi_{1}}{\pi_{0}} = \frac{\pi_{0} - \pi_{1}}{\pi_{0}}. \qquad (2.2)
The PAR is given by
PAR = \frac{\pi - \pi_{0}}{\pi} \qquad (2.3)
and can equivalently be expressed as function of the RR [3, 7]
PAR = \frac{\pi_{1\cdot}(RR - 1)}{\pi_{1\cdot}(RR - 1) + 1}. \qquad (2.4)
The AF_{e} is given by
AF_{e} = \frac{\pi_{1} - \pi_{0}}{\pi_{1}} \qquad (2.5)
and can equivalently be expressed as function of the RR by
AF_{e} = 1 - \frac{1}{RR}. \qquad (2.6)
The domain of both the PAR and the AF_{e} is given by the interval ]−∞, 1[. If the exposure is protective, PAR and AF_{e} are negative. However, in this case both effect measures are not meaningful and alternative effect measures such as the preventable fraction are applied in practice [7]. Here, we consider the case of harmful exposures where the application of the effect measures ARI, PAR and AF_{e} is meaningful. More details are given in the discussion.
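As a quick numerical check that the two expressions for the PAR, (2.3) and (2.4), and the two expressions for the AF_{e}, (2.5) and (2.6), agree, the following Python sketch may help. The function names and the input values (exposure prevalence 0.3, risks 0.02 and 0.01) are made up for illustration and do not come from either study.

```python
def par_from_risks(pi: float, pi0: float) -> float:
    """PAR as in (2.3): (pi - pi0) / pi."""
    return (pi - pi0) / pi


def par_from_rr(prevalence: float, rr: float) -> float:
    """PAR as in (2.4), with prevalence = pi_{1.} (probability of exposure)."""
    x = prevalence * (rr - 1.0)
    return x / (x + 1.0)


def afe_from_risks(pi1: float, pi0: float) -> float:
    """AF_e as in (2.5): (pi1 - pi0) / pi1."""
    return (pi1 - pi0) / pi1


def afe_from_rr(rr: float) -> float:
    """AF_e as in (2.6): 1 - 1/RR."""
    return 1.0 - 1.0 / rr


# Illustrative (invented) inputs.
prevalence, pi1, pi0 = 0.3, 0.02, 0.01
pi = prevalence * pi1 + (1.0 - prevalence) * pi0  # overall risk
rr = pi1 / pi0                                    # risk ratio (2.1)

print(par_from_risks(pi, pi0), par_from_rr(prevalence, rr))
print(afe_from_risks(pi1, pi0), afe_from_rr(rr))
```

Each printed pair should agree up to floating-point rounding, since the overall risk π is the prevalence-weighted average of π_{1} and π_{0}.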
Impact numbers
The impact numbers are defined by [4]
EIN = \frac{1}{ARI} = \frac{1}{\pi_{1} - \pi_{0}}, \qquad (3.1)
CIN = \frac{\pi_{1\cdot}(RR - 1) + 1}{\pi_{1\cdot}(RR - 1)}, \text{ and} \qquad (3.2)
ECIN = \frac{\pi_{1}}{\pi_{1} - \pi_{0}} = \frac{RR}{RR - 1} = \frac{1}{1 - \frac{1}{RR}}. \qquad (3.3)
It can be seen that EIN, CIN, and ECIN are the reciprocals of ARI, PAR, and AF_{e} (named aetiological fraction in [4]), respectively. These three impact numbers relate the impact of an exposure to all those exposed (EIN), all persons with the outcome (CIN), and all those who are both exposed and have the outcome (ECIN) in a population [4].
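The definitions (3.1) to (3.3) translate directly into code. The sketch below is a minimal Python illustration with a hypothetical function name; it assumes a harmful exposure (π_{1} > π_{0}) and rejects other inputs. The usage lines plug in the stroke risks of the Japanese cohort from the Examples section (420/10519 and 153/4818).

```python
def impact_numbers(pi1: float, pi0: float, prevalence: float) -> dict[str, float]:
    """Point estimates of EIN, CIN and ECIN following (3.1)-(3.3).

    pi1, pi0   : risks among exposed and unexposed persons
    prevalence : probability of exposure, pi_{1.}
    """
    if pi1 <= pi0:
        raise ValueError("impact numbers are only defined here for pi1 > pi0")
    rr = pi1 / pi0
    x = prevalence * (rr - 1.0)
    return {
        "EIN": 1.0 / (pi1 - pi0),   # reciprocal of ARI
        "CIN": (x + 1.0) / x,       # reciprocal of PAR
        "ECIN": rr / (rr - 1.0),    # reciprocal of AF_e
    }


# Japanese cohort: 420 strokes in 10519 smokers, 153 in 4818 never-smokers.
result = impact_numbers(420 / 10519, 153 / 4818, 10519 / 15337)
print({k: round(v, 2) for k, v in result.items()})
```

For these inputs the CIN and ECIN come out at about 6.67 and 4.89, which agrees with the values reported for Example 2.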
Calculating confidence intervals
In the following, we demonstrate that point and interval estimation of impact numbers can be performed if point estimators with corresponding confidence limits for RD, PAR, and AF_{e} are available. We consider the situation of prospective cohort studies with cross-sectional sampling and fixed follow-up time to explain the methods. However, the basic principle is applicable also to other designs such as case-control studies so long as methods for point and interval estimation of RD, PAR, and AF_{e} are available.
Risk difference
We use the formulas given by Lui [3] for calculation of the 100(1 − α)% confidence intervals for the ARI based on the standard Wald method [8, 9]. Let Δ = π_{1} − π_{0} be the ARI with the unbiased point estimator \hat{\Delta} = \hat{\pi}_{1} - \hat{\pi}_{0}. Thus, the 100(1 − α)% confidence interval for ARI is given by
\left[ \max\left\{ \hat{\Delta} - z_{1-\frac{\alpha}{2}} \sqrt{\widehat{\mathrm{VAR}}(\hat{\Delta})},\, -1 \right\},\; \min\left\{ \hat{\Delta} + z_{1-\frac{\alpha}{2}} \sqrt{\widehat{\mathrm{VAR}}(\hat{\Delta})},\, 1 \right\} \right] \qquad (4.1)
with the variance estimator
\widehat{\mathrm{VAR}}(\hat{\Delta}) = \frac{\hat{\pi}_{1}(1 - \hat{\pi}_{1})}{N_{1}} + \frac{\hat{\pi}_{0}(1 - \hat{\pi}_{0})}{N_{0}}. \qquad (4.2)
For large sample sizes and risks not close to 0 or 1, the usual Wald method can be used to calculate confidence intervals for risk differences. However, for small sample sizes, other methods such as the Wilson score method [9–11] should be applied [5].
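The Wald interval (4.1) with the variance estimator (4.2) can be sketched in a few lines of Python. The function name and argument names are hypothetical; the normal quantile is taken from the standard library, and the sketch covers only the large-sample Wald case, not the Wilson score method.

```python
from math import sqrt
from statistics import NormalDist


def ari_wald_ci(x1: int, n1: int, x0: int, n0: int, alpha: float = 0.05):
    """Wald confidence interval for the ARI following (4.1) and (4.2).

    x1, n1 : cases and group size among the exposed
    x0, n0 : cases and group size among the unexposed
    Returns (estimate, lower limit, upper limit).
    """
    p1, p0 = x1 / n1, x0 / n0
    delta = p1 - p0
    var = p1 * (1.0 - p1) / n1 + p0 * (1.0 - p0) / n0   # (4.2)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)          # z_{1 - alpha/2}
    half_width = z * sqrt(var)
    # Truncate at the bounds of the parameter space, as in (4.1).
    return delta, max(delta - half_width, -1.0), min(delta + half_width, 1.0)


# Stroke data of the Japanese cohort used in the Examples section.
print(ari_wald_ci(420, 10519, 153, 4818))
```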
Population attributable risk
To calculate the 100(1 − α)% confidence interval for the PAR we use the formulas given by Lui [3], which are based upon the delta method [12].
Let Θ be defined by
\Theta = \frac{\pi_{01}}{\pi_{0\cdot}\,\pi_{\cdot 1}}, \qquad (4.3)
then PAR can be described by
\mathrm{PAR} = \frac{\pi - \pi_{0}}{\pi} = 1 - \Theta. \qquad (4.4)
The maximum likelihood estimator of Θ is
\hat{\Theta} = \frac{\hat{\pi}_{01}}{\hat{\pi}_{0\cdot}\,\hat{\pi}_{\cdot 1}}. \qquad (4.5)
Using the delta method, the asymptotic variance estimator of \hat{\Theta} is
\widehat{\mathrm{VAR}}(\hat{\Theta}) = \hat{\Theta}^{2}\, \widehat{\mathrm{VAR}}(\log(\hat{\Theta})) \qquad (4.6)
with
\widehat{\mathrm{VAR}}(\log(\hat{\Theta})) = \frac{1 - \hat{\pi}_{01}}{N \hat{\pi}_{01}} - \frac{\hat{\pi}_{0\cdot} + \hat{\pi}_{\cdot 1} - 2\hat{\pi}_{01}}{N \hat{\pi}_{0\cdot}\,\hat{\pi}_{\cdot 1}}, \qquad (4.7)
where N is the number of subjects.
Thus, an asymptotic 100(1 − α)% confidence interval for the PAR directly based on \hat{\Theta} is given through the following formula:
\left[ 1 - \hat{\Theta} - z_{1-\frac{\alpha}{2}} \sqrt{\widehat{\mathrm{VAR}}(\hat{\Theta})},\; \min\left\{ 1 - \hat{\Theta} + z_{1-\frac{\alpha}{2}} \sqrt{\widehat{\mathrm{VAR}}(\hat{\Theta})},\, 1 \right\} \right]. \qquad (4.8)
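The steps (4.3) to (4.8) can be sketched as follows. This is a minimal Python illustration with hypothetical names, not the authors' SAS program; it takes the four cell counts of the 2 × 2 table (first index exposure, second index disease) and returns the PAR with its asymptotic confidence limits.

```python
from math import sqrt
from statistics import NormalDist


def par_ci(n11: int, n10: int, n01: int, n00: int, alpha: float = 0.05):
    """Delta-method confidence interval for the PAR following (4.3)-(4.8).

    Returns (estimate, lower limit, upper limit).
    """
    n = n11 + n10 + n01 + n00
    p01 = n01 / n                 # unexposed and diseased
    p0_dot = (n01 + n00) / n      # unexposed
    p_dot1 = (n01 + n11) / n      # diseased
    theta = p01 / (p0_dot * p_dot1)                              # (4.5)
    var_log = ((1.0 - p01) / (n * p01)
               - (p0_dot + p_dot1 - 2.0 * p01) / (n * p0_dot * p_dot1))  # (4.7)
    se = theta * sqrt(var_log)                                   # sqrt of (4.6)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    par = 1.0 - theta                                            # (4.4)
    return par, par - z * se, min(par + z * se, 1.0)             # (4.8)


# Japanese cohort: 420 of 10519 smokers and 153 of 4818 never-smokers had a stroke.
par, lower, upper = par_ci(420, 10519 - 420, 153, 4818 - 153)
print(round(par, 4), round(lower, 4), round(upper, 4))
```

Taking reciprocals of the printed limits gives roughly 3.80 and 27.27, in line with the confidence interval for the CIN reported for Example 2.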
Attributable fraction among the exposed
We use the relationship between the AF_{e} and the RRR to calculate confidence intervals for the AF_{e}. Thus, we convert the formulas for the confidence interval calculation for the RRR given by Lui [3] into the formulas for the AF_{e} by interchanging the risks for exposed and unexposed persons. Using (2.2) and (2.6), we can estimate AF_{e} by
\widehat{\mathrm{AF}}_{e} = 1 - \hat{\Phi} \qquad (4.9)
with
\hat{\Phi} = \frac{1}{\widehat{\mathrm{RR}}} = \frac{\hat{\pi}_{0}}{\hat{\pi}_{1}}. \qquad (4.10)
The asymptotic variance estimator of \hat{\Phi} is given by
\widehat{\mathrm{VAR}}(\hat{\Phi}) = \hat{\Phi}^{2} \left[ \frac{1 - \hat{\pi}_{0}}{N_{0}\hat{\pi}_{0}} + \frac{1 - \hat{\pi}_{1}}{N_{1}\hat{\pi}_{1}} \right]. \qquad (4.11)
Therefore, we can calculate the 100(1 − α)% confidence limits for the AF_{e} by means of
\left[ \widehat{\mathrm{AF}}_{e} - z_{1-\frac{\alpha}{2}} \sqrt{\widehat{\mathrm{VAR}}(\hat{\Phi})},\; \min\left\{ \widehat{\mathrm{AF}}_{e} + z_{1-\frac{\alpha}{2}} \sqrt{\widehat{\mathrm{VAR}}(\hat{\Phi})},\, 1 \right\} \right]. \qquad (4.12)
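A corresponding Python sketch for (4.9) to (4.12) is given below. As before, the function and argument names are hypothetical and the code is an illustration of the formulas rather than the authors' implementation.

```python
from math import sqrt
from statistics import NormalDist


def afe_ci(x1: int, n1: int, x0: int, n0: int, alpha: float = 0.05):
    """Confidence interval for AF_e following (4.9)-(4.12).

    x1, n1 : cases and group size among the exposed
    x0, n0 : cases and group size among the unexposed
    Returns (estimate, lower limit, upper limit).
    """
    p1, p0 = x1 / n1, x0 / n0
    phi = p0 / p1                                              # (4.10)
    var_phi = phi ** 2 * ((1.0 - p0) / (n0 * p0)
                          + (1.0 - p1) / (n1 * p1))            # (4.11)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    afe = 1.0 - phi                                            # (4.9)
    half_width = z * sqrt(var_phi)
    return afe, afe - half_width, min(afe + half_width, 1.0)   # (4.12)


# Stroke data of the Japanese cohort used in the Examples section.
afe, lower, upper = afe_ci(420, 10519, 153, 4818)
print(round(afe, 4), round(lower, 4), round(upper, 4))
```

The reciprocals of these limits are roughly 2.86 and 16.67, matching the interval for the ECIN reported for Example 2.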
Impact numbers
As EIN, CIN, and ECIN are the reciprocals of the effect measures ARI, PAR, and AF_{e}, we are able to calculate confidence intervals by simply inverting and exchanging the upper (UL) and lower (LL) confidence limits of the corresponding epidemiological effect measure. The 100(1 − α)% confidence limits for the EIN, CIN, and ECIN are therefore given by
\left[ \frac{1}{\mathrm{UL}(\mathrm{ARI})},\; \frac{1}{\mathrm{LL}(\mathrm{ARI})} \right], \qquad (4.13)
\left[ \frac{1}{\mathrm{UL}(\mathrm{PAR})},\; \frac{1}{\mathrm{LL}(\mathrm{PAR})} \right], \text{ and} \qquad (4.14)
\left[ \frac{1}{\mathrm{UL}(\mathrm{AF}_{e})},\; \frac{1}{\mathrm{LL}(\mathrm{AF}_{e})} \right]. \qquad (4.15)
All the formulas described above were programmed in SAS 9.1 for use in practical applications. The SAS programs are available from the first author on request.
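The inversion step in (4.13) to (4.15) is the same for all three impact numbers and can be sketched in Python as a single helper. The name `impact_number_ci` is hypothetical, and the input limits in the usage line are invented round numbers. The sketch deliberately refuses a non-positive lower limit, because the simple "invert and exchange" rule only yields one connected interval when both limits are positive.

```python
def impact_number_ci(lower: float, upper: float) -> tuple[float, float]:
    """Invert and exchange the confidence limits of ARI, PAR or AF_e.

    Returns the (lower, upper) limits of EIN, CIN or ECIN, respectively.
    Only valid for 0 < lower <= upper, i.e. a statistically significant
    harmful exposure effect.
    """
    if not 0.0 < lower <= upper:
        raise ValueError("requires 0 < lower <= upper")
    return 1.0 / upper, 1.0 / lower


# Invented limits for illustration: an effect measure with CI [0.05, 0.25].
print(impact_number_ci(0.05, 0.25))
```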
Examples
Example 1: Smoking and coronary heart disease
We consider the data from a British cohort study that investigated the association between smoking and death from CHD [13]. The study included 34440 male doctors who completed a questionnaire about their smoking habits in 1951, and who were subsequently followed up for 20 years (from 11/1951 to 10/1971). This study was also analysed by Heller et al. [4] to illustrate the use and interpretation of impact numbers. They used the published annual death rates for smokers and non-smokers from this study, assuming a prevalence of smoking in the study population of 30%, and calculated the impact numbers. Our calculations are based on risks for smokers and non-smokers. We also use the published annual CHD death rates for smokers (π_{1} = 0.669%) and non-smokers (π_{0} = 0.413%), the sample size from this study (N = 34440), and assume the prevalence of smoking to be 30% to create a hypothetical 2 × 2 table. Table 2 shows the number of respondents, according to whether or not they died from CHD, and whether or not they were smokers.
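One way to build such a hypothetical 2 × 2 table from the sample size, the assumed prevalence, and the two rates is sketched below in Python. Rounding the expected counts to the nearest integer is an assumption of this sketch (the exact construction of Table 2 is not spelled out in the text), and the function name is hypothetical.

```python
def hypothetical_table(n_total: int, prevalence: float,
                       risk_exposed: float, risk_unexposed: float) -> dict[str, int]:
    """Build a 2 x 2 table of counts from group risks and exposure prevalence.

    Assumption: expected counts are rounded to the nearest integer.
    Keys: n11 exposed cases, n10 exposed non-cases,
          n01 unexposed cases, n00 unexposed non-cases.
    """
    n1 = round(n_total * prevalence)       # number of exposed persons
    n0 = n_total - n1                      # number of unexposed persons
    n11 = round(n1 * risk_exposed)
    n01 = round(n0 * risk_unexposed)
    return {"n11": n11, "n10": n1 - n11, "n01": n01, "n00": n0 - n01}


table = hypothetical_table(34440, 0.30, 0.00669, 0.00413)

# Point estimates from the rounded counts.
pi = (table["n11"] + table["n01"]) / 34440
pi1 = table["n11"] / (table["n11"] + table["n10"])
pi0 = table["n01"] / (table["n01"] + table["n00"])
cin = pi / (pi - pi0)        # reciprocal of PAR, see (2.3)
ecin = pi1 / (pi1 - pi0)     # reciprocal of AF_e, see (2.5)
print(table, round(cin, 2), round(ecin, 2))
```

Under this rounding assumption the point estimates agree with the CIN of 6.46 and the ECIN of 2.64 reported for this example.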
The results in Table 3 are obtained by applying the methods described above. The CIN is 6.46, i.e. on average, of the 6 to 7 persons who died of CHD, one case was attributable to smoking. The corresponding 95% confidence interval of [3.84, 20.36] indicates a moderate estimation uncertainty for CIN. The ECIN is 2.64, i.e. on average, of the 2 to 3 smoking persons who died of CHD, one case was attributable to smoking. The corresponding 95% confidence interval of [1.76, 5.29] indicates a small estimation uncertainty for ECIN.
Example 2: Smoking and stroke
In a second example we consider the data from the Japan Public Health Centre (JPHC) study on cancer and cardiovascular diseases [14]. This study assessed sex-specific relationships between smoking and risk of stroke in middle-aged Japanese men and women. Participants were followed up for 11 years (1990 to 2001). The male cohort included 19782 men; we exclude the ex-smokers in order to compare current and never-smokers, and analyse a subgroup of 15337 men. Table 4 presents the respective 2 × 2 table and shows the distribution of absolute numbers of participants analysed.
In this example, the risk for a current smoker and a never-smoker of having a stroke within an 11-year period is π_{1} = 420/10519 = 0.0399 and π_{0} = 153/4818 = 0.0318, respectively. We calculate the 95% confidence intervals for the various effect measures (shown in Table 5). The CIN is 6.67, i.e. on average, of the 6 to 7 persons who had a stroke, one case was attributable to smoking. The corresponding 95% confidence interval of [3.80, 27.27] indicates a moderate estimation uncertainty for CIN. The ECIN is 4.89, i.e. on average, of the 5 smoking persons who had a stroke, one case was attributable to smoking. The corresponding 95% confidence interval of [2.86, 16.67] indicates a moderate estimation uncertainty also for ECIN.
Discussion
Some non-statisticians may have difficulties in interpreting the RD, RR, PAR, or AF_{e} and may prefer measures such as the EIN, CIN, or ECIN. Thus, impact numbers may help to communicate study results. Furthermore, the calculation of confidence intervals for impact numbers can add to the interpretation of study results by providing a measure of estimation uncertainty. This is important because estimated impact numbers may be used by policy makers in decision-making procedures in health care.
We considered the situation of prospective cohort studies and used standard methods that are adequate for large sample sizes. For example, we chose an interval estimator using Wald's test statistic proposed by Walter to show the principle of calculating confidence intervals for PAR [3, 15]. Other methods to calculate confidence intervals for the PAR exist; for instance, Walter [15, 16] proposed formulas for estimating the variance of the PAR for different study designs. These methods are used in a web page presented by Buchan for point and interval estimation of PAR and RR [17]. Lui [18] compared 5 methods to calculate confidence intervals for PAR and presented an overview of the adequacy of these methods in different situations, for instance varying sample sizes, varying exposure effects, or varying exposure probabilities. The use of one of these alternative methods for interval estimation of PAR should be considered depending on the actual study design.
We considered the situation of prospective cohort studies without confounders. The basic principle of inverting and exchanging the confidence limits of the standard effect measures is also applicable to studies investigating confounders or other designs such as case-control studies, so long as adequate methods for adjusted point and interval estimation of RD, PAR, and AF_{e} are available.
We assumed a fixed follow-up time, no persons lost to follow-up, and no censoring. In the case of varying follow-up times, more complicated methods based upon survival time techniques have to be developed.
The following limitation of impact numbers should be considered. It may be difficult for users to understand positive and negative values of effect measures. In the case of the risk difference it is possible to switch between ARI and ARR. However, negative results for PAR and AF_{e} are not useful in practice. Thus, in the case of protective exposures, alternative effect measures such as the preventable fraction are applied in practice [7]. This procedure leads to easily interpretable point estimators in practice but does not solve the problem of difficulties with confidence intervals. In the case of statistically non-significant results, the lower confidence limits for ARI, PAR, and AF_{e} would be negative. As the point of the zero effect of these three parameters is zero, the "point" of the zero effect for the corresponding impact numbers is infinity. Thus, the confidence intervals for statistically non-significant impact numbers consist of two regions, which is hard to understand for users. This issue created a lot of discussion with respect to the presentation of confidence intervals for NNTs. The most satisfactory solution seems to be the proposal of Altman, who introduced the additional terminology "number needed to treat for one person to benefit" (NNTB) and "number needed to treat for one person to be harmed" (NNTH) [19]. By using this terminology, confidence intervals for statistically non-significant NNTs can be presented as, e.g. "NNTB = 10 (NNTB 4 to ∞ to NNTH 20)", which clearly indicates that the estimation uncertainty is so large that both benefit and harm are compatible with the considered data. This approach was also used for NNEs in epidemiological studies [6]. As EIN is equivalent to NNE, in principle, the same approach is applicable to EINs. The only difficulty is to find a terminology describing benefit and harm for EINs in an intuitive way.
Unfortunately, the approach of extending the name of the effect measure to distinguish between benefit and harm is not applicable to PAR and AF_{e}. As the domain for both measures in the case of protective exposures is the interval ]−∞, 0[ and in the case of harmful exposures the interval ]0, 1[, the scales describing benefit and harm are different. We consider Example 2 to illustrate the problem. If the total sample size of the study were N = 1534 rather than N = 15337, the effect of smoking would not be significant at the 5% level in the resulting 2 × 2 table. With the same risks for stroke as in Table 4, 42 cases in 1052 smokers and 15 cases in 482 never-smokers are expected. In this table, for example, the result for PAR would be 0.16 with 95% confidence interval of [−0.20, 0.52]. By using formula (4.14) the result CIN = 6.1 with 95% confidence interval of [1.9, −5.0] would be obtained. It is important to know that the confidence interval for CIN is formed not by the values between −5 and 1.9, but by the values between 1.9 and ∞ together with the values between −∞ and −5. The confidence limits have the following meaning. It is compatible with the observed data that among 2 persons with stroke 1 case is attributable to smoking (harmful exposure) as well as that for each group of 5 persons with stroke 1 additional case will occur if smoking is eliminated from the population (protective exposure). Therefore, the results are interpretable, but the simplicity of the impact number is lost. Mathematically, the impact numbers provide no other information than the corresponding classical epidemiological effect measures. The impact numbers are just the reciprocals of the epidemiological effect measures and describe the exposure effect in terms of whole numbers rather than percentages.
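The two-region structure of a non-significant result can be made explicit with a small Python sketch. The function name is hypothetical, and the code merely restates the reciprocal mapping discussed above: when the confidence limits of PAR, AF_{e}, or ARI have opposite signs, such as −0.20 and 0.52, the compatible values of the impact number are everything above the reciprocal of the upper limit together with everything below the reciprocal of the lower limit.

```python
from math import inf


def impact_number_regions(lower: float, upper: float) -> list[tuple[float, float]]:
    """Map confidence limits of ARI, PAR or AF_e to regions of the impact number.

    Returns one (start, end) region if both limits share a sign, and two
    regions separated by infinity if the interval contains zero.
    """
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    if lower > 0 or upper < 0:
        # Same sign on both limits: the reciprocal map gives one interval.
        return [(1.0 / upper, 1.0 / lower)]
    regions = []
    if lower < 0:
        regions.append((-inf, 1.0 / lower))   # protective side
    if upper > 0:
        regions.append((1.0 / upper, inf))    # harmful side
    return regions


# Small-sample variant of Example 2: PAR limits of about -0.20 and 0.52.
print(impact_number_regions(-0.20, 0.52))
```

For these limits the output consists of the region from −∞ to −5 and the region from about 1.9 to ∞, which is the pair of regions described in the text.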
In the case of statistically non-significant study results, the interpretation of the impact numbers is difficult and therefore the goal of presenting the study results in an intuitive way is not reached. Thus, we recommend using the impact numbers for the presentation of study results in public health research only in the case of studies showing statistically significant exposure effects.
In the situation of statistically non-significant study results, only the absolute and relative frequencies should be presented, complemented by point and interval estimates of a relative effect measure that can be interpreted easily in all situations, e.g. the risk ratio. The impact numbers are only useful in the situation of significant exposure effects where it is helpful to describe the effect in different ways.
Conclusion
The calculation of confidence intervals is an essential and fundamental tool to describe the uncertainty of point estimators. This is also valid for impact numbers which help us to communicate the impact of an exposure in the population considered. We showed that it is easy to calculate intervals for the exposure impact number (EIN), the case impact number (CIN), and the exposed cases impact number (ECIN) by making use of existing interval estimation methods for the risk difference (RD), the population attributable risk (PAR), and the attributable fraction among the exposed (AF_{e}). In epidemiological studies demonstrating statistically significant exposure effects, the consideration of impact numbers provides additional information to aid the interpretation of the results of epidemiological studies. In practice, estimated impact numbers should always be complemented by corresponding confidence intervals.
Abbreviations
AF_{e}: attributable fraction among the exposed
ARI: absolute risk increase
ARR: absolute risk reduction
CIN: case impact number
CHD: coronary heart disease
ECIN: exposed cases impact number
EIN: exposure impact number
NNE: number needed to be exposed
NNT: number needed to treat
NNTB: number needed to treat for one person to benefit
NNTH: number needed to treat for one person to be harmed
PAR: population attributable risk (Levin)
RD: risk difference
RR: relative risk
RRR: relative risk reduction
References
1. Lui KJ: Confidence intervals of the attributable risk under cross-sectional sampling with confounders. Biom J. 2001, 43: 767-779. 10.1002/1521-4036(200110)43:6<767::AID-BIMJ767>3.0.CO;2-K.
2. Levin ML: The occurrence of lung cancer in man. Acta Unio Int Contra Cancrum. 1953, 9: 531-541.
3. Lui KJ: Statistical estimation of epidemiological risk. 2004, Chichester: John Wiley & Sons Ltd
4. Heller RF, Dobson AJ, Attia J, Page J: Impact numbers: measures of risk factor impact on the whole population from case-control and cohort studies. J Epidemiol Community Health. 2002, 56: 606-610. 10.1136/jech.56.8.606.
5. Bender R: Number needed to treat (NNT). Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2005, Chichester: John Wiley & Sons, Ltd, 6: 3752-3761. 2
6. Bender R, Blettner M: Calculating the "number needed to be exposed" with adjustment for confounding in epidemiological studies. J Clin Epidemiol. 2002, 55: 525-530. 10.1016/S0895-4356(01)00510-8.
7. Bénichou J: Attributable risk. Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2005, Chichester: John Wiley & Sons, Ltd, 6: 249-262. 2
8. Wypij D: Binomial Distribution. Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2005, Chichester: John Wiley & Sons, Ltd, 1: 447-450. 2
9. Connor JT, Imrey PB: Proportions, inferences, and comparisons. Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2005, Chichester: John Wiley & Sons, Ltd, 6: 4281-4294. 2
10. Newcombe RG: Interval estimation for the difference between independent proportions: comparison of eleven methods. Stat Med. 1998, 17: 873-890. 10.1002/(SICI)1097-0258(19980430)17:8<873::AID-SIM779>3.0.CO;2-I.
11. Wilson EB: Probable Inference, the law of succession, and statistical inference. J Am Stat Assoc. 1927, 22: 209-212. 10.2307/2276774.
12. Cox C: Delta method. Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2005, Chichester: John Wiley & Sons, Ltd, 6: 1409-1411. 2
13. Doll R, Peto R: Mortality in relation to smoking: 20 years' observations on male British doctors. BMJ. 1976, 2: 1525-1536.
14. Mannami T, Iso H, Baba S, Sasaki S, Okada K, Konishi M, Tsugane S: Cigarette smoking and risk of stroke and its subtypes among middle-aged Japanese men and women – The JPHC study cohort I. Stroke. 2004, 35: 1248-1253. 10.1161/01.STR.0000128794.30660.e8.
15. Walter SD: The estimation and interpretation of attributable risk in health research. Biometrics. 1976, 32: 829-849. 10.2307/2529268.
16. Walter SD: Calculation of attributable risks from epidemiological data. Int J Epidemiol. 1978, 7: 175-182.
17. Relative risk and risk difference confidence intervals. [http://www.phsim.man.ac.uk/risk/]
18. Lui KJ: Notes on interval estimation of the attributable risk in cross-sectional sampling. Stat Med. 2001, 20: 1797-1809. 10.1002/sim.777.
19. Altman DG: Confidence intervals for the number needed to treat. BMJ. 1998, 317: 1309-1312.
Prepublication history
The prepublication history for this paper can be accessed here: http://www.biomedcentral.com/14712288/6/32/prepub
Acknowledgements
This work is supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG), Grant B1 443/51. We thank Natalie McGauran for editorial support.
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
MH, RB, and MB contributed to the design and analyses. MH wrote the initial draft of the manuscript. MH and UG performed all calculations. All authors contributed to the manuscript preparation, read, and approved the final manuscript.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Hildebrandt, M., Bender, R., Gehrmann, U. et al. Calculating confidence intervals for impact numbers. BMC Med Res Methodol 6, 32 (2006). https://doi.org/10.1186/14712288632
DOI: https://doi.org/10.1186/14712288632