Evaluation of exposure-specific risks from two independent samples: A simulation study
 William M Reichmann^{1, 2},
 David Gagnon^{2, 4},
 C Robert Horsburgh^{3} and
 Elena Losina^{1, 2}
DOI: 10.1186/1471-2288-11-1
© Reichmann et al; licensee BioMed Central Ltd. 2011
Received: 14 May 2010
Accepted: 5 January 2011
Published: 5 January 2011
Abstract
Background
Previous studies have proposed a simple product-based estimator for calculating exposure-specific risks (ESR), but the methodology has not been rigorously evaluated. The goal of our study was to evaluate the existing methodology for calculating the ESR, propose an improved point estimator, and propose variance estimates that will allow the calculation of confidence intervals (CIs).
Methods
We conducted a simulation study to test the performance of two estimators and their associated confidence intervals: 1) current (simple product-based estimator) and 2) proposed revision (revised product-based estimator). The first method for ESR estimation was based on multiplying a relative risk (RR) of disease given a certain exposure by an overall risk of disease. The second method, which is proposed in this paper, was based on an estimate of the risk of disease in the unexposed; this updated risk was then multiplied by the RR to obtain the revised product-based estimator. A log-based variance was calculated for both estimators. Also, a binomial-based variance was calculated for the revised product-based estimator. 95% CIs were calculated based on these variance estimates. Accuracy of point estimators was evaluated by comparing observed relative bias (percent deviation from the true value). Interval estimators were evaluated by coverage probabilities and expected length of the 95% CI, given coverage. We evaluated these estimators across a wide range of exposure probabilities, disease probabilities, relative risks, and sample sizes.
Results
We observed more bias and lower coverage probability when using the existing methodology. The revised product-based point estimator exhibited little observed relative bias (max: 4.0%) compared to the simple product-based estimator (max: 93.9%). Because the simple product-based estimator was biased, 95% CIs around this estimate exhibited low coverage probabilities. The 95% CI around the revised product-based estimator from the log-based variance provided better coverage in most situations.
Conclusion
The currently accepted simple product-based method was a reasonable approach only when the exposure probability was small (< 0.05) and the RR was ≤ 3.0. The revised product-based estimator provides much improved accuracy.
Background
Exposure-specific risk (ESR) is defined as the risk of disease (or any outcome) given a specific exposure (or subgroup). ESRs are useful to clinicians because they allow a much more meaningful way of explaining risk to patients. They are also useful to investigators who are looking to use ESRs for their own work, which may include publishing their own work or planning studies. Without access to the primary data or a reported estimate of the ESR in the literature, the ESR can be estimated from two independent samples if the investigator knows the overall risk of disease and the relative risk (RR) of disease given the exposure of interest. There have been a number of published studies where ESRs have been calculated from two independent samples by multiplying the overall risk of disease from one sample by the RR from a second independent sample [1, 2]. Stewart et al. computed the ESR of hip fracture given certain exposures (prior fracture, family history of fracture, low body weight, and smoking) in persons over the age of 70 in the United Kingdom [2]. This study found that the ESR of hip fracture among those with all 4 exposures was 8.9%. This was done by multiplying an overall risk of hip fracture of 1.91% by a RR of 4.66 [2].
Horsburgh computed the ESR of tuberculosis for multiple risk factors, along with 95% confidence intervals (CIs). The upper (lower) bound of the 95% CI for the ESR was calculated by multiplying the upper (lower) bound of the 95% CI for the overall risk by the upper (lower) bound of the 95% CI for the RR [1]. While there has been some work addressing the multiplication of two binomial parameters [3], to the best of our knowledge, there are no methodological articles evaluating the properties of the simple product-based estimator that was used in the articles by Stewart et al. and Horsburgh.
In this article we set out to address three objectives. The first is to evaluate the properties of the simple product-based estimator of the ESR used by Stewart et al. and Horsburgh. The second objective is to propose an estimate of the variance of the ESR, which can subsequently be used for calculating 95% CIs. Lastly, we propose a revised product-based estimator and two variance estimates for the revised point estimator, which are used to calculate 95% CIs.
Methods
Overview
We designed and implemented a simulation study to examine the properties of two different estimators of the ESR and their 95% CIs. The two estimators we sought to evaluate (and their associated CIs) were a simple product-based estimator and a revised product-based estimator. Point estimators were evaluated by calculating the observed relative bias. Their 95% CIs were evaluated using coverage probabilities and expected length given coverage for a wide range of parameters, including exposure probability, probability of disease among the unexposed, the RR of disease given exposure, and the sample size.
To estimate the ESR, this formula needs an estimate of the risk of disease in the unexposed ($\text{P}_{1}(\text{D}|\overline{\text{E}})$) rather than the estimate of the overall risk of disease (P_{1}(D)).
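The link between the overall risk and the risk in the unexposed follows from the law of total probability. The derivation below is a sketch reconstructed from the definitions used in this article (P_{1}(D) from sample 1; RR_{2} and P_{2}(E) from sample 2). It is consistent with the point estimates reported in the case study, but it is our restatement and should not be read as a verbatim copy of the article's numbered equations.

```latex
\begin{align*}
\text{P}(\text{D}) &= \text{P}(\text{D}|\overline{\text{E}})\,[1-\text{P}(\text{E})]
                    + \text{P}(\text{D}|\text{E})\,\text{P}(\text{E}) \\
                   &= \text{P}(\text{D}|\overline{\text{E}})\,[1+\text{P}(\text{E})(\text{RR}-1)],
                    \qquad \text{RR} = \frac{\text{P}(\text{D}|\text{E})}{\text{P}(\text{D}|\overline{\text{E}})} \\[4pt]
\Rightarrow\ \text{P}(\text{D}|\overline{\text{E}}) &= \frac{\text{P}(\text{D})}{1+\text{P}(\text{E})(\text{RR}-1)} \\[4pt]
\text{ESR}_{S} &= \text{P}_{1}(\text{D}) \times \text{RR}_{2} \\[4pt]
\text{ESR}_{R} &= \frac{\text{P}_{1}(\text{D})}{1+\text{P}_{2}(\text{E})(\text{RR}_{2}-1)} \times \text{RR}_{2}
\end{align*}
```

The two estimators coincide when RR_{2} = 1 or when P_{2}(E) approaches 0, which is why the simple product works best for rare exposures and modest relative risks.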
Simple Product-Based ESR
Variance and Confidence Interval for the Simple Product-Based ESR
Revised Product-Based ESR
Variance and Confidence Interval for the Revised Product-Based ESR
Thus a 95% CI for ESR_{R} can be constructed using the normal approximation shown in equation 6 by substituting ESR_{R} for ESR_{S} and Var(ln(ESR_{R})) for Var(ln(ESR_{S})).
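As an illustration of a normal approximation on the log scale, the sketch below builds a 95% CI for a positive estimate from the variance of its logarithm. The helper names and the counts in the example are hypothetical. The variance shown for ln(ESR_{S}) is an assumption: it adds the delta-method variance of ln(P_{1}(D)) to the usual large-sample variance of ln(RR_{2}), with zero covariance because the samples are independent. The article's own variance formulas, in particular the one for ESR_{R}, are not reproduced here.

```python
import math

def log_scale_ci(estimate, var_log, z=1.96):
    """Return (lower, upper) for a positive estimate, given Var(ln(estimate))."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    half_width = z * math.sqrt(var_log)
    return estimate * math.exp(-half_width), estimate * math.exp(half_width)

def var_log_proportion(events, n):
    """Delta-method variance of ln(p_hat), where p_hat = events / n."""
    return 1.0 / events - 1.0 / n

def var_log_rr(a, n_exposed, c, n_unexposed):
    """Large-sample variance of ln(RR) for risks a/n_exposed and c/n_unexposed."""
    return 1.0 / a - 1.0 / n_exposed + 1.0 / c - 1.0 / n_unexposed

# Hypothetical counts (not from the article).
# Sample 1: 126 cases among 1,000 people gives the overall risk.
# Sample 2: 54 cases among 200 exposed, 72 cases among 800 unexposed gives the RR.
p1_d = 126 / 1000
rr2 = (54 / 200) / (72 / 800)
esr_s = p1_d * rr2

# Assumed variance of ln(ESR_S): sum of the two independent log-scale variances.
var_log_esr_s = var_log_proportion(126, 1000) + var_log_rr(54, 200, 72, 800)
lower, upper = log_scale_ci(esr_s, var_log_esr_s)
print(f"ESR_S = {esr_s:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the point estimate and can never extend below zero.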
Simulation study details
All simulations and subsequent evaluations were performed using SAS statistical software, version 9.2 (SAS, Cary, NC). Populations of size 10 million were generated based on different exposure probabilities, probabilities of disease among the unexposed, and RRs of disease given the exposure. One thousand pairs of samples were drawn from the population to determine the sampling distribution of the overall probability of disease and the RR of disease given exposure. After the samples were generated, estimates of the RR and overall probability of disease (along with their 95% CIs) were calculated for each sample.
Table 1. Parameters varied and all their possible values for the simulation study
Parameter  Possible values 

Exposure probability  .05, .20 
Probability of disease among unexposed  .02, .09 
RR  1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0 
Sample size combinations for the overall risk and RR (N_{1}/N_{2})  250/250*, 1,000/1,000, 1,000/5,000, 5,000/1,000, 5,000/5,000 
Evaluation of ESR Estimators
We calculated the estimated ESR using the simple productbased method and revised productbased method for each of the 1,000 pairs of samples. We evaluated the estimators using observed relative bias. Observed relative bias was defined as the difference between the average of the 1,000 estimates from the 1,000 pairs of samples and the assumed population ESR divided by the assumed population ESR. Observed relative bias can be described as the percent change from the true estimate.
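The evaluation described above can be sketched in a few lines of Python. This is a simplified stand-in for the SAS simulation, not a port of it: it draws each pair of samples directly from the assumed probabilities instead of sampling from a generated population of 10 million, it skips pairs in which the RR cannot be computed, and all function and parameter names are hypothetical.

```python
import random

def draw_events(n, p, rng):
    """Number of successes in n independent Bernoulli(p) draws."""
    return sum(rng.random() < p for _ in range(n))

def simulate_relative_bias(p_exposure, p_unexposed, rr, n1, n2, n_pairs=1000, seed=0):
    """Observed relative bias of ESR_S and ESR_R over n_pairs pairs of samples."""
    rng = random.Random(seed)
    true_esr = p_unexposed * rr                        # risk among the exposed
    p_overall = p_unexposed * (1 + p_exposure * (rr - 1))
    esr_s, esr_r = [], []
    for _ in range(n_pairs):
        # Sample 1 (size n1) supplies the overall risk of disease.
        p1_d = draw_events(n1, p_overall, rng) / n1
        # Sample 2 (size n2) supplies the RR and the exposure probability.
        n_exp = draw_events(n2, p_exposure, rng)
        n_unexp = n2 - n_exp
        cases_exp = draw_events(n_exp, true_esr, rng)
        cases_unexp = draw_events(n_unexp, p_unexposed, rng)
        if n_exp == 0 or n_unexp == 0 or cases_unexp == 0:
            continue  # RR undefined for this pair (assumption: skip it)
        rr2 = (cases_exp / n_exp) / (cases_unexp / n_unexp)
        p2_e = n_exp / n2
        esr_s.append(p1_d * rr2)
        esr_r.append(p1_d * rr2 / (1 + p2_e * (rr2 - 1)))

    def relative_bias(estimates):
        return (sum(estimates) / len(estimates) - true_esr) / true_esr

    return relative_bias(esr_s), relative_bias(esr_r)

# Example: exposure probability .20, disease probability among unexposed .09, RR 3.0.
bias_s, bias_r = simulate_relative_bias(0.20, 0.09, 3.0, n1=1000, n2=1000, n_pairs=200)
print(f"relative bias: ESR_S = {bias_s:.1%}, ESR_R = {bias_r:.1%}")
```

Raising `n_pairs` to 1,000 matches the number of pairs used in the study at the cost of a longer run time.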
Evaluation of Confidence Intervals
All 95% CIs were evaluated using coverage probabilities. The coverage probability is defined as the probability that the interval covers the assumed population ESR. For each of the 1,000 pairs of samples we determine whether the assumed population ESR falls between the lower and upper bounds of the CI. The coverage probability is then determined by the number of times the interval covered divided by 1,000. Since we calculated 95% CIs, we expect that our intervals would cover 950 times out of 1,000 (95%).
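A minimal sketch of this calculation, assuming the intervals are held as (lower, upper) pairs; the function name and the three example intervals are hypothetical.

```python
def coverage_probability(intervals, true_value):
    """Fraction of (lower, upper) intervals that contain true_value."""
    covered = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return covered / len(intervals)

# Three made-up intervals evaluated against an assumed population ESR of 0.27.
example_intervals = [(0.10, 0.30), (0.28, 0.40), (0.20, 0.26)]
coverage = coverage_probability(example_intervals, 0.27)
print(f"coverage = {coverage:.3f}")
```

In the study the same ratio is taken over the 1,000 intervals produced for each parameter combination.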
Expected length given coverage was also evaluated for all of our 95% CIs. For every 95% CI that covered the true value of the ESR for a given pair of 1,000 samples, the length was calculated by subtracting the lower bound from the upper bound. We then calculated the average of these lengths to get the expected length given coverage. For example, if the coverage probability was 95.1% then 951 out of 1,000 intervals covered the true value of the ESR. Therefore the expected length given coverage is based on an N of 951. For the purpose of comparison, we also calculated the empirical 95% CI and its length. This was done by examining the distribution of the direct estimator and taking the 2.5^{th} percentile to be the lower bound of the 95% CI and the 97.5^{th} percentile to be the upper bound of the 95% CI. The length of the empirical 95% CI was calculated by subtracting the 2.5^{th} percentile from the 97.5^{th} percentile.
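The two length measures can be sketched as follows. The function names are hypothetical, and the percentile rule used by `statistics.quantiles` is an assumption that may differ slightly from the percentile definition used in the original SAS analysis.

```python
import statistics

def expected_length_given_coverage(intervals, true_value):
    """Mean length of the (lower, upper) intervals that contain true_value."""
    lengths = [upper - lower for lower, upper in intervals
               if lower <= true_value <= upper]
    if not lengths:
        raise ValueError("no interval covers the true value")
    return sum(lengths) / len(lengths)

def empirical_interval(estimates):
    """2.5th and 97.5th percentiles of the simulated point estimates."""
    cuts = statistics.quantiles(estimates, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]

# Made-up example: two of the three intervals cover 0.27.
example_intervals = [(0.10, 0.30), (0.28, 0.40), (0.20, 0.50)]
mean_length = expected_length_given_coverage(example_intervals, 0.27)

# Made-up estimates 1, 2, ..., 1000 to show the percentile bounds.
emp_lower, emp_upper = empirical_interval(list(range(1, 1001)))
print(mean_length, emp_upper - emp_lower)
```

Only covering intervals enter the average, so a method with poor coverage is summarized over fewer than 1,000 intervals.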
Case Study
We tested our methodology using a case study in which we calculated the risk of symptomatic knee osteoarthritis (OA) in obese persons by age groups. The overall risk of symptomatic knee OA by age group was derived from Oliveria et al [4]. This article reports on one of the largest population-based studies that estimates the risk of symptomatic knee OA with a cohort of more than 130,000 members of a community health plan. The relative risk of symptomatic knee OA for obese persons (1.91) and proportion obese (0.371) was derived from Niu et al [5]. This study provides one of the most current estimates of the relative risk of symptomatic knee OA by obesity status and also had a substantial sample size (N = 2,660). Since the study by Niu and colleagues only studied those ages 50-79, we limited our analysis to those ages 50-59, 60-69, and 70-79.
Results
Scenario 1: Low exposure probability (.05)/Low disease probability among unexposed (.02)
Table 2. Observed relative bias for the simple product-based estimator (ESR_{S}) and the revised product-based estimator (ESR_{R})
Low exposure probability (.05)/Low disease probability in unexposed (.02)  

N_{1} = 1,000, N_{2} = 1,000  N_{1} = 5,000, N_{2} = 5,000  
RR/ESR  ESR_{S}  ESR_{R}  ESR_{S}  ESR_{R} 
1.0/.02  9.5%  3.9%  1.4%  2.3% 
2.0/.04  7.8%  2.6%  4.5%  1.4% 
3.0/.06  18.0%  1.8%  12.2%  0.8% 
4.0/.08  21.1%  0.0%  17.6%  1.1% 
5.0/.10  31.4%  3.4%  22.6%  1.0% 
Low exposure probability (.05)/Moderate disease probability in unexposed (.09)  
N_{1} = 1,000, N_{2} = 1,000  N_{1} = 5,000, N_{2} = 5,000  
RR/ESR  ESR_{S}  ESR_{R}  ESR_{S}  ESR_{R} 
1.0/.09  0.1%  0.9%  0.9%  0.6% 
2.0/.18  6.1%  0.1%  5.5%  0.3% 
3.0/.27  9.2%  1.4%  10.4%  0.2% 
4.0/.36  16.6%  0.4%  15.9%  0.5% 
5.0/.45  22.0%  0.7%  21.2%  0.8% 
High exposure probability (.20)/Low disease probability in unexposed (.02)  
N_{1} = 1,000, N_{2} = 1,000  N_{1} = 5,000, N_{2} = 5,000  
RR/ESR  ESR_{S}  ESR_{R}  ESR_{S}  ESR_{R} 
1.0/.02  8.7%  1.1%  0.2%  1.3% 
2.0/.04  26.3%  1.4%  21.4%  0.1% 
3.0/.06  45.0%  1.8%  41.7%  0.1% 
4.0/.08  73.3%  1.2%  61.6%  0.1% 
5.0/.10  93.9%  0.3%  82.5%  0.0% 
High exposure probability (.20)/Moderate disease probability in unexposed (.09)  
N_{1} = 1,000, N_{2} = 1,000  N_{1} = 5,000, N_{2} = 5,000  
RR/ESR  ESR_{S}  ESR_{R}  ESR_{S}  ESR_{R} 
1.0/.09  0.8%  0.4%  1.1%  1.0% 
2.0/.18  22.6%  0.7%  20.4%  0.2% 
3.0/.27  40.9%  0.2%  40.3%  0.0% 
4.0/.36  63.6%  0.7%  60.6%  0.1% 
5.0/.45  82.6%  0.2%  81.0%  0.2% 
Table 3. Coverage probability (%) for the 95% confidence interval of the simple product-based estimator (ESR_{S}) and revised product-based estimator (ESR_{R}) using a log-based variance
Low exposure probability (.05)/Low disease probability in unexposed (.02)  

N_{1} = 1,000, N_{2} = 1,000  N_{1} = 5,000, N_{2} = 5,000  
RR/ESR  ESR_{S}  ESR_{R}  ESR_{S}  ESR_{R} 
1.0/.02  96.8  97.3  97.5  97.5 
2.0/.04  96.4  98.1  95.9  97.2 
3.0/.06  95.7  98.3  92.7  96.6 
4.0/.08  94.1  98.2  90.3  98.0 
5.0/.10  93.9  98.4  87.1  97.7 
Low exposure probability (.05)/Moderate disease probability in unexposed (.09)  
N_{1} = 1,000, N_{2} = 1,000  N_{1} = 5,000, N_{2} = 5,000  
RR/ESR  ESR_{S}  ESR_{R}  ESR_{S}  ESR_{R} 
1.0/.09  96.8  97.2  95.9  96.2 
2.0/.18  95.8  96.8  93.4  96.3 
3.0/.27  94.3  97.0  87.8  96.4 
4.0/.36  89.3  96.1  70.8  97.0 
5.0/.45  83.2  96.6  45.0  96.6 
High exposure probability (.20)/Low disease probability in unexposed (.02)  
N_{1} = 1,000, N_{2} = 1,000  N_{1} = 5,000, N_{2} = 5,000  
RR/ESR  ESR_{S}  ESR_{R}  ESR_{S}  ESR_{R} 
1.0/.02  97.6  98.5  94.5  96.6 
2.0/.04  94.2  98.9  85.6  98.0 
3.0/.06  89.5  99.0  55.1  98.6 
4.0/.08  76.8  99.2  22.3  99.4 
5.0/.10  65.4  99.6  4.2  99.4 
High exposure probability (.20)/Moderate disease probability in unexposed (.09)  
N_{1} = 1,000, N_{2} = 1,000  N_{1} = 5,000, N_{2} = 5,000  
RR/ESR  ESR_{S}  ESR_{R}  ESR_{S}  ESR_{R} 
1.0/.09  94.7  96.8  94.7  96.5 
2.0/.18  85.4  98.5  51.1  98.0 
3.0/.27  55.9  98.5  1.8  98.6 
4.0/.36  17.6  99.4  0  99.4 
5.0/.45  2.9  99.5  0  99.4 
Table 4. Coverage probability (%) of the 95% confidence interval for the revised product-based estimator (ESR_{R}) using a binomial variance
Low exposure probability (.05)/Low disease probability in unexposed (.02)  

RR/ESR  
Sample Size (N_{1}/N_{2})  1.0/.02  1.5/.03  2.0/.04  2.5/.05  3.0/.06  4.0/.08  5.0/.10 
1,000/1,000  66.1  78.1  83.2  87.1  89.6  88.9  90.0 
1,000/5,000  98.9  99.3  99.6  99.7  99.9  99.5  99.5 
5,000/1,000  54.4  60.6  61.2  62.3  63.5  64.9  65.0 
5,000/5,000  90.0  92.8  92.8  93.7  92.4  94.4  94.8 
Low exposure probability (.05)/Moderate disease probability in unexposed (.09)  
RR/ESR  
Sample Size (N_{1}/N_{2})  1.0/.09  1.5/.14  2.0/.18  2.5/.23  3.0/.27  4.0/.36  5.0/.45 
1,000/1,000  90.4  91.6  93.2  92.4  92.2  92.0  91.5 
1,000/5,000  99.9  100  100  100  99.9  99.7  99.3 
5,000/1,000  62.8  63.8  63.9  63.7  62.1  61.6  62.7 
5,000/5,000  93.8  93.4  94.2  93.5  93.7  93.6  91.6 
High exposure probability (.20)/Low disease probability in unexposed (.02)  
RR/ESR  
Sample Size (N_{1}/N_{2})  1.0/.02  1.5/.03  2.0/.04  2.5/.05  3.0/.06  4.0/.08  5.0/.10 
1,000/1,000  89.1  91.2  91.8  92.6  91.5  92.5  93.8 
1,000/5,000  99.7  99.1  99.4  99.0  98.0  98.8  98.1 
5,000/1,000  63.9  68.0  64.7  69.3  70.7  70.5  71.9 
5,000/5,000  92.1  94.0  95.1  94.8  94.3  94.5  93.5 
High exposure probability (.20)/Moderate disease probability in unexposed (.09)  
RR/ESR  
Sample Size (N_{1}/N_{2})  1.0/.09  1.5/.14  2.0/.18  2.5/.23  3.0/.27  4.0/.36  5.0/.45 
1,000/1,000  92.5  93.2  95.1  94.5  92.6  90.7  86.7 
1,000/5,000  99.9  99.7  98.9  98.8  98.2  97.3  95.5 
5,000/1,000  66.0  68.3  69.7  69.0  64.4  66.6  66.0 
5,000/5,000  94.0  94.0  93.2  94.0  91.9  92.0  88.9 
Scenario 2: Low exposure probability (.05)/Moderate disease probability among unexposed (.09)
Increasing the probability of disease among the unexposed from .02 to .09 while keeping the exposure probability set to .05 did not drastically change our results. The observed relative bias of ESR_{S} still increased as the magnitude of the RR increased. When the RR was 5.0, the observed relative bias of ESR_{S} was greater than 20% for all sample size combinations. The observed relative bias of ESR_{R} was close to zero for all combinations of RR and sample size (Table 2).
Coverage probabilities for the 95% CI of ESR_{S} were less than 95% in most cases. The coverage probabilities were adversely affected by the increasing magnitude of the RR, with a minimum coverage probability of 45% attained when the RR was 5.0 and the sample size was 5,000 for both samples (Table 3). Similar to Scenario 1, coverage probabilities for the 95% CI of ESR_{R} using a log-based variance exhibited at least 95% coverage in all cases except when the overall risk was derived from a sample of 1,000 and the RR was derived from a sample of 5,000 (see additional file 1). The 95% CI for ESR_{R} using a binomial variance showed the exact opposite relationship. Regardless of the magnitude of the RR, the coverage probability of the 95% CI for ESR_{R} using a binomial variance was greater than 99% when the overall risk was derived from a sample of 1,000 and the RR was derived from a sample of 5,000 (Table 4).
Scenario 3: High exposure probability (.20)/Low disease probability among unexposed (.02)
Increasing the exposure probability from .05 to .20 while the probability of disease among the unexposed was .02 affected the results substantially for the existing methodology. The observed relative bias of ESR_{S} was over 10% when the RR was 1.5, over 20% when the RR was 2.0, and over 80% when the RR was 5.0. However, the observed relative bias of ESR_{R} was near 0% with the greatest observed relative bias being 1.8% when the RR was 3.0 and both sample sizes were 1,000 (Table 2).
In terms of coverage probability, the 95% CI for ESR_{S} attained 95% coverage only when the RR was small. When the RR was 5.0, the 95% CI for ESR_{S} had a coverage probability as low as 4.2% when the sample size was 5,000 for both samples. Similar to the previous two analyses, coverage probabilities for the 95% CI of ESR_{R} using a log-based variance exhibited at least 95% coverage in all cases except when the overall risk was derived from a sample of 1,000 and the RR was derived from a sample of 5,000 (see additional file 1). The 95% CI for ESR_{R} using a binomial variance showed the exact opposite relationship. Regardless of the magnitude of the RR, the coverage probability of the 95% CI for ESR_{R} using a binomial variance was at least 98% when the overall risk was derived from a sample of 1,000 and the RR was derived from a sample of 5,000 (Table 4).
Scenario 4: High exposure probability (.20)/Moderate disease probability among unexposed (.09)
In Scenario 4, we also evaluated the properties of our estimator when the sample size was 250 for both samples. We observed similar relationships in terms of observed relative bias and coverage probabilities. The observed relative bias of ESR_{S} was 7.8% when the RR was 1.0 and 84.9% when the RR was 5.0, while the observed relative bias of ESR_{R} ranged between -1.1% and 1.1%. The coverage probability of the 95% CI for ESR_{S} was 96.8% when the RR was 1.0 but fell below 95% when the RR was 1.5 (92.3%) and decreased substantially for a RR of 5.0 (56.8%). The coverage probability of the 95% CI for ESR_{R} using a log-based variance was greater than 95% for all RRs. The coverage probability of the 95% CI for ESR_{R} using a binomial variance ranged between 85.7% and 92.5%. In terms of expected length given coverage, the 95% CI for ESR_{R} using a binomial variance provided shorter intervals that were closer to the length of the empirical interval than those from the 95% CI using a log-based variance.
Results of the case study
Table 5. Results from the case study on the risk of symptomatic knee OA in obese persons
Age  Overall risk of symptomatic knee OA in the Oliveria study  Risk of symptomatic knee OA for obese persons using the simple product-based method  95% CI for ESR_{S} using a log-based variance  Risk of symptomatic knee OA for obese persons using the revised product-based method  95% CI for ESR_{R} using a log-based variance  95% CI for ESR_{R} using a binomial variance 

50-59  0.0040  0.0076  0.0052-0.0110  0.0057  0.0038-0.0085  0.0037-0.0077 
60-69  0.0087  0.0167  0.0121-0.0230  0.0125  0.0089-0.0175  0.0095-0.0155 
70-79  0.0147  0.0282  0.0207-0.0383  0.0211  0.0153-0.0289  0.0168-0.0253 
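The point estimates in the case study can be approximately re-derived from the published summary statistics. The sketch below assumes the estimators take the product forms ESR_{S} = P_{1}(D) × RR_{2} and ESR_{R} = P_{1}(D) × RR_{2} / (1 + P_{2}(E)(RR_{2} - 1)); the function names are hypothetical. Because the overall risks are entered as rounded to four decimals, the results agree with the table only to within rounding in the last digit.

```python
def esr_simple(p1_d, rr2):
    """Simple product-based estimator: overall risk times relative risk."""
    return p1_d * rr2

def esr_revised(p1_d, rr2, p2_e):
    """Revised estimator: risk in the unexposed (backed out of the overall risk) times RR."""
    risk_unexposed = p1_d / (1 + p2_e * (rr2 - 1))
    return risk_unexposed * rr2

RR_OBESE = 1.91   # relative risk of symptomatic knee OA for obese persons [5]
P_OBESE = 0.371   # proportion obese [5]
OVERALL_RISK = {"50-59": 0.0040, "60-69": 0.0087, "70-79": 0.0147}  # from [4]

results = {
    age: (esr_simple(p, RR_OBESE), esr_revised(p, RR_OBESE, P_OBESE))
    for age, p in OVERALL_RISK.items()
}
for age, (simple, revised) in results.items():
    print(f"{age}: ESR_S = {simple:.4f}, ESR_R = {revised:.4f}")
```

With an exposure probability of 0.371 and an RR of 1.91, the simple estimator exceeds the revised one by a constant factor of 1 + 0.371 × 0.91, or about 34%, in every age group.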
Discussion
Recall that for the product-based estimator of the ESR to be unbiased, what we really need is an estimate of the risk of disease in the unexposed and not the overall risk. When the exposure probability is low, less weight is put on the probability of disease among the exposed. Put this together with a small RR and most of the overall risk of disease is being influenced by those who are unexposed. However, increasing the exposure probability puts more weight on the risk of disease among the exposed, which makes the overall risk a much more biased estimate of the risk of disease among the unexposed. We also showed that ESR_{R} provides a substantial improvement over ESR_{S} in terms of observed relative bias. We found that the observed relative bias of ESR_{R} was near 0% in almost all cases.
Coverage probabilities for the 95% CI for ESR_{S} were inversely related to the observed relative bias of ESR_{S}. As the observed relative bias increased, the coverage probability decreased. The overestimation of the ESR using existing methodology (ESR_{S}) led to 95% CIs that were less likely to cover the true ESR. Also, the expected lengths given coverage for these 95% CIs were usually longer than the lengths produced for ESR_{R} using either the log-based variance or the binomial variance, which makes this method of point and interval estimation suboptimal.
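The weighting argument above can be made concrete. If sampling variability is ignored (a large-sample approximation, and our assumption here), the simple estimator targets P(D) × RR while the true ESR is P(D|unexposed) × RR, so the relative bias is approximately P(E)(RR - 1). The sketch below compares that approximation with the observed relative bias of ESR_{S} reported in Table 2 for N_{1} = N_{2} = 5,000; the function name is hypothetical.

```python
def approx_relative_bias_simple(p_exposure, rr):
    """Large-sample relative bias of ESR_S: P(D)/P(D|unexposed) - 1 = P(E) * (RR - 1)."""
    return p_exposure * (rr - 1)

# (exposure probability, RR, observed relative bias of ESR_S in Table 2 at 5,000/5,000,
#  disease probability among the unexposed = .02)
table2_rows = [
    (0.05, 3.0, 0.122),
    (0.05, 5.0, 0.226),
    (0.20, 2.0, 0.214),
    (0.20, 5.0, 0.825),
]
for p_e, rr, observed in table2_rows:
    approx = approx_relative_bias_simple(p_e, rr)
    print(f"P(E)={p_e:.2f}, RR={rr:.1f}: approx {approx:.1%}, observed {observed:.1%}")
```

The approximation does not depend on the probability of disease among the unexposed, only on the exposure probability and the RR.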
Coverage probabilities for the 95% CI for ESR_{R} using a log-based variance exhibited greater than 95% coverage in most cases. The exception was when the sample size for the overall risk was 1,000 and the sample size for the RR was 5,000. Paradoxically, this was the only situation in which the 95% CI of ESR_{R} using a binomial variance exhibited greater than 95% coverage. In terms of expected length given coverage, neither of these two methods of interval estimation of ESR_{R} performed better than the other in all situations. The coverage probability and expected length given coverage depended on the variance estimate that was employed. From equation 11, we can see that the log-based variance of ESR_{R} took into account variability from the overall risk and the RR. We also assumed that the two measures were independent and had a covariance of zero, which is a reasonable assumption because the two measures come from two independent samples. From equation 12, we can see that the binomial variance of ESR_{R} took into account the probability of exposure from sample 2 so that the variance would not be underestimated. However, in most cases the variability was still underestimated. When the sample sizes were equal, the underestimation was slight, since the coverage probabilities ranged from 87% to 95% in most cases. However, in Scenario 1, when the sample size combination was 1,000/1,000 and the RR was 1.0, 1.5, and 2.0, the coverage probabilities were 66%, 78%, and 83%, respectively.
The four scenarios, which were defined by the combinations of two different exposure probabilities (.05 and .20) and two different probabilities of disease in the unexposed (.02 and .09), did not affect the observed relative bias of ESR_{R}. However, as we increased these two parameters, the observed relative bias of ESR_{S} increased. This phenomenon was also demonstrated when comparing coverage probabilities based on the log-based variance for ESR_{R} and ESR_{S}. When comparing coverage probabilities based on the binomial variance for ESR_{R}, the scenario does matter: larger values of the probability of exposure and/or the probability of disease in the unexposed increased coverage probabilities. This is not surprising because the estimate of the binomial variance will increase with increasing exposure probabilities and increasing probability of disease among the unexposed.
Results from our case study most closely resemble Scenario 3 where the magnitude of the RR is 2.0. In Scenario 3, we assumed an exposure probability of 0.20 and a probability of disease in the unexposed of .02. In our case study the RR was 1.91, the exposure probability (probability of being obese) was 0.371, and the overall risk of disease (symptomatic knee OA) ranged from 0.0087 to 0.0132. While the simulations suggest that the estimator would be biased, the overall risk of disease is small, so the difference between the two estimates in absolute terms is not large, with the largest overestimation (0.71 percentage points) occurring in those ages 70-79.
It is likely that the estimates produced by Horsburgh and Stewart et al. were accurate. In his article on tuberculosis, Horsburgh estimated the ESR of tuberculosis for those with advanced HIV infection; old, healed tuberculosis; and immunosuppressive therapy [1]. While the RR of obtaining a new case of tuberculosis is high for those with advanced HIV infection and old, healed tuberculosis, the probability of exposure is so low for these exposures that the impact of the large RR would be muted. For those with immunosuppressive therapy, the RR of a new case of tuberculosis is modest (2.0) and the probability of exposure is low, so the overall probability of disease is a good estimate of the probability of disease among those who are not on immunosuppressive therapy [1]. In the Stewart article, the largest RR is 4.62, but this corresponds to an exposure probability of 0.001. When the exposure probabilities are large enough to possibly impact the estimate of the ESR, the RR is low enough (< 2.0) to offset the possible bias [2].
An article by Cupples et al. calculated risk curves for first-degree relatives of patients with Alzheimer's disease. Their method used the odds ratio instead of the relative risk and included converting probabilities to odds [6]. Our method will allow clinicians and other researchers to find the ESR in one step, provided the summary statistics needed for the calculation (P_{1}(D), RR_{2}, and P_{2}(E)) are available.
We acknowledge that there are limitations with this study. The first is that simulation studies cannot be considered a proof. However, we did show mathematically that the proposed estimator of the ESR is unbiased and the results of our simulation confirm this finding. It would be important to show mathematically what the true coverage probabilities are for our 95% CIs across different RRs, exposure probabilities, and probabilities of disease among the unexposed. We also acknowledge that our simulations showed coverage probabilities that well exceed 95% when we are calculating 95% CIs for ESR_{R} using a log-based variance.
We also evaluated the properties of our point and interval estimators when the sample size was small. We observed that one should only consider carrying out these calculations in smaller samples if the prevalence of exposure and disease among the unexposed is sufficiently large. If one of these values is small, then the validity of the estimate of the RR may be questionable. Thus, we recommend that investigators using this methodology only use estimates that are of the highest quality.
The implications of our study are substantial. Clinicians can use these estimates to better explain risk of disease to patients. Many times clinicians and patients can misinterpret the meaning of having a certain RR of disease. Interpreting the probability of disease given a certain exposure (the ESR) is much more transparent. Future studies that examine the calculation of ESRs may look at the impact of having the odds ratio (OR) rather than the RR. Also, an important question to answer is under which study designs and magnitudes of the exposure and disease probabilities an approximation using the OR would be valid. It is likely that the OR would be valid when the prevalence of the outcome is less than 10%, but examining this rigorously would be of great importance [7]. Lastly, resampling and bootstrapping techniques may be a useful method of obtaining CIs with appropriate coverage.
Conclusions
We developed a new estimator for the ESR from two independent samples that exhibits more desirable properties with respect to bias and coverage than the existing methodology. The existing methodology will still perform well when the exposure probability is low. Future methodological studies should focus on the impact of ORs and resampling techniques.
List of abbreviations
ESR: Exposure-specific risk
RR: Relative risk
CI: Confidence interval
D: Disease
$\overline{\text{D}}$: Without disease
E: Exposure
$\overline{\text{E}}$: Without exposure
P(): Probability of
Var: Variance
Cov: Covariance
exp(): Exponential function
OA: Osteoarthritis
Declarations
Acknowledgements
Grant support: This research was supported in part by the National Institutes of Health, National Institute of Arthritis and Musculoskeletal and Skin Diseases grants T32 AR055885 and K24 AR057827.
References
1. Horsburgh CR: Priorities for the treatment of latent tuberculosis infection in the United States. N Engl J Med. 2004, 350 (20): 2060-2067. 10.1056/NEJMsa031667.
2. Stewart A, Calder LD, Torgerson DJ, Seymour DG, Ritchie LD, Iglesias CP, Reid DM: Prevalence of hip fracture risk factors in women aged 70 years and over. QJM. 2000, 93 (10): 677-680. 10.1093/qjmed/93.10.677.
3. Buehler RJ: Confidence Intervals for the Product of Two Binomial Parameters. J Am Stat Assoc. 1957, 52: 482-93. 10.2307/2281697.
4. Oliveria SA, Felson DT, Reed JI, Cirillo PA, Walker AM: Incidence of symptomatic hand, hip, and knee osteoarthritis among patients in a health maintenance organization. Arthritis Rheum. 1995, 38 (8): 1134-1141. 10.1002/art.1780380817.
5. Niu J, Zhang YQ, Torner J, Nevitt M, Lewis CE, Aliabadi P, Sack B, Clancy M, Sharma L, Felson DT: Is obesity a risk factor for progressive radiographic knee osteoarthritis?. Arthritis Rheum. 2009, 61 (3): 329-335. 10.1002/art.24337.
6. Cupples LA, Farrer LA, Sadovnick AD, Relkin N, Whitehouse P, Green RC: Estimating risk curves for first-degree relatives of patients with Alzheimer's disease: the REVEAL study. Genet Med. 2004, 6 (4): 192-196. 10.1097/01.GIM.0000132679.92238.58.
7. Zhang J, Yu KF: What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA. 1998, 280 (19): 1690-1691. 10.1001/jama.280.19.1690.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/1/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.