Estimating uncertainty of alcohol-attributable fractions for infectious and chronic diseases

Background: Alcohol is a major risk factor for burden of disease and injuries globally. This paper presents a systematic method to compute the 95% confidence intervals of alcohol-attributable fractions (AAFs) with exposure and risk relations stemming from different sources.
Methods: The computation was based on previous work done on modelling drinking prevalence using the gamma distribution and the inherent properties of this distribution. The Monte Carlo approach was applied to derive the variance for each AAF by generating random sets of all the parameters. A large number of random samples were thus created for each AAF to estimate variances. The derivation of the distributions of the different parameters is presented, as well as sensitivity analyses which estimate the number of samples required to determine the variance with predetermined precision and identify which parameter had the most impact on the variance of the AAFs.
Results: The analysis of the five Asian regions showed that 150 000 samples gave a sufficiently accurate estimation of the 95% confidence intervals for each disease. The relative risk functions accounted for most of the variance in the majority of cases.
Conclusions: Within reasonable computation time, the method yielded very accurate values for variances of AAFs.


Background
Alcohol consumption is a major risk factor for burden of disease and injuries globally [1,2] as demonstrated by the Comparative Risk Analyses (CRA) within the Global Burden of Disease and Injury (GBD) Studies [2,3]. To estimate the impact of alcohol consumption on infectious and chronic diseases, alcohol-attributable fractions (AAFs) are calculated [4] and applied to the number of deaths or number of incident cases [5].
Up until now, confidence intervals (CIs) have not been presented in the CRA for the estimates of alcohol-attributable health harms. While there are methods to calculate uncertainty around AAFs when both exposure and risk relations are derived from the same cohort [6,7], no such methods exist for the case where exposure and risk relations stem from two different meta-analyses (for general concerns and considerations see [5,8,9] and the Discussion section below). This article aims to fill this gap, and for the first time presents a method to calculate CIs for the new AAF modelling methodology used in the 2005 CRA study for chronic diseases by region, sex and age (see [4] for a description of the AAF modelling methodology and [10] for a comparison of the new AAF modelling methodology with previous methods). Alcohol is related to many disease categories [5]. Since globally morbidity and mortality can only be reliably estimated for broad disease or injury categories, the GBD is restricted to 126 distinct broad disease or injury categories (http://www.globalburden.org/GBD_Study_Operations_Manual_Jan_20_2009.pdf), of which 31 are causally related to alcohol [5]. We first use exposure measures and relative risks for disease categories from the 2005 CRA study for which a meta-analysis providing a continuous relative risk function exists to estimate AAFs [5], and then explain the methodology to construct CIs for these AAFs. This paper focuses on the Asian regions as an illustration of our results. Asia presents an interesting mix of low income and high income regions and allows us to illustrate our methodology succinctly.

Methods
This method has two main steps: (1) calculation of the AAFs, and (2) calculation of the variance for the AAFs. Information from multiple sources, all of which carries a certain degree of uncertainty, is required in order to calculate the AAFs. This information is outlined below.

Definition of regions
Regions were defined in accordance with the 2005 GBD study [11]. Countries were grouped into regions which were defined by their geographical location and epidemiological profile which includes child and adult mortality levels and major causes of death. Neither income nor population of the countries in a region had an impact on the grouping. For the purpose of illustrating the method, we restricted the analysis to the five Asian regions containing the countries listed below:

Measures of alcohol consumption
Adult per capita consumption is calculated by adding together the estimated recorded and unrecorded alcohol consumption [13,14]. The variance of an estimate of recorded consumption was based on estimates from different sources (for example, government data, industry data, Food and Agriculture Organization), which are usually quite similar. The main sources for determining unrecorded consumption are home production, alcohol intended for industrial, technical, and medical uses, and illegal production or importation of alcohol. The variance of an estimate of unrecorded consumption is larger in comparison to that of recorded consumption and there are usually only sparse sources for information on unrecorded consumption which is often based on limited empirical evidence [14,15]. Since uncertainty of unrecorded adult per capita consumption is not provided in the 2005 CRA study, we assumed the standard deviation of unrecorded adult per capita consumption was proportionally five times larger than the standard deviation for recorded adult per capita consumption. The prevalence of lifetime abstainers and former drinkers was estimated from a population-weighted average of surveys in the respective regions by sex and age.
Using the proportion of current drinkers we calculated the per capita consumption of alcohol per current drinker, which was used in modelling alcohol consumption. The variance of prevalence can be estimated using a binomial distribution, as illustrated below in the Statistical procedures section.
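The assumption about the uncertainty of unrecorded consumption can be illustrated with a short sketch. It reads "proportionally five times larger" as a coefficient of variation five times that of recorded consumption, and it treats the recorded and unrecorded components as independent so that their variances add. Both readings are assumptions made for this illustration, and the function name and all numbers are hypothetical.

```python
import math

def total_consumption_sd(rec_mean, rec_sd, unrec_mean, ratio=5.0):
    """SD of total adult per capita consumption (recorded + unrecorded).

    Assumptions (not stated in the text): the unrecorded coefficient of
    variation is `ratio` times the recorded one, and the two components
    are independent, so their variances add.
    """
    rec_cv = rec_sd / rec_mean
    unrec_sd = ratio * rec_cv * unrec_mean
    return math.sqrt(rec_sd ** 2 + unrec_sd ** 2)

# Hypothetical values, in litres of pure alcohol per adult per year.
total_mean = 6.0 + 2.0
total_sd = total_consumption_sd(rec_mean=6.0, rec_sd=0.3, unrec_mean=2.0)
print(total_mean, round(total_sd, 3))
```

In this example the unrecorded component carries a quarter of the volume but most of the variance, which is the qualitative behaviour the assumption is meant to capture.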

Modelling alcohol consumption
Using comparable studies, involving 1001 distributions from 66 countries by sex and age, it can be shown that the distribution of alcohol consumption for the drinking population is modelled best using the gamma distribution [4]. It is well known that population surveys underestimate true consumption, and thus data from surveys have to be triangulated with estimates of adult per capita consumption, which are often based on sales data [4,13]. To be conservative, we assumed that 80% of this registry-based estimate reflected the true adult per capita consumption; this level was chosen to account for the alcohol wasted and not consumed (for example, broken bottles and quantities left over in glasses) and for the underestimation of true consumption in medical epidemiological studies, which were used in the meta-analyses that estimated the relative risk functions. A regression of the above-mentioned studies showed a strong relationship between mean and standard deviation (for men and women, the explained variance of standard deviation was greater than 90%). This relationship allows us to compute the standard deviation of an upshifted distribution very easily. Finally, this method relies on the assumption that the proportion of alcohol consumed by the various sex and age groups derived from surveys is accurate [4].
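The following minimal sketch shows how the 80% adjustment and the mean to standard deviation relationship could translate into gamma parameters. It assumes the relationship is a line through the origin (standard deviation = beta times mean), ignores the allocation of consumption across sex and age groups, and uses hypothetical function names and values, including the value of beta.

```python
def drinker_mean(per_capita_g_day, prop_current_drinkers, coverage=0.8):
    """Mean consumption per current drinker (g/day) after the 80% adjustment."""
    return coverage * per_capita_g_day / prop_current_drinkers

def gamma_from_mean(mean, beta):
    """Shape and scale of a gamma distribution with SD = beta * mean."""
    sd = beta * mean
    shape = mean ** 2 / sd ** 2   # equals 1 / beta**2, independent of the mean
    scale = sd ** 2 / mean        # equals beta**2 * mean
    return shape, scale

# Hypothetical inputs: 15 g/day per adult, 60% current drinkers, beta = 1.2.
mean = drinker_mean(per_capita_g_day=15.0, prop_current_drinkers=0.6)
shape, scale = gamma_from_mean(mean, beta=1.2)
print(mean, shape, scale)
```

Because the shape depends only on beta, shifting the mean upwards changes only the scale of the distribution, which is why the standard deviation of an upshifted distribution is easy to obtain.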

Measures of relative risk
The relative risk functions for each chronic disease were derived using a series of meta-analyses which used fractional polynomial regression [5] separated by sex and, where possible, by morbidity and mortality (e.g. liver cirrhosis [16] or stroke [17]). The coefficients of each polynomial representing the relative risk function are called beta-coefficients. The uncertainty of the relative risk beta-coefficients is expressed by a covariance matrix (obtained from the meta-analyses). The associations between alcohol and these diseases take one of three forms: 1) exponential, 2) linear, or 3) J-shaped. Figure 1 provides a plot of the relative risk for liver cirrhosis for men as an example of a disease having an exponential relationship with alcohol consumption, Figure 2 is a plot of the relative risk for hemorrhagic stroke (mortality) for men, which has a linear association with alcohol consumption, and Figure 3 outlines the relative risk for coronary heart disease for men as an example of a disease having a J-shaped relationship with alcohol consumption.
Step 1: Calculation of the AAF
This step calculates the AAF estimates from the distribution of daily consumption and is outlined below.

Alcohol-attributable fractions (AAFs)
The AAF for a given infectious or chronic disease can be expressed as follows [4]:
\[ \mathrm{AAF} = \frac{P_{\mathrm{abs}} + P_{\mathrm{form}} RR_{\mathrm{form}} + \int_{0}^{\infty} P(x)\,RR(x)\,dx - 1}{P_{\mathrm{abs}} + P_{\mathrm{form}} RR_{\mathrm{form}} + \int_{0}^{\infty} P(x)\,RR(x)\,dx} \]
where P_abs is the proportion of lifetime abstainers, P_form is the proportion of former drinkers among the population, and RR_form is the relative risk for former drinkers. P(x) represents the prevalence of drinking at level x (in grams per day, modelled by a gamma function), and RR(x) is the relative risk at this level compared to lifetime abstainers. In the CRAs, AAFs are usually calculated separately by sex, age, and sometimes by ethnic groups. In our study of Asian regions, AAFs were computed by region (see below), sex and age.
We did not use this mathematical expression in its original form when estimating the AAFs for several reasons. Firstly, a person whose daily consumption exceeds 150 grams per day is highly unlikely to consume this amount over a long period of time. Therefore, to be conservative, the average daily consumption was truncated at 150 grams per day. Secondly, when there is truncation at 150 grams per day, the gamma distribution needs to be normalized by adding a coefficient in front of the probability density function to ensure that the area under this function will integrate to 1 between 0 and 150 grams of alcohol per day.
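A minimal sketch of the truncated, renormalized AAF calculation is given here for illustration. It is not the authors' R code: it uses a simple midpoint rule in place of R's integration routine, takes P(x) to be the proportion of current drinkers multiplied by the gamma density renormalized on 0 to 150 grams per day, and uses a hypothetical exponential relative risk function and hypothetical parameter values.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density, evaluated on the log scale for numerical stability."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def integrate(f, lower, upper, n=5000):
    """Composite midpoint rule; midpoints never evaluate f at x = 0."""
    h = (upper - lower) / n
    return h * sum(f(lower + (i + 0.5) * h) for i in range(n))

def aaf(p_abs, p_form, rr_form, shape, scale, rr, upper=150.0):
    """AAF with the consumption distribution truncated at `upper` g/day."""
    p_current = 1.0 - p_abs - p_form
    # Normalizing constant so the truncated density integrates to 1.
    norm = integrate(lambda x: gamma_pdf(x, shape, scale), 0.0, upper)
    mean_rr = integrate(lambda x: gamma_pdf(x, shape, scale) * rr(x),
                        0.0, upper) / norm
    total = p_abs + p_form * rr_form + p_current * mean_rr
    return (total - 1.0) / total

# Hypothetical inputs: exponential risk curve RR(x) = exp(0.01 * x).
example = aaf(p_abs=0.40, p_form=0.10, rr_form=1.2,
              shape=0.7, scale=28.0, rr=lambda x: math.exp(0.01 * x))
# With no excess risk anywhere, the AAF should be zero.
null_case = aaf(0.40, 0.10, 1.0, 0.7, 28.0, rr=lambda x: 1.0)
print(example, null_case)
```

Dividing by the integral of the density over 0 to 150 grams per day plays the role of the normalizing coefficient described above.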
Step 2: Calculation of the variance of the AAF
This step calculates the variance of the AAF estimates, taking into account the uncertainty of the exposure and risk data, and is outlined below.
In order to derive 95% CIs for AAFs, two paths can be taken. The first one consists of deriving the expression for the variance of the AAF by taking into account all the errors of the parameters on which it depends, and subsequently computing the CI for the AAF. This approach, although mathematically accurate, is too complex in our case. Indeed, the AAF depends on the relative risk function, the prevalence of former drinkers and abstainers, and the distribution of consumption among drinkers. Since errors in these values and functions are non-trivial, it is virtually impossible to compute the variance of AAFs algebraically.
The second approach is simpler, but less accurate, and requires more computation. A number (we will call it N for simplicity) of random sets of the lowest level parameters (the parameters from which all other values are derived) are generated, namely the coefficients of the relative risk functions, the adult per capita consumption and the prevalence of former drinkers and lifetime abstainers. Each random set of lowest level parameters will then yield an AAF value for a total of N AAFs for each region, sex, age group and disease. The variance of the N AAFs will approach the true variance as N increases. This corresponds to calculating the variance of an AAF using a Monte Carlo-type method [18].
In order to generate these random samples for each lower level parameter, the distribution, mean and variance of each parameter must be known. The following paragraphs elucidate the methods used to determine the properties of each parameter.

Statistical procedures
The simulations were implemented in R (version 2.10.1; refer to "Additional file 1: Example of R code for simulations" for an example of the code), and the numerical errors inherent in any computational program were neglected (for example, the error (uncertainty) which is added by using numerical integration in calculating the AAFs was not taken into consideration for our variance calculations). The random normal generation of adult per capita consumption for the drinking population sometimes yields values that are negative or zero, which are factually impossible. In these instances, the value was set to 0.001 to symbolize very low consumption. Mathematically, a zero mean consumption would transform the gamma distribution into a Dirac distribution located at 0. In addition, drinkers would have a consumption of 0 grams per day, which is not compatible with the definition of current drinkers.
The generation of adult per capita consumption assumes a random normal distribution as we have no information about an alternative distribution. Very low per capita consumption occasionally obtained by the random normal generation caused some additional trouble during the computation. The method used in R to numerically integrate a function results in errors and incorrect results if the corresponding function is either constant or approximately constant. When a gamma distribution has mean values that approach 0, it is spread very little (according to the linear relation between mean and standard deviation). This makes the distribution approximately constant after the initial spike close to the origin. Such functions cannot be integrated numerically, and R produces an error message. As this problem occurs only when consumption levels are estimated to be very low, the assumption was made that under such circumstances the AAF calculated with this set of parameters would be 0. This assumption implies that former drinkers are not at an elevated risk for the given disease.
In general, the scale (θ) and shape (κ) parameters of the gamma distribution are correlated, and their covariance has to be taken into account when generating random samples of θ and κ. This difficulty is avoided because κ is a constant in our case (it should be noted, however, that the constant is different for men and women). According to previous work by Rehm and colleagues:
\[ \kappa = \frac{\mu^{2}}{\sigma^{2}} = \frac{\mu^{2}}{(\beta\mu)^{2}} = \frac{1}{\beta^{2}} \]
where β is the coefficient linking the standard deviation to the mean (σ = βμ). Therefore, κ is independent of region, age group and θ. In order to generate a random sample of κ, the variance is found using the delta method:
\[ \mathrm{Var}(\kappa) \approx \left(\frac{\partial \kappa}{\partial \beta}\right)^{2} \mathrm{Var}(\beta) = \frac{4}{\beta^{6}}\,\mathrm{Var}(\beta) \]
The generation of θ parameters is more difficult. Estimates of θ are different for each region, sex and age group since θ depends on the mean and variance of each gamma distribution. The generation of θ was performed in two steps: first, we generated a random sample of adult per capita consumption values from a normal distribution using the mean and standard deviation of this distribution, and second, we generated a random sample of the prevalence of lifetime abstainers and former drinkers.
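The delta-method variance of the shape parameter can be checked numerically. The sketch below assumes the standard deviation is proportional to the mean with coefficient beta, so that the shape equals 1 / beta^2 and its delta-method variance is 4 Var(beta) / beta^6, and compares that value with the variance of shape values obtained by drawing beta directly. The estimate and standard error of beta are hypothetical, as is the choice of a normal distribution for the final draw.

```python
import random
import statistics

def kappa_from_beta(beta):
    """Gamma shape implied by SD = beta * mean."""
    return 1.0 / beta ** 2

def kappa_variance_delta(beta, var_beta):
    """First-order (delta method) variance of kappa = beta**-2."""
    return 4.0 * var_beta / beta ** 6

rng = random.Random(1)
beta_hat, beta_sd = 1.2, 0.02   # hypothetical estimate and standard error

analytic = kappa_variance_delta(beta_hat, beta_sd ** 2)
kappa_draws = [kappa_from_beta(rng.gauss(beta_hat, beta_sd))
               for _ in range(50000)]
simulated = statistics.variance(kappa_draws)
print(analytic, simulated)

# One random shape value, assuming (for illustration) a normal distribution.
kappa_sample = rng.gauss(kappa_from_beta(beta_hat), analytic ** 0.5)
```

When the standard error of beta is small relative to beta, the two variances should agree closely, which is the condition under which the first-order approximation is reasonable.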
As the proportions of lifetime abstainers and former drinkers in each case follow binomial distributions, their variances, considering only sampling variation, can be expressed as follows:
\[ \mathrm{Var}(P) = \frac{P(1-P)}{n} \]
where P is the proportion of lifetime abstainers or of former drinkers and n is the effective sample size. The effective sample size of each survey used to estimate the proportion of lifetime abstainers and former drinkers was assumed to be 1000 (to reflect an average sample size for surveys of 6000 per population, assuming that the three age-sex categories have equal cell size). Using these values, it is possible to calculate the corresponding proportion of drinkers, the mean consumption per sex-age category (μ) and, finally, θ, which is then simply given by θ = μ/κ.
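The two-step generation of the scale parameter can be sketched as follows. This is an illustration under stated assumptions: the proportions are drawn from a normal approximation to the binomial, the allocation of consumption across sex and age groups is ignored, and all names and numeric values are hypothetical. The replacement of non-positive consumption draws by 0.001 follows the rule described in the text.

```python
import math
import random
import statistics

def proportion_sd(p, n_eff=1000):
    """Binomial standard deviation of a proportion for sample size n_eff."""
    return math.sqrt(p * (1.0 - p) / n_eff)

def draw_theta(rng, pc_mean, pc_sd, p_abs, p_form, kappa, n_eff=1000):
    """One random gamma scale parameter, theta = mu / kappa."""
    # Step 1: adult per capita consumption (g/day) from a normal distribution.
    per_capita = rng.gauss(pc_mean, pc_sd)
    if per_capita <= 0:
        per_capita = 0.001          # very low consumption, as in the text
    # Step 2: prevalence of lifetime abstainers and former drinkers
    # (normal approximation to the binomial, an assumption of this sketch).
    abs_draw = rng.gauss(p_abs, proportion_sd(p_abs, n_eff))
    form_draw = rng.gauss(p_form, proportion_sd(p_form, n_eff))
    p_current = 1.0 - abs_draw - form_draw
    mu = per_capita / p_current     # mean consumption per current drinker
    return mu / kappa

rng = random.Random(2)
kappa = 1.0 / 1.2 ** 2              # hypothetical shape parameter
thetas = [draw_theta(rng, 8.0, 0.8, 0.40, 0.10, kappa) for _ in range(1000)]
print(statistics.mean(thetas), statistics.stdev(thetas))
```

With these inputs the mean per drinker is about 16 g/day, so the draws of theta are centred near 16 / kappa.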
To account for the error of the final relative risk functions, N instances of each beta-coefficient were generated based on the covariance matrix. Each of these N relative risk functions obtained with one instance of each beta-coefficient was then assigned to one set of parameters defining the population (mean adult per capita consumption, proportion of abstainers and proportion of former drinkers). The relative risk functions were assumed to be the same for all regions and age groups.
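Generating correlated beta-coefficients from a covariance matrix can be sketched with a Cholesky factorization. The text does not state which distribution was used, so the multivariate normal distribution here is an assumption, and the two coefficients and their covariance matrix are hypothetical values chosen only to make the example run.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular factor L with L * L^T equal to a covariance matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def draw_betas(rng, mean, lower):
    """One multivariate normal draw: mean + L * z with z standard normal."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

# Hypothetical coefficients and covariance matrix for one disease.
mean = [0.05, -0.001]
cov = [[4e-4, -1e-5],
       [-1e-5, 1e-6]]

rng = random.Random(3)
lower = cholesky(cov)
samples = [draw_betas(rng, mean, lower) for _ in range(20000)]
print(samples[0])
```

Each element of `samples` defines one relative risk function, which can then be paired with one randomly generated set of population parameters.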
As previously mentioned, each random set of lowest level parameters described above was then used to calculate an AAF value, for a total of N AAFs for each region, sex, age group and disease. The variance of the N AAFs was used as the estimate of the true variance of the AAF.
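The overall Monte Carlo procedure can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: it uses a single-coefficient exponential relative risk function, a fixed grid instead of R's integration routine, normal draws for every parameter, and a normal-theory interval (point estimate plus or minus 1.96 standard deviations). All numeric inputs are hypothetical.

```python
import math
import random
import statistics

# Midpoints of 200 equal intervals on 0 to 150 g/day.
GRID = [150.0 * (i + 0.5) / 200 for i in range(200)]

def aaf(p_abs, p_form, rr_form, kappa, theta, b):
    """AAF with a truncated gamma exposure and RR(x) = exp(b * x)."""
    dens = [x ** (kappa - 1.0) * math.exp(-x / theta) for x in GRID]
    norm = sum(dens)                 # renormalizes the truncated density
    if norm == 0.0:
        return 0.0                   # extremely low consumption: AAF set to 0
    mean_rr = sum(d * math.exp(b * x) for d, x in zip(dens, GRID)) / norm
    total = p_abs + p_form * rr_form + (1.0 - p_abs - p_form) * mean_rr
    return (total - 1.0) / total

def simulate(n, seed=0):
    """Generate n random parameter sets and return the n resulting AAFs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        kappa = 1.0 / rng.gauss(1.2, 0.02) ** 2
        per_capita = rng.gauss(10.0, 1.0)          # g/day per adult
        if per_capita <= 0:
            per_capita = 0.001
        p_abs = rng.gauss(0.40, math.sqrt(0.40 * 0.60 / 1000))
        p_form = rng.gauss(0.10, math.sqrt(0.10 * 0.90 / 1000))
        theta = per_capita / (1.0 - p_abs - p_form) / kappa
        b = rng.gauss(0.010, 0.002)                # relative risk coefficient
        results.append(aaf(p_abs, p_form, 1.2, kappa, theta, b))
    return results

draws = simulate(2000)
variance = statistics.variance(draws)
point = aaf(0.40, 0.10, 1.2, 1.0 / 1.2 ** 2, 20.0 * 1.2 ** 2, 0.010)
half_width = 1.96 * math.sqrt(variance)
ci_low, ci_high = point - half_width, point + half_width
print(point, variance, (ci_low, ci_high))
```

The sketch uses 2000 draws to keep the run time short; the sensitivity analysis described in the text addresses how many draws are needed for the variance to stabilize.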
Main analysis, sensitivity analyses, and evaluation of the impact of each variable on the variance
As an example of this method, we calculated the AAFs for males aged 15 to 34 in the Asian regions; however, the above-described methods can also be used to calculate the AAFs for females. In addition, to demonstrate that partial AAFs and variances for these AAFs can be calculated for different consumption levels, we estimated the AAFs for cardiovascular diseases, ischemic stroke and diabetes for males aged 15 to 34 who are low consumers of alcohol (0 to 39.9 grams of alcohol per day), moderate consumers of alcohol (40 to 59.9 grams of alcohol per day) and heavy consumers of alcohol (60 to 150 grams of alcohol per day).
In order to accurately estimate the variance of an AAF we need to determine how many samples are required. Too few samples could lead to inaccurate results, while increasing the number of samples increases computing times and may require a larger amount of storage. Additionally, after a large number of iterations, the gain in accuracy is very small and does not provide new substantial information. Therefore, in order to determine the optimal number of random samples needed to calculate the variance of an AAF, a sensitivity analysis was performed. Since our samples are randomly generated, each set of samples is independent and allows us to collect a large amount of data relatively quickly. To decrease computation time, the code was adapted to generate 150 sets, each containing 1000 AAF estimates for each region (by sex and age and disease). The variance of each set of 1000 AAFs can then be averaged to estimate the variance of larger sets. By systematically increasing the number of sets used to calculate the average variance, we estimated the number of samples required for the variance to settle.
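The batching scheme used in this sensitivity analysis can be sketched as follows. The placeholder generator below simply draws AAF values from a normal distribution with hypothetical mean and standard deviation; in practice each value would come from the full AAF calculation with one random parameter set.

```python
import random
import statistics

def batch_variances(draw_aaf, n_sets=150, set_size=1000, seed=0):
    """Variance of the AAF within each independent set of random draws."""
    rng = random.Random(seed)
    return [statistics.variance([draw_aaf(rng) for _ in range(set_size)])
            for _ in range(n_sets)]

def running_mean(values):
    """Average of the first k values, for k = 1 .. len(values)."""
    averages, total = [], 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages

def placeholder_aaf(rng):
    """Stand-in for one simulated AAF (hypothetical mean 0.12, SD 0.03)."""
    return rng.gauss(0.12, 0.03)

variances = batch_variances(placeholder_aaf)
trajectory = running_mean(variances)
# trajectory[k - 1] estimates the variance from k * 1000 samples.
print(trajectory[0], trajectory[49], trajectory[-1])
```

Plotting `trajectory` against the number of sets shows where the averaged variance settles, which is how the required number of samples can be read off.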
Next, we carried out an analysis to estimate the impact of each component on the final variance using the same sets of randomly generated variables, but in different arrangements. For the purposes of this analysis, only 1000 sets of lowest level parameters (see above for a definition) were generated.
To calculate the impact on the variance of each parameter, the AAFs were calculated for a set of parameters in which only the parameter tested was randomly generated while the other parameters were held constant. The variance obtained from the generated AAFs then represented the variance induced by the error of this single variable. Since the AAF function is non-linear, the variances obtained cannot simply be added together to obtain the total variance. To simplify the interpretation of the results, each contribution was normalized so that the sum equalled the total variance obtained as a result of the computation explained in the previous paragraphs. For the purpose of our analysis, the computations of the proportion of total variance explained by different variables were restricted to men in the five Asian regions defined above.
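The one-parameter-at-a-time decomposition and its normalization can be illustrated with a toy model. The single-category AAF function, the normal distributions and every numeric value below are hypothetical stand-ins, chosen only to show how the raw variances are rescaled so that they sum to the total variance.

```python
import math
import random
import statistics

def toy_aaf(prevalence, consumption, slope):
    """Toy AAF in which all drinkers share one consumption level (g/day)."""
    excess = prevalence * (math.exp(slope * consumption) - 1.0)
    return excess / (1.0 + excess)

MEANS = {"prevalence": 0.5, "consumption": 20.0, "slope": 0.01}
SDS = {"prevalence": 0.015, "consumption": 2.0, "slope": 0.002}

def simulate(vary, n=1000, seed=0):
    """Variance of the AAF when only the parameters in `vary` are random."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        params = {name: rng.gauss(MEANS[name], SDS[name]) if name in vary
                  else MEANS[name] for name in MEANS}
        outputs.append(toy_aaf(**params))
    return statistics.variance(outputs)

total_variance = simulate(vary=set(MEANS))          # all parameters random
raw = {name: simulate(vary={name}) for name in MEANS}

# The model is non-linear, so the raw variances need not add up to the
# total; rescale them so that their sum equals the total variance.
scale = total_variance / sum(raw.values())
contributions = {name: value * scale for name, value in raw.items()}
shares = {name: value / total_variance for name, value in contributions.items()}
print(shares)
```

In this toy setting the relative risk coefficient dominates, but that outcome depends entirely on the hypothetical standard deviations chosen here.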
To compare, in terms of dose response and magnitude, the AAFs calculated using the new methodology by Rehm and colleagues with those from the method used in the 2004 CRA study [19], we calculated the partial AAFs for cardiovascular diseases and diabetes for multiple drinking categories for men in the five Asian regions defined above. The drinking categories were defined as 1) 0 to < 0.25 grams per day, 2) 0.25 to < 20 grams per day, 3) 20 to < 40 grams per day, and 4) 40+ grams per day. The relative risks used in the 2004 CRA study for cardiovascular diseases and diabetes were obtained from Gutjahr et al. [20], Reynolds et al. [21], Corrao et al. [22], and Corrao et al. [23].

Considerations of computing time
As R is a single-core program, splitting up the code into different parts (for example, by sex and age) allows a user to take advantage of the multi-core architecture of modern central processing units. Additionally, when dealing with large data sets, R slows down considerably. The splitting of the program into different sub-programs by age, sex and region allows a user to reduce the size of the data sets, and therefore to speed up the computations.

Results and Discussion
Per capita estimates by sex and region are shown in Table 1, and the prevalence of lifetime abstainers, former drinkers, and current drinkers is shown in Table 2. We observed that on average men drink approximately 4.25 times more than women, and that regions with higher income levels and higher standards of living exhibited higher levels of alcohol consumption. The variance of per capita consumption is also proportionally much larger, relative to the point estimate, for countries with lower standards of living. The observed proportions of current drinkers bear out these conclusions. Table 3 depicts the point estimates of each disease for the male population of the five described Asian regions, including their 95% CIs. When an AAF estimate was close to zero, the CIs also crossed zero, making it impossible to determine if the AAF was truly positive or negative. The use of 150 000 random samples provided us with enough precision to confidently estimate the CIs to two decimal places. In addition, the partial AAFs and their variances can be calculated for low consumers of alcohol (0 to 39.9 grams of alcohol per day), moderate consumers of alcohol (40 to 59.9 grams of alcohol per day), and heavy consumers of alcohol (60 to 150 grams of alcohol per day). Table 4 outlines the AAFs for cardiovascular diseases, ischemic stroke and diabetes by consumption amount.

Impact on total variance of each parameter
As shown in Figure 4, the variance of the relative risk functions was, on average, the largest contributor to the variance of the AAFs. This is the case for disease categories such as ischemic heart disease and lower respiratory infections, where the variance of the betas of the relative risk function for these diseases is large [5]. For oral cavity cancer, oesophageal cancer, larynx cancer, pancreatitis, tuberculosis, liver cirrhosis and hemorrhagic stroke, adult per capita consumption was the largest contributor to the variance observed in the AAFs. We can speculate that either more research has been undertaken concerning those diseases, leading to more precise risk functions, or that simple relationships can be more accurately estimated with fewer errors. When looking at the convergence of the variance, in the case of oral cavity and pharynx cancer, approximately 60 000 to 70 000 random samples are required to accurately estimate the variance of the AAFs. For the case of tuberculosis in Asia Central, we found that the variance converged only after 100 000 random samples were used. Therefore, in order to ensure the convergence of each variance estimate with a small safety margin, around 150 000 sample points are needed. It should be noted, however, that if the CIs are to be determined with a maximum error of ± 1 on the second decimal, the variance only needs to be accurate to within 2.6e-5, which is usually achieved after as few as 40 000 samples. Table 5 compares the AAFs calculated with the new methodology by Rehm and colleagues [4] and with the methodology used in the 2004 CRA study. The AAFs for the new methodology are very close to the AAFs estimated using the old methodology for the same drinking groups; however, the inclusion of former drinkers in the new methodology increases the total AAF for each region.
Also, it should be noted that the prevalence of consumption in this sensitivity analysis was based on the gamma distribution for both methodologies (used once in continuous and once in categorical form), whereas the original 2004 CRA study used a different form of up estimation [13].
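The 2.6e-5 threshold quoted above for the precision of the variance is consistent with a simple bound, shown here as one possible reading since the derivation is not spelled out. It assumes a normal-theory interval with half-width 1.96σ, writes ε for the absolute error of the variance estimate, and uses the inequality |√a − √b| ≤ √|a − b|.

```latex
\[
\left| 1.96\,\hat{\sigma} - 1.96\,\sigma \right|
\;\le\; 1.96\,\sqrt{\left|\hat{\sigma}^{2} - \sigma^{2}\right|}
\;=\; 1.96\,\sqrt{\varepsilon} \;\le\; 0.01
\quad\Longleftrightarrow\quad
\varepsilon \;\le\; \left(\frac{0.01}{1.96}\right)^{2} \approx 2.6 \times 10^{-5}
\]
```

Under this reading, an error of at most 2.6e-5 in the variance guarantees that each CI bound moves by no more than 0.01.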

Conclusions
In this paper we have presented a method to estimate uncertainty around AAFs and illustrated our results using data for men aged 15 to 34 years in several Asian regions. The use of 60 000 to 70 000 Monte Carlo ... [24,25], it will not be enough to just conduct more epidemiological studies into the impact of average volume of alcohol consumption on the incidence of diseases (for an overview see [5]). Instead, other relevant dimensions of alcohol consumption, which could play a role in confounding the average volume of alcohol consumption, should be included in the design of cohort studies, and then should be statistically controlled for by using, for example, meta-regression techniques [26]. One limitation of our approach was the use of adjusted relative risks in determining AAFs. The relative risk formulas we used were developed for risks only adjusted for age (see [8,9,27]). Two arguments can be made to justify the use of these formulas. Firstly, in risk analyses, such as the CRA for the GBD Studies [28], almost all of the underlying studies for the different risk factors report only adjusted risks. Relying on unadjusted risks would severely bias the estimated risk functions as only a small proportion of generally older studies could be included. Secondly, for alcohol in particular, most of the analyses show no marked differences after adjustment for the usual risk factors tested (see [5], and the meta-analyses cited there). The need for adjustment to the relative risks may change when other dimensions of alcohol consumption, such as irregular heavy drinking occasions, are considered (see above).
Another limitation of the new methodology is the nature of the relative risks that are used in the CRA study. As there is likely to be undercoverage of alcohol consumption in the medical epidemiological studies upon which the relative risks are based, modelling 100% of adult per capita consumption will lead to biased results. Accordingly, as coverage of alcohol consumption in these studies is likely greater than 70% [10], we modelled alcohol consumption as 80% of adult per capita consumption. This adjustment leads to lower estimates of alcohol-attributable health harms [10]. Additionally, we modelled average daily alcohol consumption from 0 to 150 grams a day, using 150 grams as a maximum level. In very rare cases people may drink more than 150 grams per day; however, it is unlikely that this level of consumption would be maintained over an extended period of time [29]. An upper limit of alcohol consumption in grams per day may lead to an underestimation of the effects of alcohol in terms of total harms, especially where alcohol at low doses has a positive effect and at high doses has a negative effect, such as with cardiovascular diseases, ischemic stroke and diabetes. Such instances are limited, however, as the risk ratios used to model the effects of alcohol were fractional polynomials allowing us to accurately characterize curvilinear risk relationships. Additionally, alcohol starts to have a negative effect well below a consumption level of 150 grams per day and, thus, limiting our consumption models to 150 grams per day does not have a substantial effect on the AAFs. Furthermore, as the upper limit of sustainable alcohol consumption probably differs depending on the sex of the drinker, more research is needed to define these limits.
Our new methodology is capable of being adjusted to take into account different parameters of alcohol consumption [10]. For example, this method can easily be modified for future research that focuses on the effects of specific alcohol consumption patterns on the burden of disease. In summary, future iterations of the CRA, or similar studies, should include CIs, as our methodology offers a feasible way to estimate the uncertainty of attributable fractions for all burdens of disease.

Additional material
Additional file 1: Example of R code for simulations.