This method has two main steps: (1) calculation of the AAFs, and (2) calculation of the variance of the AAFs. Calculating the AAFs requires information from multiple sources, each of which carries some degree of uncertainty. This information is outlined below.
Definition of regions
Regions were defined in accordance with the 2005 GBD study [11]. Countries were grouped into regions according to their geographical location and epidemiological profile (child and adult mortality levels and major causes of death). Neither the income nor the population of a country affected the grouping. To illustrate the method, we restricted the analysis to the five Asian regions, which contain the countries listed below:
- Asia Pacific, High Income: Brunei Darussalam, Japan, Republic of Korea, Singapore
- Asia Central: Armenia, Azerbaijan, Georgia, Kazakhstan, Kyrgyzstan, Mongolia, Tajikistan, Turkmenistan, Uzbekistan
- Asia East: China, Democratic People's Republic of Korea, Hong Kong, Macao, Taiwan
- Asia South: Afghanistan, Bangladesh, Bhutan, India, Nepal, Pakistan
- Asia Southeast: Cambodia, Christmas Island, Cocos Island, Indonesia, Lao People's Democratic Republic, Malaysia, Maldives, Mauritius, Myanmar, Philippines, Reunion, Seychelles, Sri Lanka, Thailand, Timor Leste, Vietnam
Population estimates for 2005 for each country in each region were obtained from the 2008 revision of the United Nations Population Division estimates [12].
Definition of age categories
Three age categories were used in the CRA study: 15 to 34, 35 to 64, and 65 years or older; we limited our study to the 15 to 34 age category. These age groupings were used so that results would be comparable with the 2005 GBD study.
Measures of alcohol consumption
Adult per capita consumption is calculated by adding together the estimated recorded and unrecorded alcohol consumption [13, 14]. The variance of recorded consumption was based on estimates from different sources (for example, government data, industry data and Food and Agriculture Organization data), which are usually quite similar. The main sources of unrecorded consumption are home production, alcohol intended for industrial, technical and medical uses, and illegal production or importation of alcohol. Estimates of unrecorded consumption have a larger variance than estimates of recorded consumption, because information on unrecorded consumption usually comes from only a few sources and is often based on limited empirical evidence [14, 15]. Since the 2005 CRA study does not provide the uncertainty of unrecorded adult per capita consumption, we assumed that its standard deviation was five times the standard deviation of recorded adult per capita consumption. The prevalence of lifetime abstainers and former drinkers was estimated by sex and age as a population-weighted average of surveys in the respective regions. Using the proportion of current drinkers, we calculated the per capita consumption per current drinker, which was used in modelling alcohol consumption. The variance of the prevalence estimates can be derived from a binomial distribution, as shown below in the Statistical procedures section.
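The bookkeeping in the previous paragraph can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that recorded and unrecorded consumption are independent (so their variances add), that consumption is already expressed in grams of pure alcohol per day, and that the function names and input values are hypothetical.

```python
import math

def total_apc(recorded, sd_recorded, unrecorded, sd_ratio=5.0):
    """Total adult per capita consumption (APC) and its standard deviation.

    The SD of unrecorded APC is taken as sd_ratio (5 in the text) times the SD
    of recorded APC; independence of the two components is an assumption.
    """
    sd_unrecorded = sd_ratio * sd_recorded
    return recorded + unrecorded, math.sqrt(sd_recorded ** 2 + sd_unrecorded ** 2)

def weighted_prevalence(prevalences, populations):
    """Population-weighted average of survey prevalences across countries."""
    return sum(p * n for p, n in zip(prevalences, populations)) / sum(populations)

def apc_per_drinker(apc, p_abs, p_form):
    """APC divided by the proportion of current drinkers."""
    p_current = 1.0 - p_abs - p_form
    if p_current <= 0:
        raise ValueError("the proportion of current drinkers must be positive")
    return apc / p_current

# Hypothetical inputs (g/day per adult; two countries in one region)
mean_apc, sd_apc = total_apc(recorded=12.0, sd_recorded=0.5, unrecorded=3.0)
p_abs = weighted_prevalence([0.40, 0.60], [30e6, 10e6])
print(mean_apc, round(sd_apc, 3), p_abs, apc_per_drinker(mean_apc, p_abs, 0.05))
```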
Modelling alcohol consumption
Analyses of comparable studies, covering 1001 distributions from 66 countries by sex and age, have shown that the distribution of alcohol consumption in the drinking population is best modelled by a gamma distribution [4]. Population surveys are well known to underestimate true consumption, so survey data have to be triangulated with estimates of adult per capita consumption, which are often based on sales data [4, 13]. To be conservative, we assumed that 80% of this registry-based estimate reflected the true adult per capita consumption; this level was chosen to account for alcohol that is wasted rather than consumed (for example, broken bottles and quantities left over in glasses) and for the underestimation of true consumption in the medical epidemiological studies used in the meta-analyses that estimated the relative risk functions. A regression based on the studies mentioned above showed a strong relationship between the mean and the standard deviation (for both men and women, the mean explained more than 90% of the variance of the standard deviation). This relationship makes it straightforward to compute the standard deviation of the distribution once it has been shifted upward to the adjusted per capita consumption. Finally, this method assumes that the proportions of alcohol consumed by the various sex and age groups, as derived from surveys, are accurate [4].
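To make the last two steps concrete, here is a minimal sketch of how a mean per drinker and the linear mean-standard deviation relationship fix the two gamma parameters. It assumes the relationship passes through the origin (SD = β × mean) and that the 80% factor is applied multiplicatively; the function name and the value of β are hypothetical, not the published estimates.

```python
def gamma_params(apc_per_drinker, beta, adjustment=0.8):
    """Shape (kappa) and scale (theta) of the gamma model for current drinkers.

    apc_per_drinker: registry-based mean consumption per current drinker (g/day)
    beta:            slope linking SD to mean (SD = beta * mean); sex-specific
    adjustment:      share of the registry-based estimate taken as true consumption
    """
    mean = adjustment * apc_per_drinker
    sd = beta * mean                 # linear mean-SD relationship
    kappa = (mean / sd) ** 2         # equals 1 / beta**2, whatever the mean
    theta = sd ** 2 / mean           # equals beta**2 * mean
    return kappa, theta

kappa, theta = gamma_params(apc_per_drinker=30.0, beta=1.2)  # hypothetical inputs
print(kappa, theta, kappa * theta)  # the gamma mean kappa * theta is the adjusted mean
```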
Measures of relative risk
The relative risk functions for each chronic disease were derived from a series of meta-analyses using fractional polynomial regression [5], stratified by sex and, where possible, by morbidity and mortality (e.g. liver cirrhosis [16] or stroke [17]). The coefficients of the polynomial representing each relative risk function are called beta-coefficients, and their uncertainty is expressed by a covariance matrix obtained from the meta-analyses. The associations between alcohol and disease take one of three forms: 1) exponential, 2) linear, or 3) J-shaped. Figure 1 plots the relative risk of liver cirrhosis for men, an example of an exponential relationship with alcohol consumption; Figure 2 plots the relative risk of hemorrhagic stroke (mortality) for men, which has a linear association with alcohol consumption; and Figure 3 plots the relative risk of coronary heart disease for men, an example of a J-shaped relationship with alcohol consumption.
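The text does not list the fitted functional forms, so the following is only a sketch of a generic fractional-polynomial relative risk function, assuming the common form in which log RR(x) is a weighted sum of power terms of consumption x. The coefficients below are hypothetical and were chosen only to reproduce an exponential and a J-shaped curve; they are not the meta-analytic estimates.

```python
import math

def fp_term(x, p):
    """Fractional-polynomial basis function: x**p, with p == 0 meaning log(x)."""
    return math.log(x) if p == 0 else x ** p

def relative_risk(x, betas, powers):
    """RR(x) = exp(sum_j beta_j * h_j(x)) for consumption x > 0 (g/day).

    A power repeated k times contributes x**p * log(x)**k for its k-th repeat,
    following the usual fractional-polynomial convention.
    """
    log_rr, repeat, prev = 0.0, 0, None
    for b, p in zip(betas, powers):
        repeat = repeat + 1 if p == prev else 0
        log_rr += b * fp_term(x, p) * math.log(x) ** repeat
        prev = p
    return math.exp(log_rr)

# Hypothetical coefficients, chosen only to reproduce two of the three shapes
exponential = lambda x: relative_risk(x, [0.02], [1])
j_shaped = lambda x: relative_risk(x, [-0.10, 0.01], [0.5, 1])
for x in (10, 25, 60, 150):
    print(x, round(exponential(x), 3), round(j_shaped(x), 3))
```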
Step 1: Calculation of the AAF
This step calculates the AAF estimates from daily alcohol consumption, as outlined below.
Alcohol-attributable fractions (AAFs)
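The AAF for a given infectious or chronic disease can be expressed as follows [4]:
$$\mathrm{AAF} = \frac{P_{abs} + P_{form}\,RR_{form} + \int_{0}^{\infty} P(x)\,RR(x)\,dx - 1}{P_{abs} + P_{form}\,RR_{form} + \int_{0}^{\infty} P(x)\,RR(x)\,dx}$$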
where $P_{abs}$ is the proportion of lifetime abstainers, $P_{form}$ is the proportion of former drinkers in the population, and $RR_{form}$ is the relative risk for former drinkers. $P(x)$ is the prevalence of drinking at level $x$ (in grams per day, modelled by a gamma distribution), and $RR(x)$ is the relative risk at this level compared with lifetime abstainers. In the CRAs, AAFs are usually calculated separately by sex and age, and sometimes by ethnic group. In our study of Asian regions, AAFs were computed by region (as defined above), sex and age.
We modified this expression in two ways when estimating the AAFs. First, a person whose consumption exceeds 150 grams per day is highly unlikely to sustain this level over a long period of time; therefore, to be conservative, average daily consumption was truncated at 150 grams per day. Second, because of this truncation, the gamma distribution had to be renormalized: its probability density function was multiplied by a coefficient chosen so that the density integrates to 1 between 0 and 150 grams of alcohol per day.
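A minimal sketch of the truncated, renormalized AAF, written in Python rather than the authors' R. The normalizing coefficient is c = 1 / ∫₀¹⁵⁰ f(x; κ, θ) dx, where f is the gamma density. Python's standard library has no quadrature routine, so the sketch uses a composite Simpson rule after the substitution x = u², which removes the density's spike at 0 when 0.5 < κ < 1 (an assumption about typical κ values). The function names, the log-linear relative risk and all numeric inputs are hypothetical.

```python
import math

def gamma_pdf(x, kappa, theta):
    """Gamma density with shape kappa and scale theta, for x > 0."""
    return math.exp((kappa - 1.0) * math.log(x) - x / theta
                    - math.lgamma(kappa) - kappa * math.log(theta))

def integrate_sqrt(f, upper, n=2000):
    """Composite Simpson's rule for f on (0, upper] after substituting x = u**2.

    The substitution tames the x**(kappa - 1) spike at 0 when kappa > 0.5;
    n must be even.
    """
    a, b = 1e-9, math.sqrt(upper)
    h = (b - a) / n
    g = lambda u: f(u * u) * 2.0 * u
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3.0

def aaf(p_abs, p_form, rr_form, kappa, theta, rr, upper=150.0):
    """AAF with consumption among current drinkers truncated at `upper` g/day."""
    p_drink = 1.0 - p_abs - p_form
    pdf = lambda x: gamma_pdf(x, kappa, theta)
    c = 1.0 / integrate_sqrt(pdf, upper)       # normalizing coefficient on (0, upper]
    exposed = p_drink * c * integrate_sqrt(lambda x: pdf(x) * rr(x), upper)
    total = p_abs + p_form * rr_form + exposed
    return (total - 1.0) / total

# Hypothetical stratum: 30% abstainers, 10% former drinkers, log-linear RR
print(aaf(0.30, 0.10, 1.5, kappa=0.7, theta=30.0, rr=lambda x: math.exp(0.01 * x)))
```

With RR(x) = 1 for every drinker, only former drinkers contribute, which gives a quick sanity check of the formula.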
Step 2: Calculation of the variance of the AAF
This step calculates the variance of the AAF estimates using the exposure and risk data, as outlined below.
Two approaches can be taken to derive 95% confidence intervals (CIs) for AAFs. The first is to derive an expression for the variance of the AAF that accounts for the errors in all of the parameters on which it depends, and then to compute the CI from this variance. Although mathematically exact, this approach is too complex in our case: the AAF depends on the relative risk function, the prevalence of former drinkers and lifetime abstainers, and the distribution of consumption among drinkers, and because the errors in these values and functions are non-trivial, computing the variance of the AAF algebraically is practically impossible.
The second approach is simpler but approximate, and it requires more computation. N random sets of the lowest level parameters (the parameters from which all other values are derived) are generated: the coefficients of the relative risk functions, the adult per capita consumption, and the prevalence of former drinkers and lifetime abstainers. Each random set yields one AAF value, giving a total of N AAFs for each region, sex, age group and disease. The variance of the N AAFs approaches the true variance as N increases. This corresponds to estimating the variance of an AAF with a Monte Carlo method [18].
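The Monte Carlo loop itself is short. The sketch below is generic: it draws N parameter sets, computes one AAF per set and summarizes the results with the sample variance and simple empirical 2.5th and 97.5th percentiles. To keep it self-contained, the full AAF integral is replaced by a hypothetical toy AAF (a single exposure level of 20 g/day with RR = exp(20β)), and the sampling distributions of its two parameters are likewise assumptions.

```python
import math
import random
import statistics

def monte_carlo_aaf(draw_params, compute_aaf, n=1000, seed=1):
    """Compute one AAF per random parameter set; return mean, variance, 95% interval."""
    rng = random.Random(seed)
    aafs = sorted(compute_aaf(**draw_params(rng)) for _ in range(n))
    lower = aafs[int(0.025 * (n - 1))]   # simple empirical percentiles
    upper = aafs[int(0.975 * (n - 1))]
    return statistics.mean(aafs), statistics.variance(aafs), (lower, upper)

def toy_aaf(p_drink, beta):
    """Hypothetical stand-in: single exposure level of 20 g/day, RR = exp(20 * beta)."""
    excess = p_drink * (math.exp(20.0 * beta) - 1.0)
    return excess / (1.0 + excess)

def toy_draw(rng):
    """Hypothetical lowest level parameters and their sampling distributions."""
    return {"p_drink": min(max(rng.gauss(0.5, 0.02), 0.0), 1.0),
            "beta": rng.gauss(0.01, 0.002)}

mean, var, (lo, hi) = monte_carlo_aaf(toy_draw, toy_aaf, n=2000)
print(f"AAF = {mean:.3f}, variance = {var:.2e}, 95% interval = ({lo:.3f}, {hi:.3f})")
```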
To generate random samples for each lowest level parameter, the distribution, mean and variance of that parameter must be known. The following paragraphs describe how the properties of each parameter were determined.
Statistical procedures
The simulations were implemented in R (version 2.10.1; see "Additional file 1: Example of R - code for simulations" for an example of the code). The numerical errors inherent in any computational program were neglected; for example, the error (uncertainty) added by the numerical integration used to calculate the AAFs was not taken into account in the variance calculations. Because adult per capita consumption for the drinking population was drawn from a normal distribution, the draws were occasionally zero or negative, which is impossible in reality. In these instances, the value was set to 0.001 to represent very low consumption. Mathematically, a mean consumption of zero would reduce the gamma distribution to a Dirac delta distribution located at 0; moreover, drinkers would then consume 0 grams per day, which is incompatible with the definition of current drinkers.
Adult per capita consumption was generated from a normal distribution because no information was available to support an alternative distribution. The very low per capita consumption values that this occasionally produced caused an additional problem during computation. The numerical integration routine in R produces errors or incorrect results when the function to be integrated is constant or approximately constant. When the mean of a gamma distribution approaches 0, its spread is also very small (because of the linear relationship between mean and standard deviation), so the density is approximately constant beyond an initial spike close to the origin. R cannot integrate such functions and returns an error message. Because this problem occurs only when consumption is estimated to be very low, we assumed that in such cases the AAF calculated with that set of parameters is 0. This approach assumes that former drinkers are not at an elevated risk for the given disease.
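Both safeguards can be expressed as small wrappers. This is a sketch under stated assumptions: the 0.001 floor comes from the text, but the consumption threshold below which the AAF is set to 0 is hypothetical (in the original R code the trigger was an error from the integration routine), and the exception types caught are simply the ones Python's own arithmetic raises.

```python
import random

MIN_APC = 0.001  # floor used in the text for non-positive draws (g/day)

def draw_apc(rng, mean, sd):
    """Normal draw of adult per capita consumption, floored at MIN_APC."""
    return max(rng.gauss(mean, sd), MIN_APC)

def safe_aaf(compute_aaf, mean_consumption, threshold=0.01):
    """Return an AAF of 0 when consumption is too low to integrate reliably.

    `threshold` is a hypothetical cut-off. Returning 0 also drops any excess
    risk among former drinkers, matching the assumption stated in the text.
    """
    if mean_consumption < threshold:
        return 0.0
    try:
        return compute_aaf(mean_consumption)
    except (ArithmeticError, ValueError):   # e.g. ZeroDivisionError, math domain error
        return 0.0

rng = random.Random(0)
draws = [draw_apc(rng, 0.5, 1.0) for _ in range(1000)]
print(min(draws), sum(d == MIN_APC for d in draws))  # floor value and how often it was hit
```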
In general, the scale (θ) and shape (κ) parameters of a gamma distribution are correlated, and their covariance has to be taken into account when generating random samples of θ and κ. This difficulty does not arise here because κ is a constant (although the constant differs between men and women). According to previous work by Rehm and colleagues:
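$$\kappa = \frac{1}{\beta^{2}}$$
where β is the coefficient linking the standard deviation to the mean (SD = β × mean); this follows because a gamma distribution with shape κ and scale θ has mean κθ and standard deviation √κ·θ. Therefore, κ is independent of region, age group and θ. To generate a random sample of κ, its variance is approximated using the delta method:
$$\operatorname{Var}(\hat{\kappa}) \approx \left(\frac{d\kappa}{d\beta}\right)^{2} \operatorname{Var}(\hat{\beta}) = \frac{4}{\beta^{6}}\,\operatorname{Var}(\hat{\beta})$$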
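The delta-method approximation for κ = 1/β² can be checked numerically. In this sketch the value of β and its variance are hypothetical, and drawing κ from a normal distribution with the delta-method variance is an assumption (the text does not state the distribution used). The last lines compare the approximation with the empirical variance of 1/β² under direct simulation of β.

```python
import random
import statistics

def kappa_from_beta(beta):
    """Gamma shape implied by SD = beta * mean."""
    return 1.0 / beta ** 2

def kappa_variance_delta(beta, var_beta):
    """Delta method: Var(kappa) ~ (d kappa / d beta)^2 * Var(beta) = 4 Var(beta) / beta^6."""
    return 4.0 * var_beta / beta ** 6

def draw_kappa(rng, beta, var_beta):
    """One kappa draw from a normal with the delta-method variance (assumption)."""
    return rng.gauss(kappa_from_beta(beta), kappa_variance_delta(beta, var_beta) ** 0.5)

beta, var_beta = 1.2, 0.02 ** 2  # hypothetical slope and its variance
rng = random.Random(42)
direct = [kappa_from_beta(rng.gauss(beta, var_beta ** 0.5)) for _ in range(20000)]
print(kappa_variance_delta(beta, var_beta), statistics.variance(direct))
```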
Generating θ is more involved. Estimates of θ differ for each region, sex and age group, because θ depends on the mean and variance of each gamma distribution. θ was generated in two steps: first, we drew a random value of adult per capita consumption from a normal distribution with the estimated mean and standard deviation; second, we drew random values for the prevalence of lifetime abstainers and former drinkers.
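Because the proportions of lifetime abstainers and former drinkers each follow a binomial distribution, their variances, considering only sampling variation, can be expressed as follows:
$$\operatorname{Var}(\hat{P}_{abs}) = \frac{P_{abs}(1 - P_{abs})}{n}, \qquad \operatorname{Var}(\hat{P}_{form}) = \frac{P_{form}(1 - P_{form})}{n}$$
where n is the effective sample size of the survey.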
The effective sample size of each survey used to estimate the proportions of lifetime abstainers and former drinkers was assumed to be 1000 (reflecting an average survey sample size of 6000 per population, divided equally among the three age groups for each sex, that is, six age-sex categories). From these values, we calculated the corresponding proportion of current drinkers, the mean consumption per current drinker in each sex-age category (μ) and, finally, θ, which is simply given by
$$\theta = \frac{\mu}{\kappa} = \beta^{2}\mu$$
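The two-step generation of θ can be sketched as below. The sketch assumes that the mean and standard deviation of adult per capita consumption passed in are already specific to the sex-age group, draws the prevalences as exact binomial proportions with the effective sample size of 1000, and applies the 80% adjustment multiplicatively; the guard against a zero proportion of drinkers and all input values are hypothetical.

```python
import random

def draw_theta(rng, apc_mean, apc_sd, p_abs, p_form, kappa,
               n_eff=1000, adjustment=0.8, min_apc=0.001):
    """One random draw of the gamma scale parameter theta for a sex-age group."""
    # Step 1: adult per capita consumption from a normal distribution, floored
    apc = max(rng.gauss(apc_mean, apc_sd), min_apc)
    # Step 2: prevalences as binomial proportions with effective sample size n_eff
    pa = sum(rng.random() < p_abs for _ in range(n_eff)) / n_eff
    pf = sum(rng.random() < p_form for _ in range(n_eff)) / n_eff
    p_drink = max(1.0 - pa - pf, 1.0 / n_eff)   # guard against zero drinkers
    mu = adjustment * apc / p_drink              # mean g/day per current drinker
    return mu / kappa                            # theta = mu / kappa

rng = random.Random(3)
thetas = [draw_theta(rng, 20.0, 2.0, 0.30, 0.10, kappa=1 / 1.44) for _ in range(200)]
print(min(thetas), sum(thetas) / len(thetas), max(thetas))
```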
To account for uncertainty in the final relative risk functions, N instances of each beta-coefficient were generated jointly, based on the covariance matrix. Each of the N resulting relative risk functions was then paired with one set of parameters defining the population (mean adult per capita consumption, proportion of abstainers and proportion of former drinkers). The relative risk functions were assumed to be the same for all regions and age groups.
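A common way to draw correlated coefficients from a covariance matrix is a multivariate normal draw via a Cholesky factor; the text does not name the distribution, so normality is an assumption here. The mean vector and covariance matrix below are hypothetical two-coefficient values, and the Cholesky routine is written out only because the Python standard library does not provide one.

```python
import random

def cholesky(cov):
    """Lower-triangular L with L @ L.T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (cov[i][i] - s) ** 0.5
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L

def draw_betas(rng, mean, cov, n_draws):
    """n_draws coefficient vectors from a multivariate normal (assumption)."""
    L = cholesky(cov)
    k = len(mean)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        draws.append([mean[i] + sum(L[i][j] * z[j] for j in range(i + 1))
                      for i in range(k)])
    return draws

mean = [-0.10, 0.010]                   # hypothetical beta-coefficients
cov = [[4e-4, -2e-5], [-2e-5, 4e-6]]    # hypothetical covariance matrix
draws = draw_betas(random.Random(7), mean, cov, 5000)
print(draws[:3])  # each row defines one relative risk function
```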
As previously mentioned, each random set of lowest level parameters was then used to calculate one AAF, giving a total of N AAFs for each region, sex, age group and disease. The variance of these N AAFs was used as the estimate of the true variance of the AAF.
Main analysis, sensitivity analyses, and evaluation of the impact of each variable on the variance
As an example of this method, we calculated the AAFs for males aged 15 to 34 in the Asian regions; however, the above-described methods can also be used to calculate the AAFs for females. In addition, to demonstrate that partial AAFs and variances for these AAFs can be calculated for different consumption levels, we estimated the AAFs for cardiovascular diseases, ischemic stroke and diabetes for males aged 15 to 34 who are low consumers of alcohol (0 to 39.9 grams of alcohol per day), moderate consumers of alcohol (40 to 59.9 grams of alcohol per day) and heavy consumers of alcohol (60 to 150 grams of alcohol per day).
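Partial AAFs for consumption intervals can be obtained by splitting the integral over the drinkers' distribution at the category boundaries. The sketch below assumes one natural decomposition, in which each interval contributes its excess risk divided by the common denominator of the AAF formula, so that the partial AAFs plus the former-drinker share add up to the total AAF. The cut-points are the ones given above (0, 40, 60 and 150 g/day); the integration scheme, the relative risk function and all inputs are hypothetical.

```python
import math

def gamma_pdf(x, kappa, theta):
    """Gamma density with shape kappa and scale theta, for x > 0."""
    return math.exp((kappa - 1.0) * math.log(x) - x / theta
                    - math.lgamma(kappa) - kappa * math.log(theta))

def integrate(f, a, b, n=2000):
    """Composite Simpson's rule for f on [a, b] after substituting x = u**2."""
    ua, ub = max(math.sqrt(a), 1e-9), math.sqrt(b)
    h = (ub - ua) / n
    g = lambda u: f(u * u) * 2.0 * u
    s = g(ua) + g(ub) + sum((4 if i % 2 else 2) * g(ua + i * h) for i in range(1, n))
    return s * h / 3.0

def partial_aafs(p_abs, p_form, rr_form, kappa, theta, rr, cuts=(0.0, 40.0, 60.0, 150.0)):
    """Partial AAFs for the intervals defined by `cuts` (g/day), plus the former-drinker share."""
    p_drink = 1.0 - p_abs - p_form
    pdf = lambda x: gamma_pdf(x, kappa, theta)
    c = 1.0 / integrate(pdf, cuts[0], cuts[-1])          # truncation normalization
    excess = [p_drink * c * integrate(lambda x: pdf(x) * (rr(x) - 1.0), a, b)
              for a, b in zip(cuts[:-1], cuts[1:])]
    former = p_form * (rr_form - 1.0)
    denom = 1.0 + former + sum(excess)   # equals P_abs + P_form*RR_form + integral of P*RR
    return [e / denom for e in excess], former / denom

(low, moderate, heavy), former_share = partial_aafs(
    0.30, 0.10, 1.5, 0.7, 30.0, rr=lambda x: math.exp(0.01 * x))
print(low, moderate, heavy, former_share)
```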
To estimate the variance of an AAF accurately, we need to determine how many samples are required. Too few samples can lead to inaccurate results, while more samples increase computing time and may require more storage; beyond a large number of iterations, the gain in accuracy is very small and provides no substantial new information. We therefore performed a sensitivity analysis to determine the number of random samples needed to calculate the variance of an AAF. Because the samples are generated randomly, the sets of samples are independent of one another, so a large amount of data can be collected relatively quickly by pooling several smaller runs. To reduce computation time, the code was adapted to generate 150 sets, each containing 1000 AAF estimates for each region, sex, age group and disease. The variances of the sets of 1000 AAFs can then be averaged to estimate the variance for larger numbers of samples. By systematically increasing the number of sets included in this average, we estimated the number of samples required for the variance to stabilize.
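The set-averaging scheme can be sketched as follows: simulate 150 independent sets of 1000 AAFs, compute each set's variance, and track the running average as more sets are included. The per-draw AAF here is a hypothetical stand-in for the full calculation, and the sample-variance helper is written out to keep the example fast.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def set_variances(draw_aaf, n_sets=150, set_size=1000, seed=1):
    """Variance of each independent set of `set_size` simulated AAFs."""
    rng = random.Random(seed)
    return [sample_variance([draw_aaf(rng) for _ in range(set_size)])
            for _ in range(n_sets)]

def running_mean(values):
    """Average of the first k set variances, for k = 1..len(values)."""
    out, total = [], 0.0
    for k, v in enumerate(values, start=1):
        total += v
        out.append(total / k)
    return out

def toy_draw_aaf(rng):
    """Hypothetical stand-in for one simulated AAF."""
    excess = rng.gauss(0.5, 0.02) * (rng.lognormvariate(0.2, 0.05) - 1.0)
    return excess / (1.0 + excess)

avg = running_mean(set_variances(toy_draw_aaf))
for k in (1, 10, 50, 100, 150):
    print(k * 1000, avg[k - 1])  # total samples vs averaged variance
```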
Next, we carried out an analysis to estimate the impact of each component on the final variance using the same sets of randomly generated variables, but in different arrangements. For the purposes of this analysis, only 1000 sets of lowest level parameters (see above for a definition) were generated.
To estimate the contribution of each parameter to the variance, the AAFs were calculated for sets of parameters in which only the parameter being tested was randomly generated, while the other parameters were held constant. The variance of the resulting AAFs then represented the variance induced by the error in this single parameter. Because the AAF is a non-linear function of its parameters, these variances cannot simply be added together to obtain the total variance. To simplify interpretation, each contribution was rescaled so that the contributions summed to the total variance obtained from the full simulation described above. The computation of the proportion of total variance explained by each parameter was restricted to men in the five Asian regions defined above.
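A one-at-a-time decomposition of this kind can be sketched as follows. The same pre-generated draws are reused in every arrangement, parameters that are not being tested are held at point values, and the single-parameter variances are rescaled so that they sum to the total variance. The toy AAF, its two parameters and their sampling distributions are hypothetical stand-ins for the full model.

```python
import math
import random

def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def variance_shares(aaf, point, draws):
    """One-at-a-time contribution of each parameter to the AAF variance.

    aaf:   function taking the parameters as keyword arguments
    point: dict of fixed (point) values for every parameter
    draws: dict parameter -> list of random draws (equal lengths)
    """
    n = len(next(iter(draws.values())))
    def run(varying):
        values = []
        for i in range(n):
            params = dict(point)
            params.update({p: draws[p][i] for p in varying})
            values.append(aaf(**params))
        return variance(values)
    total = run(list(draws))                  # all parameters random
    single = {p: run([p]) for p in draws}     # one parameter random at a time
    scale = total / sum(single.values())      # rescale so shares sum to the total
    return total, {p: v * scale for p, v in single.items()}

def toy_aaf(p_drink, beta):
    """Hypothetical stand-in AAF (RR = exp(20 * beta) at a single exposure level)."""
    excess = p_drink * (math.exp(20.0 * beta) - 1.0)
    return excess / (1.0 + excess)

rng = random.Random(11)
draws = {"p_drink": [rng.gauss(0.5, 0.02) for _ in range(1000)],
         "beta": [rng.gauss(0.01, 0.002) for _ in range(1000)]}
total, shares = variance_shares(toy_aaf, {"p_drink": 0.5, "beta": 0.01}, draws)
print(total, {p: round(v / total, 3) for p, v in shares.items()})
```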
To compare the dose-response relationships and magnitudes of the AAFs calculated using the new methodology of Rehm and colleagues with those from the method used in the 2004 CRA study [19], we calculated partial AAFs for cardiovascular diseases and diabetes for several drinking categories for men in the five Asian regions defined above. The drinking categories were defined as 1) 0 to < 0.25 grams per day, 2) 0.25 to < 20 grams per day, 3) 20 to < 40 grams per day, and 4) 40+ grams per day. The relative risks used in the 2004 CRA study for cardiovascular diseases and diabetes were obtained from Gutjahr et al. [20], Reynolds et al. [21], Corrao et al. [22], and Corrao et al. [23].
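For comparison, a categorical calculation replaces the integral with a sum over drinking categories. The sketch below assumes the standard categorical form of the AAF with lifetime abstainers as the reference group, in which each category's partial AAF is its prevalence times (RR − 1), divided by one plus the sum of these terms. The category prevalences and relative risks are hypothetical placeholders, not the values from [20] to [23].

```python
def categorical_aafs(prevalences, relative_risks):
    """Partial AAFs for discrete drinking categories (abstainers = reference)."""
    excess = [p * (rr - 1.0) for p, rr in zip(prevalences, relative_risks)]
    denom = 1.0 + sum(excess)
    return [e / denom for e in excess]

categories = ["0 to <0.25", "0.25 to <20", "20 to <40", "40+"]  # g/day
prev = [0.10, 0.30, 0.10, 0.05]   # hypothetical prevalences
rr = [1.00, 0.85, 1.10, 1.50]     # hypothetical relative risks
parts = categorical_aafs(prev, rr)
for name, value in zip(categories, parts):
    print(f"{name:>12} g/day: {value:+.4f}")
print("total AAF:", sum(parts))   # negative totals indicate a net protective effect
```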
Considerations of computing time
Because R runs on a single core by default, splitting the code into separate parts (for example, by sex and age) lets a user take advantage of the multi-core architecture of modern central processing units. In addition, R slows down considerably when handling large data sets; splitting the program into sub-programs by age, sex and region reduces the size of each data set and therefore speeds up the computations.
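The same splitting strategy can be sketched in Python with a process pool: each region-sex-age stratum becomes an independent job with its own random seed, so jobs can run on separate cores and their results are combined afterwards. The worker below is a hypothetical stand-in that returns the mean and variance of simulated values rather than real AAFs, and seeding strata with consecutive integers is a simplifying assumption.

```python
import itertools
import random
from concurrent.futures import ProcessPoolExecutor

REGIONS = ["Asia Pacific, High Income", "Asia Central", "Asia East",
           "Asia South", "Asia Southeast"]
SEXES = ["male"]
AGES = ["15-34"]

def simulate_stratum(job):
    """Hypothetical worker: Monte Carlo summary for one region-sex-age stratum."""
    region, sex, age, seed, n = job
    rng = random.Random(seed)                          # separate stream per stratum
    draws = [rng.gauss(0.05, 0.01) for _ in range(n)]  # stand-in for AAF draws
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return (region, sex, age), mean, var

def make_jobs(n=1000, base_seed=2010):
    """One job per stratum, each with its own seed."""
    strata = itertools.product(REGIONS, SEXES, AGES)
    return [(r, s, a, base_seed + i, n) for i, (r, s, a) in enumerate(strata)]

def run_all(jobs, max_workers=None):
    """Run the strata in parallel; max_workers=None uses all available cores."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate_stratum, jobs))
```

In a script, `run_all(make_jobs())` should be called inside an `if __name__ == "__main__":` block, which process pools require on platforms that start worker processes by spawning a fresh interpreter.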