A cautionary note regarding count models of alcohol consumption in randomized controlled trials
© Horton et al. 2007
Received: 24 July 2006
Accepted: 15 February 2007
Published: 15 February 2007
Alcohol consumption is commonly used as a primary outcome in randomized alcohol treatment studies. The distribution of alcohol consumption is highly skewed, particularly in subjects with alcohol dependence.
In this paper, we will consider the use of count models for outcomes in a randomized clinical trial setting. These include the Poisson, over-dispersed Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial. We compare the Type-I error rate of these methods in a series of simulation studies of a randomized clinical trial, and apply the methods to the ASAP (Addressing the Spectrum of Alcohol Problems) trial.
Standard Poisson models provided a poor fit for alcohol consumption data from our motivating example, and did not preserve Type-I error rates for the randomized group comparison when the true distribution was over-dispersed Poisson. For the ASAP trial, where the distribution of alcohol consumption featured extensive over-dispersion, there was little indication of significant randomization group differences, except when the standard Poisson model was fit.
As with any analysis, it is important to choose appropriate statistical models. In simulation studies and in the motivating example, the standard Poisson was not robust when fit to over-dispersed count data, and did not maintain the appropriate Type-I error rate. To appropriately model alcohol consumption, more flexible count models should be routinely employed.
Count outcomes are common in randomized studies of alcohol treatment. Subjects may be queried about their daily consumption of alcohol, measured as a number of drinks over a recent period (typically 30 days), and these values are used to estimate average drinking per day. In this setting, estimating differences between treatment group and control group is of primary interest.
A challenge in modeling consumption outcomes is to appropriately account for the distribution of drinking. These distributions are characterized by a large number of zeros (abstinent subjects) along with a long right tail (heavy drinking subjects). An extensive literature describes models for counts [2–8], and they have been commonly applied in economic analyses, traffic accidents, and health services utilization. Many routines are now available in general purpose statistical software (e.g. Stata). A natural model for counts is the single-parameter Poisson distribution. One disadvantage of the Poisson is that it makes strong assumptions regarding the distribution of the underlying data (in particular, that the mean equals the variance). While these assumptions are tenable in some settings, they are less appropriate for alcohol consumption. Extensions of the Poisson, such as the over-dispersed Poisson, negative binomial and two stage (hurdle) or zero inflated models have been proposed [2–5].
Our methods are motivated by the analysis of the ASAP (Addressing the Spectrum of Alcohol Problems) study, a randomized clinical trial comparing a brief motivational interview to usual care for a sample of inpatients with unhealthy alcohol use at an urban hospital. These subjects were followed to see if there were differences in drinking outcomes that could be attributed to randomized group assignment.
In this paper, we will demonstrate the limitations of the standard Poisson model in the presence of over-dispersion. We begin by describing several count models for alcohol outcomes, compare their performance in a series of simulated randomized trials, apply them to the ASAP study, and conclude with some general recommendations.
We begin by introducing notation to be used throughout. Let $Y_{ij}$ denote the number of events for the $j$th subject ($j = 1, \ldots, n_i$) in the $i$th group ($i = 1, 2$), where $n_i$ is the number of subjects in the $i$th group. Typically in a randomized trial $n_1$ and $n_2$ are approximately equal.
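The Poisson distribution is given by

$$P(Y_{ij} = k) = \frac{e^{-\lambda_{ij}} \lambda_{ij}^{k}}{k!}$$

for $k = 0, 1, 2, \ldots$, $i = 1, 2$, and $j = 1, \ldots, n_i$, where $\lambda_{ij} > 0$ and we assume that $\lambda_{ij} = \lambda_i$ for all $j$ (i.e. all subjects in a given group have the same rate of drinking). The $\lambda$ parameter uniquely specifies this distribution, and is equal to the expected value (mean) and variance (i.e. $E[Y_{ij}] = Var(Y_{ij}) = \lambda_{ij}$ for all $i$ and $j$). The maximum likelihood estimate (MLE) of $\lambda_i$ is given by the sample mean, $\hat{\lambda}_i = \bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}$. In this setting, the test of randomized group effects for the Poisson model is a test of the null hypothesis that $\lambda_1 = \lambda_2$.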
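As a minimal sketch of the Poisson model described above, the following Python code evaluates the probability mass function on the log scale and checks numerically that the mean and variance both equal the rate. The function names, the value λ = 5, the truncation point and the toy counts are illustrative assumptions, not part of the original analysis.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson variable with rate lam, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_mle(counts):
    """MLE of the Poisson rate for one group: the sample mean."""
    return sum(counts) / len(counts)


lam = 5.0                # illustrative rate (assumption)
support = range(200)     # truncation point; the mass beyond it is negligible for lam = 5
mean = sum(k * poisson_pmf(k, lam) for k in support)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in support)
print(round(mean, 6), round(var, 6))   # mean and variance coincide

toy_counts = [0, 2, 3, 7, 13]          # hypothetical drink counts for one group
print(poisson_mle(toy_counts))
```

The equality of mean and variance is the constraint that the remaining models relax.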
One limitation of this model is that it may be overly simplistic and may not provide an adequate fit to consumption data of the type that we consider. The constraint that the variance is equal to the mean may lead to incorrect test results.
Consider as an example the data from the ASAP study control group at 3 months. For this dataset, non-integer count values are possible. These arise when subjects consume a number of drinks not divisible by 30 (in the case of 30-day assessments). One approach in this situation would be to model the number of drinks consumed in a 30 day period, or utilize the non-integer values. Sometimes even the 30 day value is non-integer because people report a drink size that is then translated into standard drinks. The maximum likelihood estimates of the probability distributions remain the same for non-integer values, though it is necessary to move each non-integer observed value to the next integer (using a ceiling function) to be plotted. For the models that we discuss, non-integer values can be supplied to the software and still yield sensible results.
One approach to loosen the restrictive variance assumption involves use of an empirical (or robust or sandwich) variance estimator [11–13] to account for the over-dispersion. This more flexible extension of the Poisson allows the variance to be unconstrained. The over-dispersed Poisson option is available in a number of general purpose statistics packages (e.g. the robust option in Stata).
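To illustrate the idea, the sketch below compares the model-based standard error of a single group's estimated Poisson rate with an empirical (sandwich) standard error. It is a simplified one-group calculation under stated assumptions: the counts are hypothetical, no finite-sample correction is applied, and software implementations (such as Stata's robust option) may differ in such details.

```python
import math


def poisson_se_model(counts):
    """Model-based SE of the estimated rate: assumes Var(Y) = lambda."""
    n = len(counts)
    lam_hat = sum(counts) / n
    return math.sqrt(lam_hat / n)


def poisson_se_robust(counts):
    """Sandwich SE of the estimated rate: uses the observed squared deviations."""
    n = len(counts)
    lam_hat = sum(counts) / n
    return math.sqrt(sum((y - lam_hat) ** 2 for y in counts)) / n


# Hypothetical over-dispersed counts: many zeros and a long right tail.
counts = [0, 0, 0, 1, 2, 3, 5, 9, 20, 40]
se_model = poisson_se_model(counts)
se_robust = poisson_se_robust(counts)
print(f"model-based SE: {se_model:.3f}, robust SE: {se_robust:.3f}")
```

The point estimate is identical under both approaches; only the standard error changes, and with over-dispersed counts the robust standard error is larger.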
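The negative binomial distribution is given by

$$P(Y_{ij} = k) = \frac{\Gamma(k + 1/\theta_i)}{\Gamma(k + 1)\,\Gamma(1/\theta_i)} \left(\frac{\theta_i \lambda_i}{1 + \theta_i \lambda_i}\right)^{k} \left(\frac{1}{1 + \theta_i \lambda_i}\right)^{1/\theta_i}$$

for $k = 0, 1, 2, \ldots$, where $\Gamma(\cdot)$ denotes the Gamma function, $\lambda_i > 0$ and $\theta_i > 0$. We note that $E[Y_{ij}] = \lambda_i$ and $Var(Y_{ij}) = \lambda_i + \lambda_i^2 \theta_i = \lambda_i (1 + \lambda_i \theta_i)$ for all $i$ and $j$, so that $Var(Y_{ij}) > E[Y_{ij}]$. It can be shown that the negative binomial can be derived in terms of a Poisson random variable where the parameter $\lambda_i$ varies according to a gamma distribution.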
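The moments stated above can be checked numerically. The sketch below assumes the common mean-dispersion parameterization of the negative binomial (mean λ, dispersion θ), which is the one consistent with E[Y] = λ and Var(Y) = λ(1 + λθ); the values λ = 5 and θ = 1.4 are illustrative choices that give a variance of 40.

```python
import math


def negbin_pmf(k, lam, theta):
    """Negative binomial P(Y = k) with mean lam and variance lam * (1 + lam * theta)."""
    r = 1.0 / theta
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + k * math.log(theta * lam / (1.0 + theta * lam))
             - r * math.log(1.0 + theta * lam))
    return math.exp(log_p)


lam, theta = 5.0, 1.4      # illustrative values (assumption)
support = range(2000)      # truncation point; the tail decays geometrically
nb_mean = sum(k * negbin_pmf(k, lam, theta) for k in support)
nb_var = sum((k - nb_mean) ** 2 * negbin_pmf(k, lam, theta) for k in support)
print(round(nb_mean, 4), round(nb_var, 4))
print(round(lam * (1 + lam * theta), 4))   # variance implied by the formula
```

As θ approaches zero the variance approaches λ and the distribution approaches the Poisson.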
The negative binomial model is attractive because it allows the relaxation of strong assumptions regarding the relationship between the mean and the variance. This flexibility comes at some cost, since a two-parameter model is inherently more complicated to interpret.
Other models have been proposed that allow for an extra abundance of subjects with no consumption. In alcohol consumption outcomes, there may be subjects who are "non-susceptible" (e.g. abstinent). These "zero-inflation" (or "hurdle") models account for subjects who are structural zeros (e.g., abstinent subjects thought of as "non-susceptible") [2, 3]. Conditional on being susceptible (with some probability), the distribution is assumed to be Poisson or negative binomial.
The zero-inflated Poisson (ZIP) model is given by

$$P(Y_{ij} = k) = p_i \, I(k = 0) + (1 - p_i) \frac{e^{-\lambda_i} \lambda_i^{k}}{k!}$$

for $0 < p_i < 1$ and $\lambda_i > 0$, where $I(k = 0)$ is equal to 1 when $k = 0$, and equal to 0 otherwise. The model distinguishes an Always-0 group (abstainers, with probability $p_i$) from Not Always-0 subjects who did not drink during the reporting period (with probability $(1 - p_i) \exp(-\lambda_i)$), and can thereby incorporate an overabundance of zeros. Conditional on being a Not Always-0, counts are given by the Poisson distribution. This approach has been generalized to a regression framework, and implemented in general purpose statistical software (e.g. zip in Stata).
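A short sketch of the zero-inflated Poisson probabilities follows. The values p = 0.2 and λ = 5 are illustrative assumptions; the mean (1 - p)λ and variance (1 - p)λ(1 + pλ) used for comparison are standard results for this mixture.

```python
import math


def zip_pmf(k, p, lam):
    """Zero-inflated Poisson: Always-0 with probability p, otherwise Poisson(lam)."""
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return (p if k == 0 else 0.0) + (1.0 - p) * poisson


p, lam = 0.2, 5.0          # illustrative values (assumption)
support = range(200)       # truncation point; negligible mass beyond it
zip_mean = sum(k * zip_pmf(k, p, lam) for k in support)
zip_var = sum((k - zip_mean) ** 2 * zip_pmf(k, p, lam) for k in support)

# The two sources of zeros: structural (Always-0) and sampling (Not Always-0).
p_zero_structural = p
p_zero_sampling = (1.0 - p) * math.exp(-lam)
print(round(zip_pmf(0, p, lam), 4), round(p_zero_structural + p_zero_sampling, 4))
print(round(zip_mean, 4), round(zip_var, 4))
```

Even though the non-zero component is Poisson, the mixture as a whole is over-dispersed relative to its mean.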
In many settings, the assumption that after accounting for the zeros the remaining counts are Poisson may not be tenable. The zero-inflated negative binomial (ZINB) allows for over-dispersion in this manner, though at the cost of more parameters.
Another approach to the modeling of count data involves use of a linear model (assuming that the observations are approximately Gaussian). While this is an extremely flexible model that is typically robust to misspecification (since the mean and variance are not linked), the linear model is less attractive because it may predict negative values of drinking given the skewness of the distribution. Use of a linear model is also inefficient if the variance is a function of the mean.
To better understand the behavior of these methods in a known situation, we conducted a series of simulation studies with parameters derived from the motivating example. These simulation studies were designed to address the question of whether or not the models were robust to misspecification of the underlying count distribution. More formally, we wanted to assess whether these models preserved the appropriate Type-I error rate (the probability of rejecting the null hypothesis when it is true) when there are no true differences between groups (i.e. do they reject the null at the appropriate α level).
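The logic of such a simulation can be sketched in a few lines of Python. This is not the authors' simulation code: the sample size of 100 per group, the mean of 5, the dispersion θ = 1.4, the 1000 replicates, the seed and the use of a Wald test on the log rate ratio are all assumptions made for illustration. Data are generated under the null from a negative binomial (as a gamma-Poisson mixture) and analyzed with a Poisson model, with and without an empirical variance.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_negbin(rng, lam, theta):
    """Gamma-Poisson mixture with mean lam and variance lam * (1 + lam * theta)."""
    subject_rate = rng.gammavariate(1.0 / theta, theta * lam)  # (shape, scale)
    return draw_poisson(rng, subject_rate)


def wald_z(y1, y2, robust):
    """Wald statistic for H0: log(lambda_1) = log(lambda_2).

    Assumes both group means are positive, which holds in practice here.
    """
    var_sum = 0.0
    means = []
    for y in (y1, y2):
        n = len(y)
        m = sum(y) / n
        if robust:
            var_sum += sum((v - m) ** 2 for v in y) / (n * m) ** 2  # sandwich
        else:
            var_sum += 1.0 / (n * m)                                # Var = mean
        means.append(m)
    return (math.log(means[0]) - math.log(means[1])) / math.sqrt(var_sum)


def rejection_rates(n_sims=1000, n=100, lam=5.0, theta=1.4, seed=2007):
    """Proportion of null datasets rejected at the two-sided 0.05 level."""
    rng = random.Random(seed)
    reject_model = reject_robust = 0
    for _ in range(n_sims):
        y1 = [draw_negbin(rng, lam, theta) for _ in range(n)]
        y2 = [draw_negbin(rng, lam, theta) for _ in range(n)]
        reject_model += abs(wald_z(y1, y2, robust=False)) > 1.96
        reject_robust += abs(wald_z(y1, y2, robust=True)) > 1.96
    return reject_model / n_sims, reject_robust / n_sims


poisson_rate, robust_rate = rejection_rates()
print(f"model-based Poisson: {poisson_rate:.3f}, robust variance: {robust_rate:.3f}")
```

Under these assumptions the model-based Poisson test should reject far more often than the nominal 5%, while the robust version should stay close to it.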
The ASAP study was a randomized clinical trial of the effectiveness of a brief motivational intervention on alcohol consumption among a group of hospitalized patients at Boston Medical Center. Details of the recruitment procedures, inclusion criteria, description of sample and results of the RCT have been published. The Institutional Review Board of Boston University Medical Center approved this study, and the Institutional Review Board of Smith College approved the secondary analyses. After consenting to enroll, all subjects received an interviewer-administered baseline assessment prior to randomization into the control or intervention group. Subjects were randomly assigned to control or intervention group using a blocked randomization procedure. Intervention subjects participated in a brief motivational interview with a counselor (less than half an hour). Control subjects received usual care.
Follow-up was planned at 3-month and 12-month timepoints. Because the subjects came from a transient and hard-to-reach population, the researchers employed exhaustive techniques to track subjects over the follow-up period. The two primary alcohol-related outcomes were measures of alcohol consumption and linkage to appropriate alcohol treatment; for these secondary analyses we focus solely on treatment differences in alcohol consumption. The outcome of interest was the average number of standard drinks consumed per day in the past thirty days as reported using the Timeline Followback method at the 3 and 12-month interviews. For the purpose of this secondary analysis we consider the 3 month time point; similar results were seen utilizing 12 month data (not reported here).
Eight models were fit comparing treatment to control for the ASAP study:
Poisson: standard Poisson model,
Over-dispersed Poisson: Poisson model with empirical ("robust") variance estimator,
NB: negative binomial,
ZIP: zero-inflated Poisson, shared inflation parameter estimated for both randomized groups ($p_1 = p_2$),
ZINB: zero-inflated negative binomial, shared inflation parameter estimated for both randomized groups ($p_1 = p_2$),
TTEST: two-sample unequal variance t-test,
WILCOXON: Wilcoxon-Mann-Whitney, a non-parametric two-sample comparison procedure suitable for ordinal data, and
PERMUTE: two-sample permutation test.
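Of the procedures listed above, the two-sample permutation test is the simplest to write out in full. The sketch below is one possible version, not the implementation used in the study: it assumes the absolute difference in group means as the test statistic and a Monte Carlo sample of 2000 random relabelings, and the function name, seed and example data are hypothetical.

```python
import random


def permutation_pvalue(y1, y2, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    n1 = len(y1)
    pooled = list(y1) + list(y2)

    def abs_mean_diff(values):
        g1, g2 = values[:n1], values[n1:]
        return abs(sum(g1) / len(g1) - sum(g2) / len(g2))

    observed = abs_mean_diff(pooled)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                 # random relabeling of group membership
        if abs_mean_diff(pooled) >= observed:
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed labeling itself.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical drinks-per-day values for two small groups.
control = [0, 0, 1, 2, 4, 9, 15, 30]
treated = [0, 0, 0, 1, 3, 8, 12, 25]
p_example = permutation_pvalue(control, treated)
print(f"permutation p-value: {p_example:.3f}")
```

Because the reference distribution is built from the observed data, the test makes no assumption about the relationship between the mean and the variance.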
In the simulation studies we assessed the behavior of models when the null hypothesis was true (there were no differences between alcohol consumption for groups 1 and 2). We note that the ZIP model failed to converge for more than a quarter of the simulations from the standard Poisson distribution. This is likely due to the fact that many datasets had no zeros whatsoever (for the Poisson distribution with $\lambda = 5$, the probability that a dataset has no zeros whatsoever is equal to $(1 - \exp(-5))^{100} = 0.51$).
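The probability quoted above can be verified directly. In this small check the exponent of 100 is taken from the text and is read as the number of observations in a simulated dataset, which is an assumption about the simulation design.

```python
import math

lam = 5.0
n_obs = 100                                   # exponent from the text (assumed dataset size)
p_single_nonzero = 1.0 - math.exp(-lam)       # P(Y > 0) for one Poisson(5) observation
p_dataset_no_zeros = p_single_nonzero ** n_obs
print(round(p_dataset_no_zeros, 2))           # 0.51
```

With no observed zeros, the zero-inflation probability is not well determined by the data, which is consistent with the convergence failures noted for the ZIP model.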
Table: Estimated probability (and 99% CI) of rejecting the null hypothesis when there is no true difference between groups for a variety of statistical models and underlying distributions (results that do not include the alpha level of 0.05 are bolded). Rows: analysis model fit. Columns (underlying distribution): Poisson (Var = 5), NB (Var = 13), NB (Var = 40), NB (Var = 70), ZIP (Var = 8).
Of 341 subjects enrolled in the clinical trial, 169 subjects were randomized to the control group and the other 172 into the intervention group. The mean age of the subjects was 44.3 (SD = 10.7). Twenty-nine percent were women, 45% were Black, 39% White, 9% Hispanic, and 7% Other. Sixty-three percent were unemployed during the past three months and 25% of the subjects were homeless at one point during the past three months. Four percent of the subjects met criteria for current (past year) alcohol abuse and 77% were alcohol dependent.
Table: Distribution of drinking outcome by timepoint and randomization group. Columns: C (n = 169), T (n = 72), C (n = 141), T (n = 130).
Table: p-values for the ASAP randomization group effect at 3 months for a variety of count models (including the zero-inflated negative binomial).
In this setting, there was little indication from the observed plots that there were significant group differences. As seen in the simulation studies, the Poisson may not have preserved the appropriate Type I error rate due to the extremely large values of drinking for some subjects. The Appendix includes the Stata commands to fit these models and the output, along with the code to generate observed and predicted plots using the prcounts routine.
A number of models have been proposed for the analysis of count data, and these models are now available in general purpose statistical packages. We have described these methods in the context of modeling reports of alcohol consumption, where a large proportion of respondents report no drinking, and a small number of respondents typically account for an extreme amount of drinking.
For the analysis of the ASAP study, we found that the standard Poisson had an extremely poor fit, and yielded a statistically significant p-value (in contrast to all of the other models, which had highly non-significant results). The unrealistic assumption that the expected rate of drinking is the same for all subjects may partially account for the poor fit of the Poisson distribution. We caution against use of the Poisson for this analysis. The negative binomial fit particularly well, and we saw no evidence for zero-inflation.
In settings where there are excess zeros, zero-inflation models are attractive. One advantage of these models is that they can estimate the probability of being a zero as a function of covariates, as well as allowing the rate parameter to be a function of covariates. In an alcohol study, the intervention may be hypothesized to affect the abstinence proportion as well as the rate parameter for drinkers. Ad-hoc methods in this setting might involve estimating the proportion of drinkers at follow-up, and in a separate model, estimating the amount of drinking amongst the subset of subjects who reported any drinking. A more principled approach involves the simultaneous estimation of the zero-inflation factor (testing $p_1 = p_2$) and the rate parameter (testing $\lambda_1 = \lambda_2$). Slymen and colleagues adopted this approach by simultaneously fitting separate models for what they describe as the "logistic" component and the "Poisson" component, and this approach is also detailed in books by Winkelmann as well as Cameron and Trivedi.
The results of the simulation studies and the secondary analyses of the ASAP study demonstrated the importance of appropriately modeling count outcomes. We caution against the use of the standard Poisson model when the mean and variance are not equal. Extensions of the Poisson (incorporating an over-dispersion parameter or use of the negative binomial distribution and/or zero-inflated models) are now available in general purpose statistical software, and address many of the shortcomings of the overly simplistic Poisson model.
As always, analysts are obliged to look at their data and utilize models that provide an appropriate fit in their situation. In particular, for models of alcohol consumption, attention should be paid to the functional form of the outcome to ensure that underlying assumptions of the methods utilized are met.
This research was supported in part by the National Institute on Alcohol Abuse and Alcoholism R01-AA12617, the Smith College Summer Research Program and the Howard Hughes Medical Institute. Thanks to Jessica Richardson for editorial assistance, Emily Shapiro and Min Zheng for assistance with simulations and Joseph Hilbe and Jeffrey Samet for helpful comments on an earlier draft.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.