Statistical methods for the analysis of count outcomes
We begin by introducing notation to be used throughout. Let Y
ij
denote the number of events for the jth subject (j = 1,..., n
i
) in the ith group (i = 1, 2), where n
i
is the number of subjects in the ith group. Typically in a randomized trial n
1 and n
2 are approximately equal.
The Poisson distribution is one of the simplest models for count data. Let λ
ij
indicate the average number of events (in this case drinks consumed) in a given time interval for subject j in group i, where f(Y
ij
= k|λ
ij
) is the probability of observing k events. The Poisson distribution [8, 10] is denoted:
for k = 0, 1, 2, ..., i = 1, 2, and j = 1,..., n
i
where λ
ij
> 0 and we assume that λ
ij
= λ
i
for all j (i.e. all subjects in a given group have the same rate of drinking). The λ parameter uniquely specifies this distribution, and is equal to the expected value (mean) and variance (i.e. E[Y
ij
] = Var(Y
ij
) = λ
ij
for all i and j). The maximum likelihood estimate (MLE) of
i
is given by
i
. In this setting, the test of randomized group effects for the Poisson model is a test of the null hypothesis that λ
1 = λ
2.
One limitation of this model is that it may be overly simplistic and may not provide an adequate fit to consumption data of the type that we consider. The constraint that the variance is equal to the mean may lead to incorrect test results.
Consider as an example the data from the ASAP study control group at 3 months. For this dataset, non-integer count values are possible. These arise when subjects consume a number of drinks not divisible by 30 (in the case of 30-day assessments). One approach in this situation would be to model the number of drinks consumed in a 30 day period, or utilize the non-integer values. Sometimes even the 30 day value is non-integer because people report a drink size that is then translated into standard drinks. The maximum likelihood estimates of the probability distributions remains the same for non-integer values, though it is necessary to move each non-integer observed value to the next integer (using a ceiling function) to be plotted. For the models that we discuss, we can plug non-integer values into the software and still get sensible results.
Figure 1 displays the observed distribution and superimposed Poisson with
1 =
1 = 4.98 using the prcounts routine in Stata [8]. The axis for the number of drinks per day after 3 months was limited to 25 drinks to improve readability (the maximum observed count was 48.6). There is a pronounced lack of fit to this model, particularly for values of less than 10 drinks per day. For the ASAP data, the assumption that the mean is equal to the variance is not tenable. In fact, the observed variance (71.7) is more than an order of magnitude larger than the mean. Also, note that there is some evidence for digit preference (even numbers are more common than odd numbers).
One approach to loosen the restrictive variance assumption involves use of an empirical (or robust or sandwich) variance estimator [11–13] to account for the over-dispersion. This more flexible extension of the Poisson allows the variance to be unconstrained. The over-dispersed Poisson option is available in a number of general purpose statistics packages (e.g. the robust option in Stata).
Another approach is to fit a negative binomial (two parameter) count model (NB) [5–8, 10]. One common parametrization of the negative binomial distribution is given by:
where Γ(·) denotes the Gamma function, λ
i
> 0 and θ
i
> 0. We note that E[Y
ij
] = λ
i
and Var(Y
ij
) = λ
i
+ * θ
i
= λ
i
* (1 + λ
i
* θ
i
) for all i and j and that Var(Y
ij
) > E[Y
ij
]. It can be shown that the negative binomial can be derived in terms of a Poisson random variable where the parameter λ
i
varies according to a gamma distribution.
The negative binomial model is attractive because it allows the relaxation of strong assumptions regarding the relationship between the mean and the variance. This flexibility comes at some cost, since a two-parameter model is inherently more complicated to interpret.
Other models have been proposed that allow for an extra abundance of subjects with no consumption. In alcohol consumption outcomes, there may be subjects who are "non-susceptible" (e.g. abstinent). These "zero-inflation" (or "hurdle") models account for subjects who are structural zeros (e.g., abstinent subjects thought of as "non-susceptible") [2, 3]. Conditional on being susceptible (with some probability), the distribution is assumed to be Poisson or negative binomial.
Zero-inflated Poisson (ZIP) models [3] separately estimate a parameter p
i
that governs the proportion of non-susceptible subjects in the ith group:
for 0 <p
i
< 1 and λ
i
> 0 where I(k = 0) is equal to 1 when k = 0, and equal to 0 otherwise. By distinguishing Always-0 (with probability p
i
) and Not Always-0 group (with probability (1 - p
i
) * exp(-λ
i
)) for abstainers and drinkers who didn't drink during the reporting period, respectively, it can incorporate an overabundance of zeros [8]. Conditional on being a Not Always-0, counts are given by the Poisson distribution. This approach has been generalized to a regression framework, and implemented in general purpose statistical software (e.g. zip in Stata).
In many settings, the assumption that after accounting for the zeros the remaining counts are Poisson may not be tenable. The zero-inflated negative binomial (ZINB) allows for over-dispersion in this manner, though at the cost of more parameters.
Another approach to the modeling of count data involves use of a linear model (assuming that the observations are approximately Gaussian). While this is an extremely flexible model that is typically robust to misspecification (since the mean and variance are not linked), the linear model is less attractive because it may predict negative values of drinking given the skewness of the distribution. Use of a linear model is also inefficient if the variance is a function of the mean.
Simulation study
To better understand the behavior of these methods in a known situation, we conducted a series of simulation studies with parameters derived from the motivating example. These simulation studies were designed to address the question of whether or not the models were robust to misspecification of the underlying count distribution. More formally, we wanted to assess whether these models preserved the appropriate Type-I error rate (the probability of rejecting the null hypothesis when it is true) when there are no true differences between groups (i.e. do they reject the null at the appropriate α level).
For each set of parameters within a simulation, 100 observations were generated in each of two groups, to mimic a randomized clinical trial setting. The amount of alcohol consumption, in drinks per day was the outcome. For each simulated dataset a series of models (Poisson, negative binomial and zero-inflated Poisson) were fit. This process was repeated 2500 times for each set of parameters, where E[Y
i
] = λ = 5 (taken from the ASAP control group) and an α level of 0.05 was used. For the simulation of Poisson data the variance was equal to the mean. Negative binomial distributions were simulated using three arbitrary variances (13.3, 40 and 70), with the latter value comparable to the observed variance from the ASAP control group. The zero-inflated model had a probability of 0.2 of being a structural zero, and Poisson with λ = 5 otherwise. The true distributions for the simulations are displayed in Figure 2. Models were fit using the Poisson distribution, over-dispersed Poisson using an empirical variance estimator, negative binomial and zero inflated Poisson. We estimated the probability that each model rejected the null hypothesis and constructed a 99% confidence interval around this estimate. The code for the simulations is available upon request from the first author.
ASAP study
The ASAP study was a randomized clinical trial of the effectiveness of a brief motivational intervention [14] on alcohol consumption among a group of hospitalized patients at Boston Medical Center. Details of the recruitment procedures, inclusion criteria, description of sample and results of the RCT have been published [15]. The Institutional Review Board of Boston University Medical Center approved this study, and the Institutional Review Board of Smith College approved the secondary analyses. After consenting to enroll, all subjects received an interviewer-administered baseline assessment prior to randomization into the control or intervention group. Subjects were randomly assigned to control or intervention group using a blocked randomization procedure. Intervention subjects participated in a brief motivational interview with a counselor (less than half an hour). Control subjects received usual care.
Follow-up was planned at 3-month and 12-month timepoints. Because the subjects came from a transient and hard-to-reach population, the researchers employed exhaustive techniques to track subjects over the follow-up period. The two primary alcohol-related outcomes were measures of alcohol consumption and linkage to appropriate alcohol treatment; for these secondary analyses we focus solely on treatment differences in alcohol consumption. The outcome of interest was the average number of standard drinks consumed per day in the past thirty days as reported using the Timeline Followback method [1] at the 3 and 12-month interviews. For the purpose of this secondary analysis we consider the 3 month time point; similar results were seen utilizing 12 month data (not reported here).
Eight models were fit comparing treatment to control for the ASAP study:
Poisson standard Poisson model,
Over-dispersed Poisson Poisson model with empirical ("robust") variance estimator,
NB negative binomial,
ZIP zero-inflated Poisson, shared inflation parameter estimated for both randomized groups (p
1 = p
2),
ZINB zero-inflated negative binomial, shared inflation parameter estimated for both randomized groups (p
1 = p
2),
TTEST two-sample unequal variance t-test,
WILCOXON Wilcoxon-Mann-Whitney, a non-parametric two-sample comparison procedure suitable for ordinal data, and
PERMUTE two-sample permutation test.