Research article  Open  Open Peer Review  Published:
Tweedie distributions for fitting semicontinuous health care utilization cost data
BMC Medical Research Methodologyvolume 17, Article number: 171 (2017)
Abstract
Background
The statistical analysis of health care cost data is often problematic because these data are usually nonnegative, rightskewed and have excess zeros for nonusers. This prevents the use of linear models based on the Gaussian or Gamma distribution. A common way to counter this is the use of Twopart or Tobit models, which makes interpretation of the results more difficult. In this study, I explore a statistical distribution from the Tweedie family of distributions that can simultaneously model the probability of zero outcome, i.e. of being a nonuser of health care utilization and continuous costs for users.
Methods
I assess the usefulness of the Tweedie model in a Monte Carlo simulation study that addresses two common situations of low and high correlation of the users and the nonusers of health care utilization. Furthermore, I compare the Tweedie model with several other models using a real data set from the RAND health insurance experiment.
Results
I show that the Tweedie distribution fits cost data very well and provides better fit, especially when the number of nonusers is low and the correlation between users and nonusers is high.
Conclusion
The Tweedie distribution provides an interesting solution to many statistical problems in health economic analyses.
Background
In modelling cost data of health care utilization, the nonnegative response variable is often zero because of nonusers, while the positive realisations are usually usually rightskewed. Such variables are called semicontinuous [1] and pose a number of problems: because of the point mass at zero, common models involving the Gamma or lognormal distributions have difficulty with such a mixture of discrete and continuous values. A popular way to account for this in the generalized linear models (GLM) framework is the use of twopart models [2], which combine a binary model for the dichotomous event of having either zero or positive values with a continuous model for those having positive values. This complements a twostage decision process, which can be inadequate because the two decisions are not usually made independently (Winkelmann [3] and Van Ophem [4] discuss this for the case of physicians visits). Another more simple model, using a single distribution, is the Tobit model [5]. This model is based on a zerotruncated normal distribution but cannot handle excess zeros, i.e. the presence of more zeros in the data than would be expected from the underlying distribution. In this linear regression setting, constant variance is assumed, which is also inadequate for cost data. Sometimes, count data model like the Poisson are also used for cost modelling [6].
Recent research has mainly focused on developing new models and comparing distributions for the continuous part of the twopart models. For example, the generalized Gamma distribution (GenG) is a flexible choice as it has one scale and two shape parameters. The standard Gamma, Weibull, exponential, and the lognormal are all special cases of this distribution. Manning finds that this distribution provides a more robust alternative estimator than the standard alternatives [7]. Jones et al. compared several recent developments in parametric and semiparametric regression models for health care costs [8]. Other comparative studies, in which models are compared on either real data, i.e. the true distribution is unknown, or using simulations, include Basu et al. [9], Hill and Miller [10] and Jones et al. [11]. They all focus on the analysis of positive costs with no emphasis on the zero aspect. The only comparative study considering zero costs is Buntin and Zaslavsky [6].
In this study, I consider a single distribution GLM for cost data that can simultaneously model the zeros and continuous positive outcomes. The number of excess zeros can be arbitrarily high while still providing good support for the positive costs. Variance can be specified as some power of the mean. This model, based on the family of Tweedie densities [12], has already been shown to perform well in the case of rainfall precipitation [13] and insurance premiums [14]. To my knowledge, the Tweedie densities have not been used in health economic cost data modelling before. In the following, I compare the Tweedie model with the twopart (Binomial/Gamma and Binomial/GenG), the Tobit, and the Poisson models regarding marginal effects (at the means), model fit and prediction error in both Monte Carlo simulation and real data. As analysts favour simple models that are easy to interpret, I restrict myself to these alternatives. For an overview of other, more specialized methods, I refer to Mihaylova et al. [15] and the literature already mentioned.
The rest of this paper is structured as follows: “Methods” section illustrates the properties of the Tweedie family of distributions and explains the proposed model. Furthermore, it outlines the simulation study and describes the data. The “Results” section compares the Tweedie with the twopart, Tobit, and Poisson models on these data before the last section concludes.
Code and data to reproduce all analyses are available on the author’s github page (https://git.io/v6adW. Accessed 16 Aug 2016).
Methods
Tweedie family densities
I outline the model used in this paper as a special case of exponential dispersion models (EDMs) [12]. This class of models is a broad family of distributions defined by the form
where both the normalizing functions a(·) and κ(·) are known. θ is the natural parameter and ϕ>0 is called the dispersion parameter. Mean μ and variance of a random variable Y from an EDM are given by E(Y)=μ=κ ^{′}(θ) and Var(Y)=κ ^{″}(θ)ϕ respectively. The Tweedie family of distributions corresponds to special cases of EDMs where the power meanvariance relationship is characterized by Var(μ)=ϕ μ ^{p} for p∉(0,1). The Tweedie family includes a number of familiar distributions, e.g. Normal (p=0), Poisson (p=1), Gamma (p=2) and inverse Gaussian (p=3).
For cost data modelling, the choice p∈(1,2) is the most interesting one and the main focus here because of its support for semicontinuous outcomes. Tweedie distributions in this range of p belong to the socalled compound PoissonGamma distributions [12]. Let M∼Pois(λ) be a Poisson random variable and let X _{ i }∼i i dGamma(α,β) be Gamma distributed with M⊥X _{ i }, then a random variable Z, defined by
follows a compound PoissonGamma distribution, i.e. is a Poisson sum of Gamma random variables. If M=0, then Z=0, thus allowing for a probability mass at zero for nonusers, where Pr(Z=0)= exp(−λ). If M>0, then Z is the sum of M iid Gamma random variables, so conditional on M, ZM∼Gamma(M α,β), resulting in a continuous distribution for the positive outcome. With M=m, the distribution for z>0 is therefore given as:
These parameters λ,α and β are related to the Tweedie distribution parameters μ,ϕ and p by:
Recovering the underlying marginal distribution of Z results in a nonclosed form expression for the normalizing function a(·), based on Wright’s generalized Bessel function W(·,·,·) [13, 16]:
Dunn and Smyth [17] show that this function is strictly convex and can be approximated by Stirling’s formula for the Gamma function and a Fourier inversion method for the infinite series. In practice, first, the parameters ϕ and p are estimated by numerically maximizing the profile likelihood, i.e. profiling out the mean parameter μ as it is determined for a given value of ϕ. Second, the mean parameter is estimated using a GLM with the previously estimated ϕ. The Tweedie distribution cannot be expressed in closed form. To compute the profile likelihood, numerical optimization methods must be used [16].
Because Tweedie distributions also belong to the exponential family of distributions, they can be used in the GLM framework [18]. Besides the ability to model exact zeros and continuous outcomes, the idea that positive total costs are sums of smaller costs provides an intuitive appeal: Z is the total amount of expenses in a given period, M the number of utilization events, and X _{ i } the expenses of the ith event. In the following, I show that the Tweedie distributions fit health care utilization cost data very well.
Monte Carlo simulation
I used a Monte Carlo simulation to address two common situations when modelling semicontinuous cost data:

1.
The nonusers can be substantially different from the users, i.e. they imply different characteristics and have low correlation with the users.

2.
The nonusers belong to the same “distribution” as the users, i.e. they share the same personal attributes, and therefore show high correlation with the users.
In both situations, a broad range of circumstances that are common in health care cost data were examined. They are: (1) skewness and nonnormality of the costs; (2) range of the positive costs; and (3) outliers, i.e. proportion of individual high cost cases. The Gamma distribution provides a way to deal with these matters flexibly by specifying the shape and rate parameters. In the GLM, the outcome Y of the dependent variables is generated from a distribution in the exponential family. The mean μ of the distribution depends on the independent variables, X, through
where E(Y) is the expected value of Y, X β is the linear predictor and g is the link function. The probability density function of the Gamma distribution with shape parameter α>0 and rate parameter θ>0 is defined by
The expectation of the Gamma distribution is $E(X)=\frac {\alpha }{\theta }$ and the variance is $Var(X)=\frac {\alpha }{\theta ^{2}}$. Generating the outcome Y given a linear predictor X β and a rate parameter θ, using a loglink function g(z)= log(z), is therefore possible because
To generate the data for the Monte Carlo simulation, I built a data matrix X with three columns for covariates. I drew values in these columns at random from uniform distributions U(2,8), U(−10,1), and U(−2,0) respectively. Corresponding parameters β _{1},β _{2}, and β _{3} were drawn from U(−2,1). Together with a rate parameter of θ=30, these choices allow for a wide variation of possible shapes of the Gamma distribution, and consequently of the costs Y.
In addition, I evaluated the following different proportions of nonusers/zeros for both situations: 0.05, 0.1, 0.15, 0.2, 0.3, 0.5 and 0.7. This choice covers a broad range of settings that occur in real world scenarios. In the low correlation setting, I drew values for nonusers from uniform distributions U(3,6), U(−2,3), and U(−1,1) to account for slightly different personal characteristics. In the high correlation setting, I set users corresponding to the lowest percentiles to zero. This implies that users and nonusers share very similar characteristics but only differ in costs. For both low and high correlation and the varying amount of zeros, I generated 100 different data sets, each with N=5000 observations, which reflect a great diversity of possible scenarios. The choice of 5000 observations conforms to many realworld situations. Fewer observations often lead to numerical instabilities, especially in the presence of many zeros, while more observations usually do not improve the estimations.
As comparison metrics, I chose the Akaike information criterion (AIC) for model fit and the root mean square error (RSME) for predictive accuracy. RMSE is defined by
where $\hat y$ denotes the estimate and y the true value. Because of the squared term, larger errors have a disproportionately large effect on RMSE. This effect is desirable when predicting cost data with outliers. For an unbiased estimator, the RMSE is the standard deviation. The reported value is the average of a 5fold crossvalidation. Crossvalidation involves partitioning the data into five complementary subsets, performing the analysis on four subsets, and validating the analysis on the one remaining subset. This is repeated four times so that each subset is used once for validation, and the validation results are combined (e.g. averaged) over the rounds to estimate a final RMSE value. As both AIC and RMSE have no intrinsic interpretation for comparison across different data sets, I defined a summed rank value. In this ranking, the best model (lowest AIC or lowest RMSE) is assigned a value of 1, whereas the worst of the five compared models gets value 5. Ranks 2, 3, and 4 are assigned accordingly, resulting in a theoretical value of 100 for a model that wins across all 100 data sets.
Data
In a second evaluation, I used data from the RAND Health Insurance Experiment (RAND HIE). This US study measured health care costs, among other outcomes, of people randomly assigned to different kinds of plans. Because of the random assignment, the reliability of health insurance coverage and the availability of important variables for this application, these data provide an accurate base for cost modelling in this case.
As outcome variable I took the total costs, consisting of outpatient, drug, supply, psychotherapy and inpatient expenses. I selected covariates commonly considered to determine health care utilization. Among the socioeconomic characteristics were age, gender, race, the logarithm of family income (LINC), the number of physical limitations (PHYSLM), the number of chronic diseases (DISEA), the logarithm of family size (LFAM), the education of the household head in years (EDUCDEC) and a dummy variable indicating selfrated health as good (HLTHG). Insurancespecific variables included the log coinsurance rate plus 1 (LOGC), a dummy for the individual deductible plan (IDP), the log of the participation incentive payment (LPI), and a maximum expenditure function (FMDE). A more detailed description of the data set and the variables is available in Deb et al. [19].
I only chose the first year of observation for each individual 18 years and older (N=3301). There are 18.1% zero observations for the costs with a mean of 206.75 (standard deviation 597.98) and a maximum of 17730. The positive costs alone have a mean of 252.49 and a standard deviation of 652.05, indicating the skewness.
For model comparison on these data, I again computed AIC and RMSE (in 5fold crossvalidation). In addition, I looked at the marginal effects. The marginal effect of an independent variable is the derivative of a given function of the covariates and coefficients of the preceding estimation. The derivative is evaluated at the means of the covariates.
Results
Simulation study
In this section, I apply the twopart, the Tweedie, the Tobit, and the Poisson models to the simulated data and the RAND HIE data. The twopart model involves two estimations: first, it decides whether someone has zero or nonzero costs using a logistic regression. Second, conditional on having nonzero costs, it applies a continuous distribution to the positive outcome. I used both Gamma with loglink and generalized Gamma with loglink for this part. I also used the loglink for the Tweedie and the Poisson model. The Tobit model features an underlying normal distribution truncated at zero. For a more detailed description and justification of the twopart, Tobit, and Poisson models, see [1] and [15]. The aim of this application is to show how model choice affects model fit and prediction in the case of semicontinuous health care cost data. In the RAND HIE case, I also look at the marginal effects, but I reveal no causal effects in this study. Figure 1 presents the Monte Carlo simulation results of the rank comparison of both RMSE and AIC across the different settings with low and high correlation and varying numbers of zeros. If the number of zeros was below 20%, the Tweedie model outperformed the Tobit, the Poisson, and both twopart models in situations with high correlation between users and nonusers. When the zero percentage was above 0.2, twopart models started to surpass the Tweedie model in both AIC and RMSE. The difference in performance regarding RMSE was less apparent between twopart and Tweedie models in the low correlation setting. However, twopart models had better model fit in these situations. The GenG was always better than the Gamma, whereas the Tobit performance was generally very bad. The Poisson model had very bad model fit but good predictive performance, especially in the low correlation setting.
RAND HIE data
Table 1 presents the marginal effects estimation results for the models discussed using the RAND HIE data. Although the estimates of the Tobit model are quite different in the value range and sign, the Tweedie and both Gamma and GenG parts of the twopart model are more similar: all estimates (except one for the GenG) shared the same sign and had comparable values, leading to similar conclusions. The Poisson model is more similar to the Tobit. Looking at the standard errors, both Tweedie and Gamma estimations lead to higher estimated standard errors than GenG. Furthermore, the AICs of the Tweedie and twopart Gamma models are almost identical, suggesting comparable model fit: the Tweedie AIC is 37777, whereas the twopart Gamma has a combined AIC of 37252. The twopart GenG shows superior model fit with a combined AIC of only 36209. The Poisson fits the data very badly with an AIC of 51495. The AIC of the Tobit model is significantly higher with a value of 43649. When plotting the true and estimated quantiles of the cost outcome for all distributions against each other, both twopart models exhibit better model fit for the lower quantiles, whereas the Tweedie model had slightly higher support for upper quantiles. See Fig. 2 for these QQ plots. This is probably because of the heavier tails of the Tweedie distribution. The Tobit model fits the central quantiles badly and the Poisson model is generally a bad fit.
Regarding RMSE, evaluated in a 5fold crossvalidation, twopart and Tweedie models again produce very similar results. Tweedie has the lowest RMSE with a value of 568.14, twopart Gamma has 568.60, and twopart GenG has 568.23. Tobit is slightly higher with a value of 573.26. The Poisson is slightly better than the Tobit with an RMSE of 572.33.
The estimated value for the meanvariance power parameter in the Tweedie model is p=1.719. Figure 3 shows the meanvariance plots for all 5% quantiles for the Tweedie model on the example data.
Discussion
This paper explores a single distribution GLM based on the Tweedie family of distributions for semicontinuous cost data. This model is comparable in model fit to the twopart Binomial/Gamma and Binomial/GenG model but only includes a onestage decision process, making it easier to interpret. The Tweedie model outperforms the Tobit model as the popular single distribution model for nonnegative continuous data with a support for exact zeros. The Tweedie model further outperforms the Poisson model that is often used for cost data modelling despite being a count data model. Thus, the Tweedie model provides an interesting alternative for modelling of health care utilization cost data as it has natural support for cases in which no utilization has occurred and for those it has. The Tweedie model especially shines when the correlation between users and nonusers of health care utilization is high and the proportion of these nonusers is low. On the other hand, more sophisticated models such as the twopart Binomial/GenG show superior model fit and predictive accuracy when the proportion of zeros is high and the users and nonusers suggest different characteristics. There exist situtations, especially when analysing inpatient utilization, where more that 70% zeros occurs. This is not covered in the simulation study. The simulations study only covers the RMSE metric for predictive accuracy. Other studies also measure the mean absolute error, or the mean error [6]. In the present case, this lead to almost identical results, presumably because of the ranking system. On the RAND HIE data, the Tweedie model shows slighly better predictive accuracy but worse model fit. The difference in RMSE was very small. Previous comparative studies including zero observations also found only small differences in predictive accuracy measures among different models [6]. To rule out unfortunate random splits in the crossvalidation, I performed additional analyses with different random splits. No changes in the ranking occured and only insignificant changes in the RMSE values. The theoretical justification for the Tweedie model is given as, for the discussed case where the power parameter p∈(1,2), the Tweedie model can be explained as a Poisson sum of Gamma distributions. There, the number of utilization events is expressed by a Poisson distribution and the amount of each utilization by a Gamma distribution.
There exists a variety of models that were not included in this comparison. A prominent example is the extended estimating equations (EEE) model, which starts with estimates from a Gamma GLM and then iteratively improves the link and distribution functions based on the data [20]. This semiparametric model showed good performance in many comparison studies and has the advantage of omitting the need to specify a link function in advance. However, because of its iterative procedure, the EEE needs a large number of observations (usually more than 5000) to be efficient and, in the RAND HIE case and several simulated data sets, it did not converge. Also, it does not provide a closed form likelihood, and it is therefore unclear how the AIC can be used as a comparison metric. Jones et al. [8] use the Pearson correlation test for model comparison; because of various problems accompanying hypothesis tests in general [21] and the restriction to more simple parametric models, the AIC seems preferable. While the Pearson test can only summarize the specification of the conditional mean function on the scale of interest, the AIC measures goodness of fit of the whole distribution on the scale of estimation [8]. Jones et al. also find that the commonly used loglink is often not the ideal link function [8]. The Tweedie model also supports the inverse, identity and square root link function, but I could not observe any major deviations in the results when using these instead of the log. Even more so, link functions other than the log were more unstable in the maximum likelihood estimation and led to estimation errors, but this is probably an implementation issue in the numerical maximum likelihood optimization procedure.
Some other studies have used a quasiMonte Carlo design for comparison studies [8, 22]. In this setting, several estimation data sets of various sizes are drawn with replacement from a large real data set and are then evaluated on a holdout validation set. This was inappropriate in this analysis because I was implicitly interested in different relations between users and nonusers and the varying amount of zeros. However, one disadvantage of the Monte Carlo simulation in this case is that only a predefined distribution, here the Gamma, can be used to generate data. This may bias the analysis towards models that use the Gamma, but the results show no evidence of that.
Recent research in health economic cost data modelling has mainly focused on the second continuous part of the twopart models. In this analysis, I show that the Binomial part may need more attention, as it is obvious that a logistic regression cannot adequately distinguish the nonusers and users if they share similar characteristics (i.e. they are highly correlated) and the classes are very unbalanced (i.e. the number of nonusers is low).
Although the theory of the Tweedie families has been known for more than 20 years, only recently have software packages that allow easy handling of these distributions become available [16, 23]. Further research should explore the usefulness of Tweedie distributions with p>2 as they provide similar shape to the Gamma but support heavier tails. Tweedie models in this range may be an attractive alternative for the continuous part of a twopart model or for cases without exact zeros and support a more flexible meanvariance relationship. The estimated meanvariance power parameter p=1.719 may not appropriately reflect the true relationship. The fixed p=2 in Gamma models is still too low, but values of p in the range of 2.2 to 2.3 seem to be more realistic when visually comparing the curves in Fig. 3. This also has potential for further investigation.
Swallow et al. [24] show in an ecological setting that a Bayesian hierarchical model based on the Tweedie densities provides further flexibility and removes this need to make strong assumptions about meanvariance relationships a priori. Such a hierarchical extension may also be useful to account for correlated effects by repeated measurement of individuals, for example in clinical trial settings or claims data.
Conclusion
Models based on Tweedie distributions are an interesting alternative for the analysis of semicontinuous health care cost data. They are especially useful when the correlation between users and nonusers of health care utilization is high and the proportion of these nonusers is low.
Abbreviations
 AIC:

Akaike information criterion
 EDM:

Exponential dispersion model
 GenG:

Generalized Gamma distribution
 GLM:

Generalized linear model
 RMSE:

root mean square error
References
 1
Min Y, Agresti A. Modeling nonnegative data with clumping at zero: a survey. J Iranian Stat Soc. 2002; 1(1):7–33.
 2
Duan N, Manning WG, Morris CN, Newhouse JP. A comparison of alternative models for the demand for medical care. J Bus Econ Stat. 1983; 1(2):115–26.
 3
Winkelmann R. Health care reform and the number of doctor visits–an econometric analysis. J Appl Econ. 2004; 19(4):455–72.
 4
Van Ophem H. The frequency of visiting a doctor: is the decision to go independent of the frequency?J Appl Econ. 2011; 26(5):872–9.
 5
Tobin J. Estimation of relationships for limited dependent variables. Econometrica J Econometric Soc. 1958; 26(1):24–36.
 6
Buntin MB, Zaslavsky AM. Too much ado about twopart models and transformation?: Comparing methods of modeling medicare expenditures. J Health Economics. 2004; 23(3):525–42.
 7
Manning WG, Basu A, Mullahy J. Generalized modeling approaches to risk adjustment of skewed outcomes data. J Health Economics. 2005; 24(3):465–88.
 8
Jones AM, Lomas J, Moore PT, Rice N. A quasimontecarlo comparison of parametric and semiparametric regression methods for heavytailed and nonnormal data: an application to healthcare costs. J Royal Stat Soc Ser A (Stat Soc). 2016; 179(4):951–74. doi:10.1111/rssa.12141.
 9
Basu A, Arondekar BV, Rathouz PJ. Scale of interest versus scale of estimation: comparing alternative estimators for the incremental costs of a comorbidity. Health Econ. 2006; 15(10):1091–1107. doi:10.1002/hec.1099.
 10
Hill SC, Miller GE. Health expenditure estimation and functional form: applications of the generalized gamma and extended estimating equations models. Health Econ. 2010; 19(5):608–27. doi:10.1002/hec.1498.
 11
Jones AM, Lomas J, Rice N. Applying betatype size distributions to healthcare cost regressions. J Appl Econometrics. 2014; 29(4):649–70. doi:10.1002/jae.2334.
 12
Jorgensen B. The Theory of Dispersion Models, 1 edition: Chapman and Hall/CRC; 1997.
 13
Dunn PK. Occurrence and quantity of precipitation can be modelled simultaneously. Int J Climatol. 2004; 24(10):1231–1239.
 14
Smyth GK, Jørgensen B. Fitting tweedie’s compound poisson model to insurance claims data: dispersion modelling. Astin Bulletin. 2002; 32(01):143–57.
 15
Mihaylova B, Briggs A, O’Hagan A, Thompson SG. Review of statistical methods for analysing healthcare resources and costs. Health Econ. 2011; 20(8):897–916.
 16
Zhang Y. Likelihoodbased and bayesian methods for tweedie compound poisson linear mixed models. Stat Comput. 2013; 23(6):743–57.
 17
Dunn PK, Smyth GK. Evaluation of tweedie exponential dispersion model densities by fourier inversion. Stat Comput. 2008; 18(1):73–86.
 18
McCullagh P, Nelder JA. Generalized Linear Models. Vol. 37: CRC press; 1989.
 19
Deb P, Trivedi PK. The structure of demand for health care: latent class versus twopart models. J Health Econ. 2002; 21(4):601–25.
 20
Basu A, Rathouz PJ. Estimating marginal and incremental effects on health outcomes using flexible link and variance function models. Biostatistics. 2005; 6(1):93–109.
 21
Cohen J. The earth is round (p<.05). Am Psychol. 1994; 49:997–1003.
 22
Deb P, Burgess JF. A quasiexperimental comparison of econometric models for health care expenditures. Hunter College Department of Economics Working Papers. 2003;212.
 23
Dunn PK. Tweedie: Tweedie Exponential Family Models. 2014:1–32. R package version 2.2.1.
 24
Swallow B, Buckland ST, King R, Toms MP. Bayesian hierarchical modelling of continuous nonnegative longitudinal data with a spike at zero: An application to a study of birds visiting gardens in winter. Biometrical J. 2016; 58(2):357–71.
Acknowledgements
The author thanks Rolf Holle and Irena Cenzer for helpful comments on the manuscript.
Funding
No funding was received.
Availability of data and materials
The Rand HIE data set is freely available online in the R package sampleSelection.
Consent to publish
Not applicable.
Author information
Affiliations
Contributions
CFK designed the study and wrote the manuscript.
Corresponding author
Correspondence to Christoph F. Kurz.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The author declares that he has no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 Health economics
 Tweedie distribution
 Health care utilization
 Cost data