Skip to main content

Zero-augmented beta-prime model for multilevel semi-continuous data: a Bayesian inference


Semi-continuous data characterized by an excessive proportion of zeros and right-skewed continuous positive values appear frequently in medical research. One example would be the pharmaceutical expenditure (PE) data for which a substantial proportion of subjects investigated may report zero. Two-part mixed-effects models have been developed to analyse clustered measures of semi-continuous data from multilevel studies. In this study, we propose a new flexible two-part mixed-effects model with skew distributions for nested semi-continuous cost data under the framework of a Bayesian approach. The proposed model specification consists of two mixed-effects models linked by the correlated random effects: Part I) a model on the occurrence of positive values using a generalized logistic mixed model; and Part II) a model on the magnitude of positive values using a linear mixed model where the model errors follow skew distributions including beta-prime (BP). The proposed method is illustrated with pharmaceutical expenditure data from a multilevel observational study and the analytic results are reported by comparing potential models under different skew distributions. Simulation studies are conducted to assess the performance of the proposed model. The DIC3, LPML, WAIC, and LOO as the Bayesian model selection criteria and measures of divergence used to compare the models.

Peer Review reports


Semi-continuous data featured with an excessive proportion of zeros and right-skewed positive values arise frequently in health economics and health services research [1]. Examples include alcohol consumption, household-level consumption of food items, medical cost, and substance abuse symptom scales. Statistical models with normality assumptions ignoring the skewness and the spike at zero are not suitable for this type of data and may lead to substantial bias and incorrect statistical inferences. Two-part (zero-augmented) models, originating in econometrics [2, 3], have been developed extensively in the last three decades to analyse this type of data and have been applied to scientific fields other than economics such as clinical research and health services. In two-part models, we view a semi-continuous variable as the result of two processes: one binomial process determining whether the positive value occurs and one continuous process determining the actual value given it is nonzero. Therefore, a two-part model consists of two components, with the first component (i.e., Part I) modelling the probability of a response being positive using the probit or logistic regression, and the second component (i.e., Part II) modelling the conditional mean of the positive values (given positive values occurred) using the continuous regression. According to the data structures, various methods have been developed for analysing cross-sectional and longitudinal semi-continuous data [3,4,5,6,7]. Olsen and Schafter [6] first extended the two-part models developed by Duan et al. [3] and Manning et al. [8] for cross-sectional data to the longitudinal setting by introducing correlated random-effects into the logit and log-normal components, respectively, and applied them to longitudinal alcohol data. In the two-part mixed-effects models, the binomial process is typically modelled with mixed-effects logistic or probit regression, and the continuous process is naturally modelled via linear mixed models (LMMs). The random-effects in the two components are generally assumed to be correlated through a multivariate normal distribution structure. Ignoring the between-component association mistakenly can yield biased estimates in the second part of the model [9]. The correlated random-effects can capture not only the between-component association but also the within-subject correlation among repeated measurements collected from the same individual and nested data. A between-component correlation means that the process giving rise to the positive values is related to the magnitude of the observed value given that a positive response occurred. For example, in a data collection of self-reported daily drinks (DDD) where zero represents no daily drinks and the continuous positive values reflect the mean of drinks per day, a positive correlation suggests that an individual with high odds of drinking tended to drink more alcohol [10].

For the positive part of a semi-continuous variable, LMMs with a normality assumption were used by Husted et al. [11] and Su et al. [9]. However, the positive part of a semi-continuous variable is often right-skewed. The logarithmic transformation was the most commonly used approach to correct the skewness [6, 7] and other monotone increasing functions such as Box-Cox transformation that would make the positive component approximately normal were also explored [12, 13]. The limitations with data transformation in Part II include reduced information, difficulty in interpreting the results and possible heteroscedasticity [13, 14]. An alternative approach is to use generalized linear mixed models (GLMMs) with distributions in the exponential family that can model skewed data, such as Log-Normal, Log-Skew-Normal [15], Gamma [13], Inverse Gamma, Inverse Gaussian [16], Beta [17], Bridge [18], Generalized Gamma family, and Weibull distributions [19]. It is noted that GLMMs often involve complicated iterative procedures in estimation which may lead to intensive computation burden and non-convergence issues. It would be most effective to use a flexible distribution to model the right skewed positive values in two-part models. Recently, studies have been presented using the beta-prime (BP) distribution to fit long-tail semi-continuous responses [10, 20, 21]. There is very limited research on the application of this skew distribution in a two-part mixed-effects model [22]. This study is an extension of Kamyari et al. (2021), where random effects are added to the linear predictor terms by using real two-level data.

Parameter estimations in two-part modelling could be computationally difficult. For two-part models with independent random-effects, maximum likelihood estimates (MLE) can be derived by fitting separate mixed-effects model to each part [1]. For the correlated two-part mixed-effects model with log-normal distribution on the positive values, Olsen and Shafter [6] and Tooze et al. [7] developed different maximum likelihood approaches. Several authors have proposed Bayesian approaches to fit the two-part models [17, 23,24,25,26,27]. For example, Cooper et al. [23] used a Bayesian approach via Markov Chain Monte Carlo (MCMC) to fit a probit-lognormal correlated two-part model on medical cost data.

As a result, In this study, we propose a two-part mixed-effects model with a logistic mixed model on the occurrence of positive values and a GLMM with BP distribution on the continuous positive values using the Bayesian approach via MCMC procedure with application to a three-level pharmaceutical expenditure data. The data used for this study was extracted from the Iranian pharmaceutical expenditure (PE-2018) survey. The survey was a cross-sectional study that had been conducted by the National Center for Health Insurance Research, Iran Health Insurance Organization. PE-2018 is a dataset of yearly pharmaceutical expenditure per person conducted in 429 cities in Iran.

The rest of the article is organized as follows. At the beginning of Methods, we describe the BP regression model. In Model specification, we present the two-part mixed effects model for responses with BP distribution. In Numerical study, we apply the proposed methodologies to real data and report the analysis results. In Simulations, simulation studies are conducted to assess the performance of the proposed models. Finally, we conclude the article with a discussion in Discussion.


Beta-prime regression model

The BP distribution [28, 29] is also known as inverted beta distribution or beta distribution of the second kind, often the model of choice for fitting semi-continuous data where the response variable is measured continuously on the positive real line (Y > 0) because of the flexibility it provides in terms of the variety of shapes it can accommodate. The probability density function (PDF) of a BP distributed random variable Y parameterized in terms of its mean μ and a precision parameter ψ is given by

$$f\left(y|\mu, \psi \right)=\frac{y^{\mu \left(\psi +1\right)-1}\ {\left(1+y\right)}^{-\left[\mu \left(\psi +1\right)+\psi +2\right]}}{B\left(\mu \left(1+\psi \right),\psi +2\right)}\kern1em ,\kern0.75em y>0$$

where B denote the beta function, μ > 0, ψ > 0, E(Y) = μ, and Var(Y) = (μ(1 + μ))/ψ.

Figure 1 displays some plots of the density function in Eq. (1) for some parameter values. It is evident that the distribution is very flexible and it can be an interesting alternative to other distributions with positive support. Figure 1 shows that for a fixed value of the mean μ, higher values of ψ lead to a reduction of Var(Y), and vice versa. If Y has PDF as in Eq. (1), we denote Y~BP(μ, ψ). Next, to connect the covariate vector Xk, k = 1, …, m to the random sample Y1, Y2, …, Ym of Y, we use a suitable link function g1 that maps the mean interval (0, +∞) onto the real line. This is given as \({g}_1\left({\mu}_k\right)={\boldsymbol{X}}_{\boldsymbol{k}}^{\prime}\boldsymbol{\beta}\), where β is the vector of regression parameters, and the first element of Xk is 1 to accommodate the intercept. The precision parameter ψk is either assumed constant [30, 31] or regressed onto the covariates [30, 32] via another link function g2, such that \({g}_2\left({\psi}_k\right)={\boldsymbol{Z}}_{\boldsymbol{k}}^{\prime}\boldsymbol{\gamma}\), where Zk is a covariate vector (not necessarily similar to Xk) and γ is the corresponding vector of regression parameters. Similar to Xk, Zk also accommodates an intercept. The link functions g1 : y > 0 and g2 : y > 0 must be strictly monotone, positive and at least twice differentiable, such that \({\mu}_k={g}_1^{-1}\left({\boldsymbol{X}}_{\boldsymbol{k}}^{\prime}\boldsymbol{\beta} \right)\) and \({\psi}_k={g}_2^{-1}\left({\boldsymbol{Z}}_{\boldsymbol{k}}^{\prime}\boldsymbol{\gamma} \right)\), with \({g}_1^{-1}(.)\) and \({g}_2^{-1}(.)\) being the inverse functions of g1(.) and g2(.), respectively. We can estimate the parameters of the BP regression model defined in Eq. (1) using the gamlss function in the R (≥ 3.3.0) language [33] with a package of the same name [34].

Fig. 1
figure 1

Plots of probability density function of the BP distribution considering the following values of μ = 0.5 (blue), μ = 1.0 (yellow), μ = 2.0 (green) and μ = 3.0 (red). Graphs plotted by using Wolfram Mathematica

Model specification

In this section, we present our model for the yearly pharmaceutical expenditure record in three levels. However, our model can be adapted easily to more complicated settings.

In the three-level pharmaceutical expenditure record data, level 3 is the province; level 2 is the city level that nested within province; and level 1 is the subject level that is nested in cities. There are two types of correlations at different levels in the pharmaceutical expenditure data. The first type exists at the province level, where cost records of the same province are correlated. For each subject, there also exists another correlation within each city. Thus, a three-level random effects two-part model is more appropriate for the analysis of the pharmaceutical expenditure data. We are interested in modelling a three-level semi-continuous pharmaceutical expenditure data, characterized by a large portion of zeros and continuous positive values. We define notations as follows. Suppose that we observe cost record yijk for the k-th subject of city j within province i, where i = 1, 2, ..., n, j = 1, 2, ..., ni, and k = 1, 2, ..., mij. The total number of cities is \(J=\sum_{i=1}^n{n}_i\), and the total number of subjects is \(N=\sum_{i=1}^n\sum_{j=1}^{n_i}{m}_{ij}\). Let ωijk = I(yijk > 0) denote the indicator of yijk being nonzero. Define by Xijk the covariate vectors for the fixed effect. Let p1i and p2i be the correlated random effects in the province level with joint density (p1ip2i) for parts I and II of our proposed model, respectively. Similarly, define c1ij and c2ij to be the correlated random effects with joint density (c1ijc2ij) in the city level. In this paper, we assume that

$$\left({p}_{1i},{p}_{2i}\right)\sim N\left(o,{\boldsymbol{\Sigma}}_{1}=\left(\begin{array}{cc}{\sigma}_{p_1}^2& {\rho}_p{\sigma}_{p_1}{\sigma}_{p_2}\\ {}{\rho}_p{\sigma}_{p_1}{\sigma}_{p_2}& {\sigma}_{p_2}^2\end{array}\right)\right)\ \textrm{and}\ \left({c}_{\text{1ij}},{c}_{\text{2ij}}\right)\sim N\left(o,{\boldsymbol{\Sigma}}_2=\left(\begin{array}{cc}{\sigma}_{c_1}^2& {\rho}_c{\sigma}_{c_1}{\sigma}_{c_2}\\ {}{\rho}_{c}{\sigma}_{c_2}{\sigma}_{c_1}& {\sigma}_{c_2}^2\end{array}\right)\right)$$

with Σ1 and Σ2 being positive definite matrices. We also assume that (p1i, p2i) and (c1ij, c2ij) are independent for all i’s and j’s. Denote by eijk the error term for the positive value of Yijk. We assume that \({e}_{ijk}\sim N\left(o,{\sigma}_e^2\right)\) is independent of random effects p1i, p2i, c1ij, and c2ij. Define τijk = P(ωijk = 1| p1i, c1ij) to be the probability of non-zero value for Yijk.

To obtain interpretable covariate effects on the marginal mean, we propose the following marginalized two-part model that parameterizes the covariate effects directly in terms of the marginal mean, μijk = E(Yijk), on the original (i.e., untransformed) data scale. The marginalized two-part model with random (cluster) effects for the zero and the continuous components, respectively, specifies the linear predictors.

Part I:

$${\displaystyle \begin{array}{c}{\omega}_{ijk}\sim Bernoulli\left({\tau}_{ijk}\right)\\ {} logit\left({\tau}_{ijk}\right)= logit\left({\Pr}\left({Y}_{ijk}\ne 0|{p}_{1i},{c}_{1 ij}\right)\right)={\boldsymbol{X}}_{1 ij k}^{\prime}\times \boldsymbol{\alpha} +{p}_{1i}+{c}_{1 ij}\\ {}=\left({\alpha}_{0}+{p}_{1i}+{c}_{1 ij}\right)+\sum_{\gamma =1}^r{\alpha}_{\gamma }{x}_{\gamma_i}\end{array}}$$

Part II:

$${\displaystyle \begin{array}{c}\left({Y}_{ijk}|{Y}_{ijk}>0\right)\sim Beta\ Prime\left({\upmu}_{\textrm{ijk}},\psi \right)\\ {}{\upmu}_{\textrm{ijk}}=E\left({Y}_{ijk}>0|{p}_{2i},{c}_{2 ij}\right)={\exp}\left({\boldsymbol{X}}_{2 ij k}^{\prime}\times \boldsymbol{\beta} +{p}_{2i}+{c}_{2 ij}+{e}_{ijk}\right)\\ {}={\exp}\left(\left({\beta}_0+{p}_{2i}+{c}_{2 ij}\right)+\sum_{\theta =1}^q{\beta}_{\theta }{x}_{\theta_i}+{e}_{ijk}\right)\end{array}}$$

where X1N × (r + 1) and X2N × (q + 1) have full rank r and q for the zero and the continuous components, respectively; α(r + 1) × 1 and β(q + 1) × 1 are the corresponding vectors of regression coefficients. As seen in relations (3) and (4), the mixing probability and mean of the component of the continuous parts are linked to the independent variables through logit and logarithmic link functions. The vectors p1 = (a11, a12, …, a1m) and p2 = (a21, a22, …, a2m) denote random effects of the third level in the components of logistic and continuous, respectively, whereas \({\boldsymbol{c}}_1={\left({b}_{111},\dots, {b}_{11{n}_1},\dots, {b}_{1m1},\dots, {b}_{1m{n}_m}\right)}^{\prime }\) and \({\boldsymbol{c}}_2={\left({b}_{211},\dots, {b}_{21{n}_1},\dots, {b}_{2m1},\dots, {b}_{2m{n}_m}\right)}^{\prime }\) are the random effects of the second level. For simplicity of interpretation and mathematical calculations, the random effects (p1, p2) and (c1, c2) are assumed to be independent and normally distributed with mean zero and variances \({\sigma}_{p_1}^2,{\sigma}_{p_2}^2,{\sigma}_{c_1}^2\) and \({\sigma}_{c_2}^2\), respectively [35, 36]. The error terms \({e}_{ijk}\sim N\left(0,{\sigma}_e^2\right)\) are also assumed to be normal distribution and independent of the random effects at both levels 2 and 3.

Again and according to the data structure in this study (three-level data), \({\boldsymbol{X}}_{1 ijk}^{\prime }\) is the vector of covariates for the k-th measurement at the j-th city (level-2) at the i-th province (level-3) for the binary part and \({\boldsymbol{X}}_{2 ijk}^{\prime }\) is the vector of covariates for the k-th measurement at the j-th city (level-2) at the i-th province (level-3) for the continuous part. The two parts might have common covariates or completely different ones. α is the vector of model coefficients corresponding to the binary part and β is the vector of coefficients corresponding to the continuous part conditional on the values being non-zero. The model can be easily extended to include higher-level random effects.

The conditional PDF for yijk is expressed as:

$${\displaystyle \begin{array}{c}\textrm{f}\left({\textrm{y}}_{\textrm{ijk}}|{\textrm{p}}_{\textrm{1i}},{\textrm{p}}_{\textrm{2i}},{\textrm{c}}_{\textrm{1ij}},{\textrm{c}}_{\textrm{2i j}}\right)={\left[1-{\uptau}_{\textrm{ijk}}\right]}^{1-{\upomega}_{\textrm{ijk}}}\times {\left[{\uptau}_{\textrm{ijk}}\times \textrm{BP}\left({\textrm{y}}_{\textrm{ijk}}|{\textrm{p}}_{\textrm{2i}},{\textrm{c}}_{\textrm{2ij}}\right)\right]}^{\upomega_{\textrm{ijk}}}\\ {}=\left\{{\left[1-{\uptau}_{\textrm{ijk}}\right]}^{1-{\upomega}_{\textrm{ijk}}}{\uptau_{\textrm{ijk}}}^{\upomega_{\textrm{ijk}}}\right\}\times {\left[\textrm{BP}\left({\textrm{y}}_{\textrm{ijk}}|{\textrm{p}}_{\textrm{2i}},{\textrm{c}}_{\textrm{2ij}}\right)\right]}^{\upomega_{\textrm{ijk}}}\end{array}}$$

Generally, the estimation of parameters α, β, ψ, Σ1 and Σ2 is based on the likelihood function of data given as:

$${L}_i=\int \prod\nolimits_{j=1}^{n_i}\left[\int \exp \left({l}_{ij}^1\right)\exp \left({l}_{ij}^2\right)\phi \left({c}_{1 ij},{c}_{2 ij}\right)d{c}_{1 ij}\ d{c}_{2 ij}\kern0.5em \right]\phi \left({p}_{1i},{p}_{2i}\right)\ d{p}_{1i}\ d{p}_{2i}$$

where the log-likelihood for the binary part is

$${\displaystyle \begin{array}{c}{l}_{ij}^1={\sum}_{k=1}^{m_{ij}}\left[{\omega}_{ij k}\log \left({\tau}_{ij k}\right)+\left(1-{\omega}_{ij k}\right)\log \left(1-{\tau}_{ij k}\right)\right]\\ {}=\sum_{k=1}^{m_{ij}}\left[{\omega}_{ij k}\ \textrm{logit}\left({\tau}_{ij k}\right)+\log \left(1-{\tau}_{ij k}\right)\right]\end{array}}$$

and the log-likelihood for the continuous part is

$$\textrm{With}\ {e}_{ijk}={\boldsymbol{Y}}_{ijk}-{\boldsymbol{X}}_{2 ij k}^{\prime}\boldsymbol{\beta} -{\boldsymbol{p}}_{2i}-{\boldsymbol{c}}_{2 ij}$$

In this likelihood function (Eq. 5), ϕ(p1i, p2i) and ϕ(c1ij, c2ij) represents the bivariate normal distribution for the random effects with mean vector of zeros and variance–covariance matrix Σ1 and Σ2 for zero and non-zero part respectively. As can be seen from Eq. (5), the likelihood function involves the integral with respect to the multivariate normal PDF. Parameter estimation in the proposed models can be computationally difficult as the likelihood function depends on analytically intractable integrals of a non-linear function with respect to the multivariate normal distribution of random-effects.

Bayesian inferential framework

The parameters in part I and II were individually estimated within a Bayesian inferential framework with MCMC sampling of the posteriors.

Let Θ = (α, β, ψ, Σ1, Σ2) be the collection of unknown population parameters in models (2), (3) and (4). To complete the Bayesian formulation, we specify mutually independent prior distributions for all the unknown parameters as follows:

$${\displaystyle \begin{array}{c}\upalpha \sim {\textrm{N}}_{\textrm{r}}\left({\upalpha}_0,{\Lambda}_1\right),\upbeta \sim {\textrm{N}}_{\textrm{q}}\left({\upbeta}_0,{\Lambda}_2\right)\\ {}\uppsi \sim \textrm{IG}\left(\upalpha, \upbeta \right),{\Sigma}_1^{-1}\sim \textrm{IW}\left({\Omega}_1,{\upnu}_1\right),{\Sigma}_2^{-1}\sim \textrm{IW}\left({\Omega}_2,{\upnu}_2\right)\end{array}}$$

where by considering the information available from literature [1, 16, 37] and range of the parameters Normal (N), Inverse Gamma (IG), and Inverse Wishart (IW) distributions are chosen to simplify computations.

Let the observed data \(\mathfrak{D}=\left\{\left({\omega}_{ij},{y}_{ij},{x}_{1 ij},{x}_{2 ij}\right);i=1,\dots, n;j=1,\dots, {n}_i\right\}\), f(.) be a density function, f(.| .) be a conditional density function and h(.) be a prior density function. We assume that the parameters in Θ are independent of each other; that is:

$$h\left(\Theta \right)=h\left(\upalpha \right)h\left(\upbeta \right)h\left(\psi \right)h\left({\Sigma}_1\right)h\left({\Sigma}_2\right).$$

After specifying the models for the observed data and prior distributions of the unknown model parameters, we can draw samples for the parameters based on their posterior distributions under the Bayesian framework. Therefore, the joint posterior density of Θ, conditional on \(\mathfrak{D}\), can be determined by

$$f\left(\Theta |\mathfrak{D}\right)\propto \left\{\prod\nolimits_{i=1}^n\int \prod\nolimits_{j=1}^{n_i}\left[\int \exp \left({l}_{ij}^1\right)\exp \left({l}_{ij}^2\right)\phi \left({c}_{1 ij},{c}_{2 ij}\right)d{c}_{1 ij}\ d{c}_{2 ij}\kern0.5em \right]\phi \left({p}_{1i},{p}_{2i}\right)\ d{p}_{1i}\ d{p}_{2i}\right\}h\left(\Theta \right)$$

The integral in (9) has a high dimension and does not have a closed solution. Analytic approximations to the integrals may not be accurate enough. So, the direct calculation of the posterior distribution of Θ based on the observed data \(\mathfrak{D}\) is prohibitive [1]. As an alternative, posterior computation of Θ can proceed using a MCMC procedure via Gibbs sampling or Metropolis-Hastings (M-H) algorithm. While the Gibbs sampler relies on conditional distributions [23, 38,39,40] the Metropolis-Hastings sampler uses a full joint density distribution to generate a candidate draws [38, 41]. Certainly, there is a large body of work on other computational approaches to sampling (slice sampling, adaptive rejection sampling, Hamiltonian Monte Carlo, etc.); covering such methods is beyond the scope of this study. In an initial review of the software, we concluded that it would be faster to use the OpenBUGS software. Moreover, due to the high volume of data (calculations) and the time limitation, we could not check the performance of other software. However, OpenBUGS was chosen because of its generality and simplicity. The associated OpenBUGS code is available in Additional file 1: Appendix A.1.

Model complexity and fit

There are a variety of methods to select the model that best fits the data. However, in this research article, we focus on the log pseudo marginal likelihood (LPML) and a modified observed deviance information criterion, denoted here by DIC3. In addition, we use of two emerging model selection methods, namely leave-one-out cross-validation (LOO-CV) and widely available information criterion (WAIC), due to their fully Bayesian nature.

The Bayesian Deviance Information Criterion (DIC3) [42] is used to compare the models fitted. It is defined by

$${\textrm{DIC}}_3=\overline{D}\left(\theta \right)+{p}_D$$

where \(\overline{D}\left(\theta \right)=-2E\left\{\log \left[p\left(y|\uptheta \right)\right]|y\right\}\) is the posterior mean deviance taken as Bayesian measure of fit, \(p\left(y|\uptheta \right)=\prod_{i=1}^np\left({y}_i|\uptheta \right)\), E{log[p(y| θ)]| y} is the posterior expectation of log[p(y| θ)] and pD is the effective number of parameters representing model complexity. The DIC3 is a natural generalization of the Akaike Information Criterion (AIC) [43] and interpreted as a Bayesian measure of fit penalized for increased model complexity. The DIC3 was developed to solve the problem of determining the ‘effective’ number of parameters (pD) in complex non-nested hierarchical models and its computation has been coded into the latest version of WinBUGS (1.4). As for the usual DIC, minimum DIC3 estimates the model that will make the best short-term predictions [42]. Note that the DIC3 is only comparable across models with exactly the same observed data.

The LPML [44] is another measure for comparing models that derived from the conditional predictive ordinate (CPO) statistics and is one of the most widely used model selection criteria available in WinBUGS. It is derived from the posterior predictive distribution. For our proposed models, a closed form of the CPOi is not available. However, a Monte Calro estimate of CPOi can be obtained by using a single MCMC sample from the posterior distribution of θ, through a harmonic-mean approximation proposed by [45], as \(\hat{CPO_i}={\left\{\frac{1}{K}\sum_{i=1}^Kg{\left({y}_i|{\uptheta}^{(i)}\right)}^{-1}\right\}}^{-1}\), where θ(1), …, θ(K) is a post burn-in sample of size K from the posterior distribution from θ, and g is the marginal distribution of Y (integrated over the random effects). A summary statistic of the CPOi is the LPML, defined by \(LPML=\sum_{i=1}^n\log \left(\hat{CPO_i}\right)\). Larger values of LPML indicate better fit.

The Watanabe-Akaike (or widely applicable) information criterion (WAIC) [46, 47] is closely related to the more widely known DIC measure, which is based on a deviance. The WAIC is a more fully Bayesian approach for estimating out-of-sample expectation. In general, the WAIC is defined as:

$$WAIC=2{p}_{WAIC}-2 LPPD$$

The deviance term in DIC is \({\log}\left(p\left(y|\overset{\sim }{\theta}\right)\right)\) where \(\overset{\sim }{\theta }\) is a point estimate of θ. For WAIC, this term is replaced by the log pointwise predictive density (LPPD), defined as:

$$LPPD=\sum\nolimits_{i=1}^n\log \int p\left({y}_i|\theta \right){p}_{post}\left(\theta \right) d\theta \approx \sum\nolimits_{i=1}^n\log \frac{1}{M}\sum\nolimits_{m=1}^Mp\left({y}_i|{\theta}^{(m)}\right).$$

Just like DIC, there are variants of WAIC which depend on how pWAIC is defined. Gelman, Hwang, and Vehtari also propose \({p}_{WAIC2}=\sum_{i=1}^n{Var}_{post}\left[\log p\left({y}_i|\theta \right)\right]\) as a penalty term, where pWAIC2 is “the variance of individual terms in the log predictive density summed over the n data points” [48]. Although DIC is a commonly used measure to compare Bayesian models, WAIC has several advantages over DIC, including that it closely approximates Bayesian cross-validation, it uses the entire posterior distribution and it is invariant to parameterisation [49].

Exact cross-validation requires re-fitting the model with different training sets. Approximate leave-one-out cross-validation (LOO-CV) can be computed easily using importance sampling [50]. The Bayesian LOO estimate of out-of-sample predictive fit is

$$LOO=-2{LPPD}_{LOO}=-2\sum_{i=1}^n\log \int p\left({y}_i|\theta \right){p}_{post}\left(\theta |{y}_{-i}\right) d\theta$$

where ppost(θ| yi) is the posterior distribution based on the data without the i-th data point. Unlike LPPD that uses data point i for both the computation of posterior distribution and the prediction, here LPPDLOO only uses it for prediction, and hence there is no need for a penalty term to correct the potential bias introduced by using data twice [51].

The question regarding the real data is whether the data had better support a true model. To that end, we fit each model using OpenBUGS and compute DIC and LPML for each. We also export the joint posterior distributions from OpenBUGS into R and compute WAIC and LOO with “loo” package [52].

Numerical study

Specific models and implementation

In this section, we apply the zero-augmented gamma with random effects and zero-augmented beta-prime with random effects to analyze the multilevel pharmaceutical expenditure dataset previously described, where response (yijk) is the total pharmaceutical expenditure ($USD) for all drugs prescribed during a 1 year period related to the subject k (k = 1, …, 29,354) that nested within city j (j = 1, …, 429) that are nested within province i (i = 1, …, 31). From now on, the zero-augmented gamma regression model and zero-augmented beta-prime regression model with multilevel random effects, will be called ZAG-RE model and ZABP-RE model, respectively.

Figure 2(a-d), shows the quintiles of adjusted pharmaceutical expenditure and counts of drugs by provinces and cities in Iran in 2018. Variation in pharmaceutical expenditure and counts of drugs among clusters (provinces and cities) is well shown. The PE-2018 dataset contains 16.1% of observations with no cost for drugs during the 2018 year. In addition, there is accentuated asymmetry in the empirical distribution of the positive responses, which is confirmed by the sample skewness and sample quartiles (Table 1). These results proposed a skew distribution as a candidate for fitting the pharmaceutical expenditure.

Fig. 2
figure 2

Quintiles of adjusted pharmaceutical expenditure (PE-2018) and counts of drugs by provinces (right) and cities (left) in Iran in 2018. A, B Variation in adjusted pharmaceutical expenditure. C, D Variation in adjusted counts of drugs

Table 1 Percentage of zeros and descriptive statistics of positive expenditure by sex in PE-2018 dataset

Here we model τijk and μijk as follows:

$$logit\left({\tau}_{ijk}\right)={\boldsymbol{X}}_{1 ij k}^{\prime}\times \boldsymbol{\alpha} +{p}_{1i}+{c}_{1 ij}\ \textrm{and}\ \log \left({\mu}_{ijk}\right)={\boldsymbol{X}}_{2 ij k}^{\prime}\times \boldsymbol{\beta} +{p}_{2i}+{c}_{2 ij}$$

where \({\boldsymbol{X}}_{1 ijk}^{\prime }\) and \({\boldsymbol{X}}_{2 ijk}^{\prime }\) are the matrices of population effects related to subject k, containing an intercept and the following six covariates: Totalk (total inpatients expenditure ($) per year), ICPDk (insurance coverage for prescription drugs per year), NDPPk (number of drugs per prescription), NOPk (number of prescriptions), Agek (in year), and Sexk (1 = male, 0 = female); α and β are coefficient vectors for the mean and zero-part regression, respectively; p1i and p2i be the correlated random effects in the province level with joint density (p1ip2i) for parts I and II of our model, respectively. Similarly, define c1ij and c2ij to be the correlated random effects with joint density (c1ijc2ij) in the city level.

In the absence of historical data/experiment, our prior choices follow the specifications described in Model specification. Thus, we consider the following independent (weak) priors for the MCMC sampling:

$${a}_r\sim N\left(0,{10}^4\right),r=1,2,3,$$
$${\beta}_q\sim N\left(0,{10}^4\right),q=1,\dots, 7,$$
$${\boldsymbol{\Sigma}}_1^{-1}\sim IW\left(0.01{I}_2,2\right),{\boldsymbol{\Sigma}}_2^{-1}\sim IW\left(0.01{I}_2,2\right)\ \textrm{and} \ \textrm{finally}$$
$$\psi \sim IG\left(0.01,0.01\right).$$

We generate two parallel independent MCMC runs of size 200,000 – each of them with widely dispersed initial values – and discard the first 100,000 iterations (burn-in samples) for later computing of posterior estimates. We consider a lag of size 100 to eliminate potential problems due to autocorrelation and monitor the convergence of the MCMC chains using trace plots and the R statistic [53], which indicates convergence of about 1. To improve convergence, we divide the response (Yijk) by 100.

Estimation and model comparison

We consider significant those effects whose 95% equal-tail credible intervals (CI) do not include zero (Table 2 and Fig. 3). Except NDPP and sex in continuous part of ZAG-RE, 95% equal-tail CIs show that, all other variables were significant in two parts of models (Fig. 3). Therefore, the final ZAG-RE and ZABP-RE models have, respectively, the following systematic setting:

$$\begin{array}{l}{logit}\left(\tau_{ijk}\right)=\alpha_{0}+\alpha_{1}\times{NDPP}_{k}+\alpha_{2}\times{Age}_{k}+\alpha_{3}\times{Sex}_{k}+p_{1i}+c_{1ij}\;\text{and}\\ \log\left(\mu_{ijk}\right)=\beta_{0}+\beta_{1}\times{Total}_{k}+\beta_{2}\times{ICPD}_{k}+\beta_{3}\times{NDPP}_{k}+\beta_{4}\times{NOP}_{k}+\beta_{5}\times{Age}_{k}+\beta_{6}\times{Sex}_{k}+p_{2i}+c_{2ij}.\end{array}$$
Table 2 Bayesian selection criteria and posterior estimates of the ZAG-RE and ZABP-RE models fitted to pharmaceutical expenditure (PE-2018) data
Fig. 3
figure 3

PE-2018 data. Posterior medians and 95% equal-tail credible intervals for parameters associated with fixed effects on the zero-part (Part I) and on the continuous-part (Part II) for the zero-augmented gamma regression and zero-augmented beta-prime regression models

The posterior estimates of parameters of model (Eq. 10) shown in Table 2, are quite close in both ZAG-RE and ZABP-RE models while estimates of variance components differ between them. However, 95% equal-tail CI for \({\sigma}_{p_1{p}_2}\) and \({\sigma}_{c_1{c}_2}\) includes zero in both models, indicating no correlation between variances in level 2 and level 3. Posterior standard deviations of variance components are a bit larger under the ZAG-RE model. Also, it is important note that the meaning of parameter ψ differs between the ZAG-RE and ZABP-RE models. In ZAG-RE model, ψ represents the dispersion parameter, while in ZABP-RE model, it represents the invariance of Yijk, conditioned on the random effects.

We use DIC3, LPML, WAIC, and LOO as the Bayesian model selection criteria and measures of divergence discussed previously to compare the ZAG-RE and ZABP-RE to fit the PE-2018. Except in computational time, The ZABP-RE model performs better according to all other criteria, because it has the smaller DIC3, WAIC, and LOO-CV and greater LPML (Table 2). Based on those results, we select the ZABP-RE as our best model.

We also conducted a sensitivity analysis on the prior assumptions for the dispersion parameter (ψ) and the fixed effects precision parameter. In particular, we allowed that the dispersion ψ Gamma (k, k) with k {0.001, 0.1} and the normal precision on the fixed effects to be 0.1, 0.25 and 0.001. We checked the sensitivity in the posterior estimates of β by changing one parameter at a time and refitting both models. Although slight changes were observed in parameter estimates and model comparison values, the results appeared to be robust and did not change our conclusions regarding the best model, inference, and sign of the fixed-effects.

From these findings, we further report the results in detail only for the best ZABP-RE model in the following Section.

Results for the ZABP-RE model

We use the ZABP-RE model to interpret parameters effects on the mean of positive expenditure (μijk) and the probability of non-cost (τijk) by considering individual effects as zero. To measure effects directly on μijk and τijk, we take the anti-logarithm of log(μijk) and logit(τijk) in Eq. (10), obtaining

$${\displaystyle \begin{array}{c}{\mu}_{ijk}=\exp \left({\beta}_0+{\beta}_1{Total}_k+{\beta}_2{ICPD}_k+{\beta}_3{NDPP}_k+{\beta}_4{NOP}_k+{\beta}_5{Age}_k+{\beta}_6{Sex}_k\right)\\ {}\textrm{and}\\ {}{\tau}_{ijk}=\frac{\exp \left({\alpha}_0+{\alpha}_1\times {NDPP}_k+{\alpha}_2\times {Age}_k+{\alpha}_3\times {Sex}_k\ \right)}{1+\exp \left({\alpha}_0+{\alpha}_1\times {NDPP}_k+{\alpha}_2\times {Age}_k+{\alpha}_3\times {Sex}_k\right)}\end{array}}$$

We use the posterior means in Table 2 as estimates of the parameters. From Eq. (11), parameter βi represents the rate of change in the logarithm of the mean of the positive expenditure as each of total, ICPD, NDPP, NOP, and age increases one unit. Therefore, increasing NOP variable of subject k in the original scale by one, the log (μijk) increases by 0.22, where exp(0.22) = 1.25 is the increasing value of response variable in its original scale. Parameters α0, α1, α2 and α3 contribute to the calculation of τijk in Eq. (11). Here, α0 represents the effect of being a female respondent with age and NDPP set to their respective mean. Setting NDPP and age variables to zero – which implies they are set to their respective mean in the original scale – the probability of no consumption is 1 − τijk = 1 −  exp (1.645)/(1 + exp(1.645) ) = 0.16 if subject k is female and 1 − τijk = 1 −  exp (1.645 − 0.183)/(1 + exp(1.645 − 0.183) ) = 0.19 if subject k is male. Overall, females tend to declare a larger expenditure since the estimate of α3 is negative. Parameters α1 (0.117) and α2 (−0.010) represent the effect of NDPP and age variables in logit (τijk). In particular, as NDPP variable increase by one unit, with every additional pharmaceutical item, the odds of having a positive expenditure increase by 12.41%. In addition, with each year growing in age, the Odds of having a positive cost decreases by 1%.

In order to evaluate the predictive performance of our best model, we generate 2000 replicates of 𝐘, say, Y = (Y(1), …, Y(2000) )T. The ijk-th element of the l-th replicate \({Y}_{ijk}^{(l)}\) is generated through the ZABP-RE(\({\mu}_{ijk}^{(l)},{\phi}^{(l)},{\tau}_{ijk}^{(l)}\)) model, where \({\mu}_{ij k}^{(l)}={{\log}}^{-1}\left(\sum_{\theta =1}^q{\boldsymbol{\beta}}_{\theta}^{(l)}{x}_{\theta_i}+{p}_i^{(l)}+{c}_{ij}^{(l)}\right)\) and \({\tau}_{ij k}^{(l)}={logit}^{-1}\left(\sum_{\gamma =1}^r{\boldsymbol{\alpha}}_{\gamma}^{(l)}{x}_{\gamma_i}+{p}_i^{(l)}+{c}_{ij}^{(l)}\right)\). The values of \({\boldsymbol{\alpha}}^{(l)}=\left({\alpha}_0^{(l)},\dots, {\alpha}_3^{(l)}\right)\), \({\boldsymbol{\beta}}^{(l)}=\left({\beta}_0^{(l)},\dots, {\beta}_6^{(l)}\right)\) and ϕ(l) are post burn-in samples of size 2000 from the posterior distribution of all parameters. Figure 4 (above panel) presents the histogram of PE-2018 placed with the plot of the ZABP-RE and ZAG-RE predictive posterior density. In this figure, it is also quite clear that with a slightly more computational time, the ZABP-RE model provides an adequate fit to the PE-2018 data.

Fig. 4
figure 4

PE-2018 data. (Above panel) predictive density histogram from pharmaceutical expenditure placed with posterior predictive densities generated using ZABP-RE regression models. (Below panel) adequacy of log link function linear predictor: conditional predictive posterior nonzero mean represented by black boxplots and nonzero observed values by gray boxplot

Finally, to evaluate the adequacy of the log-link function used to model the conditional nonzero mean μ, we follow the suggestion given in [54] as depicted in Fig. 4 (below panel). We divide the values of the linear predictor μijk into 10 intervals, with each interval containing a similar number of observations. Then, for each group, we build a boxplot of the posterior predictive mean (black boxplot) and a boxplot of the nonzero observed values (gray boxplot). In Fig. 4, we observe no evidence of link misspecification for the nonzero mean μijk, because the shapes of the fitted and observed trends are similar.


In this section, we propose a simulation study to illustrate the performance of the proposed method. Our goals for the simulation study were: 1) to investigate the behaviour of Bayesian estimates based on the empirical mean squared error (MSE), relative bias and percentage of times that the 95% credible intervals (CI) contains the true parameter value and 2) to investigate if the DIC3 and LPML Bayesian criteria properly select the best model. We conduct the simulation study considering 100 datasets generated from ZAG-RE and ZABP-RE models, considering different sample sizes n, say n = 50, 100, 150 and 200. For each dataset of size n, we model the location parameter μijk through log(μijk) = β0 + β1xijk + pi + cij, with \({p}_i\sim N\left(0,{\sigma}_p^2\right)\) and \({c}_{ij}\sim N\left(0,{\sigma}_c^2\right)\), i = 1, …, n, j = 1, …, ni, k = 1, …, mij. To keep the simulation simple and fast, τ is considered constant across observations. We generate independent explanatory variables Xijk from a Bernoulli distribution with a parameter equal to 0.8 and set β0 = 2, β1 = 1.5, τ = 0.2, \({\sigma}_p^2=1.8\), \({\sigma}_c^2=2.3\), ψ = 1.0 for the ZAG-RE model and β0 = 2, β1 = 1.5, τ = 0.2, \({\sigma}_p^2=1.8\), \({\sigma}_c^2=2.3\), ψ = 0.1 for the ZABP-RE model. We consider the following independent non-informative priors βk~N(0,100), ψ~Gamma(0.01, 0.01), \({\sigma}_p^2\sim IGamma\left(0.01,0.01\right)\), \({\sigma}_c^2\sim IGamma\left(0.01,0.01\right)\) and τ~U(0, 1).

For each dataset of size n, we calculate Bayesian estimates with 500 points from the posterior distribution. These points are based on two parallel independent MCMC runs of size 100,000 each, discarding the first 50,000 points to eliminate the effect of the initial values. To avoid correlation among observations, we consider a thinning of size 100, obtaining 500 points from the posterior distribution.

To study the frequentist properties of Bayesian estimates, we calculate the relative bias, the MSE and the 95% coverage probability (CP). Let 𝜽 = { α, β, ψ, 𝜎2} be the true vector of parameters and 𝜃s an element of 𝜽. Let \({\hat{\theta}}_s\) be the posterior mean of 500 points from the posterior distribution of 𝜃s based on dataset i of size n, i = 1, …, 100, n = 50, 100, 150, 200. The relative bias, the MSE and the 95% CP for \({\hat{\theta}}_s\) are defined as follows:


where I is the indicator function such that θs lies in the interval \(\left[{\hat{\theta}}_{is, LCL},{\hat{\theta}}_{is, UCL}\right]\), with \({\hat{\theta}}_{is, LCL}\) and \({\hat{\theta}}_{is, UCL}\) as the estimated lower and upper 95% CIs, respectively. Fig. 5A and B present a visual comparison of the parameters β0 and β1 under ZAG-RE and ZABP-RE generated data for varying sample sizes, where the dotted and black lines represent the ZAG-RE and the ZABP-RE fitted model, respectively.

Fig. 5
figure 5

Simulation study. Relative bias, mean squared error, and coverage probability for β0 and β1 using the zero-augmented gamma () regression and zero-augmented beta-prim (−) regression models

As expected, these figures reveal that if we use a true (ZAG-RE and ZABP-RE) model to fit zero augmented-skew data, relative bias and MSE, for parameters β0 and β1, tend to decrease as sample size increases indicating that the Bayesian estimates possess good consistency properties. In addition, both CP of β0 and β1 tend to be around 95% as the sample size increases when the true model is considered. However, when we do not fit the data by their respective true model (model misspecification), the relative bias and the MSE tend to be smaller in ZABP-RE model. Moreover, as expected in both cases we can see that the performance of the CP gets worse when a misspecified model is considered. For the sake of completeness, the MSE, relative bias and CP for all the parameters (β0, β1, τ, σ2) are presented in Table B1 (Additional file 1: Appendix B). It can be seen from this table that the Bayesian estimates of the mixture proportion τ are highly robust to model misspecification, and this behaviour is independent of the sample size. The dispersion parameters (σ2 and ψ) in both models are not comparable, because they are in different scale. In all results, relative bias of \({\sigma}_c^2\) are greater than \({\sigma}_p^2\) and it shows that the dispersion of responses is more at the level-2 than the level-3. In the comparison of the two models, the differences in MSE values of both \({\sigma}_p^2\) and \({\sigma}_c^2\) are absolutely more in misspecified model. The results of coverage probability of sigmas show that the coverage level in both models are acceptable, however, in true models, the values of the CP are slightly larger than the misspecified model.

Summary of model performance in simulations are presented in Table 3. Table 3 presents the averages of the Bayesian model comparison criteria. We calculate the LPML, DIC3, WAIC, LOO, and convergence rate using 100 samples of size n = 100 each. All criteria favoured the true (simulated) model.

Table 3 Summary of model performance in the simulation study


In this article, we proposed a Bayesian mixture model with random effects for modelling semi-continuous data augmented by zeros. We suggest the Gamma and BP distributions in continuous part of the models. A simulation study and real data analysis are conducted to compare the ZAG-RE and ZABP-RE on the multi-level semi-continuous data and results demonstrated that the ZABP-RE performs better on the zero-augmented multilevel semi-continuous data.

Our flexible class contains the zero-augmented versions of the two parametric exponential family of distributions, such as Gamma, beta-prime, inverse Gaussian, Weibull, log-normal, and Tweedie. Our model is able to simultaneously accommodate zeros and positive outcomes, right-skewness, within subject correlation because of nested measurements and between-subject heterogeneity. One of the differentials of this study was the inclusion of random effects in the analysis of factors related to semi-continuous data using the beta-prime distribution that were not considered in before studies and statistical packages [10, 20,21,22], and this is our major contribution.

One of the advantages of the Bayesian approach compared to the classical approach is the estimates in the part I. Where, the maximum likelihood estimator of a probability of non-zero value, when zero response is observed, does not perform well on the boundary of the parameter space [37]. For a simple BP model, the Maximum Likelihood estimation is available using GAMLSS. However, the MLE results in our data did not reach convergence for some parameters by adding random effects. In this research, using Bayesian statistics with Gibbs and Metropolis-Hasting sampling, this problem is avoided. In the future, it would be interesting to continue the study of various different MCMC methods and hopefully apply them to health cost data.

Simulation studies reveal good consistency properties of the Bayesian estimates as well as high performance of the model selection techniques to pick the appropriately fitted model. We also apply our model to a dataset from yearly pharmaceutical expenditure data conducted in 429 cities in Iran (PE-2018) to illustrate how the procedures can be used to evaluate model assumptions and obtain unbiased parameter estimates. Although our modelling is primarily motivated from the PE-2018, it can be easily applied to other datasets and distributions, because the models considered in this article have been fitted using standard available software packages, like R and OpenBUGS (code available in Additional file 1: Appendix A). This makes our approach easily accessible to practitioners of many fields of research.

Although the zero-augmented positive model considered here has shown great flexibility to deal with zero-augmented clustered data, its robustness can be seriously affected by the presence of heavy tails in the random effects, obscuring important features among individual variation. Liu et al. [13] and Bandyopadhyay et al. [55] proposed a remedy to accommodate skewness in the random effect simultaneously, using skew-normal/independent distributions. We suppose that our method can be used under the zero-augmented positive model and should yield satisfactory results at the expense of additional complexity in implementation. Another useful extension of the proposed model involves the possibility of heteroscedasticity of ψ by allowing the dependence of g(ψ) on covariates, with g(.) being an appropriate link function, as proposed in [56]. An in-depth investigation of such extensions is beyond the scope of the present research article, but it is an interesting topic for further research.

Availability of data and materials

The data that support the findings of this study are available from the corresponding author reasonable request.



Pharmaceutical expenditure




Linear mixed models


Daily drinks data


Generalized linear mixed models


Markov Chain Monte Carlo




Log pseudo marginal likelihood


Deviance Information Criterion


Conditional predictive ordinate


Watanabe-akaike information criterion


Leave-one-out cross-validation


Convergence rate


Zero-augmented gamma regression model with random effects


Zero-augmented beta-prime regression model with random effects


National Center for Health Insurance Research


  1. Xing D, Huang Y, Chen H, Zhu Y, Dagne GA, Baldwin J. Bayesian inference for two-part mixed-effects model using skew distributions, with application to longitudinal semicontinuous alcohol data. Stat Methods Med Res. 2017;26(4):1838–53.

    Article  PubMed  Google Scholar 

  2. Cragg JG. Some statistical models for limited dependent variables with application to the demand for durable goods. Econom J Econom Soc. 1971;39(5):829–44.

    Google Scholar 

  3. Duan N, Manning WG, Morris CN, Newhouse JP. A comparison of alternative models for the demand for medical care. J Bus Econ Stat. 1983;1(2):115–26.

    Google Scholar 

  4. Hall DB, Zhang Z. Marginal models for zero inflated clustered data. Stat Model. 2004;4(3):161–80.

    Article  Google Scholar 

  5. Moulton LH, Curriero FC, Barroso PF. Mixture models for quantitative HIV RNA data. Stat Methods Med Res. 2002;11(4):317–25.

    Article  PubMed  Google Scholar 

  6. Olsen MK, Schafer JL. A two-part random-effects model for semicontinuous longitudinal data. J Am Stat Assoc. 2001;96(454):730–45.

    Article  Google Scholar 

  7. Tooze JA, Grunwald GK, Jones RH. Analysis of repeated measures data with clumping at zero. Stat Methods Med Res. 2002;11(4):341–55.

    Article  PubMed  Google Scholar 

  8. Manning WG, Morris CN, Newhouse JP, Orr LL, Duan N, Keeler EB, et al. A two-part model of the demand for medical care: preliminary results from the health insurance study. Heal Econ Heal Econ. 1981;137:103–23.

    Google Scholar 

  9. Su L, Tom BDM, Farewell VT. Bias in 2-part mixed models for longitudinal semicontinuous data. Biostatistics. 2009;10(2):374–89.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Santos-Neto M, Ribeiro-Bezerra T, Bourguignon M, de Castro M. Package “BPmodel” Title Beta-Prime Regression Model. R package version 1.1.2; 2021.

    Google Scholar 

  11. Husted JA, Tom BD, Farewell VT, Schentag CT, Gladman DD. A longitudinal study of the effect of disease activity and clinical damage on physical function over the course of psoriatic arthritis: Does the effect change over time? Arthritis Rheum. 2007;56(3):840–9.

    Article  PubMed  Google Scholar 

  12. Kipnis V, Midthune D, Buckman DW, Dodd KW, Guenther PM, Krebs-Smith SM, et al. Modeling data with excess zeros and measurement error: application to evaluating relationships between episodically consumed foods and health outcomes. Biometrics. 2009;65(4):1003–10.

    Article  PubMed  PubMed Central  Google Scholar 

  13. Liu L, Strawderman RL, Cowen ME, Shih Y-CT. A flexible two-part random effects model for correlated medical costs. J Health Econ. 2010;29(1):110–23.

    Article  PubMed  Google Scholar 

  14. Duan N. Smearing estimate: a nonparametric retransformation method. J Am Stat Assoc. 1983;78(383):605–10.

    Article  Google Scholar 

  15. Smith VA, Preisser JS, Neelon B, Maciejewski ML. A marginalized two-part model for semicontinuous data. Stat Med. 2014;33(28):4891–903.

    Article  PubMed  Google Scholar 

  16. Rodrigues-Motta M, Galvis Soto DM, Lachos VH, Vilca F, Baltar VT, Junior EV, et al. A mixed-effect model for positive responses augmented by zeros. Stat Med. 2015;34(10):1761–78.

    Article  PubMed  Google Scholar 

  17. Hatfield LA, Boye ME, Carlin BP. Joint modeling of multiple longitudinal patient-reported outcomes and survival. J Biopharm Stat. 2011;21(5):971–91.

    Article  PubMed  PubMed Central  Google Scholar 

  18. Su L, Tom BDM, Farewell VT. A likelihood-based two-part marginal model for longitudinal semicontinuous data. Stat Methods Med Res. 2015;24(2):194–205.

    Article  PubMed  Google Scholar 

  19. Jaffa MA, Gebregziabher M, Garrett SM, Luttrell DK, Lipson KE, Luttrell LM, et al. Analysis of longitudinal semicontinuous data using marginalized two-part model. J Transl Med. 2018;16(1):1–15.

    Article  Google Scholar 

  20. Tulupyev A, Suvorova A, Sousa J, Zelterman D. Beta prime regression with application to risky behavior frequency screening. Stat Med. 2013;32(23):4044–56.

    Article  PubMed  PubMed Central  Google Scholar 

  21. Bourguignon M, Santos-Neto M, de Castro M. A new regression model for positive random variables with skewed and long tail. Metron. 2021;79(1):33–55.

    Article  Google Scholar 

  22. Kamyari N, Soltanian AR, Mahjub H, Moghimbeigi A. Diet, nutrition, obesity, and their implications for COVID-19 mortality: Development of a marginalized two-part model for semicontinuous data. JMIR Public Heal Surveill. 2021;7(1):e22717.

    Article  Google Scholar 

  23. Cooper NJ, Lambert PC, Abrams KR, Sutton AJ. Predicting costs over time using Bayesian Markov chain Monte Carlo methods: an application to early inflammatory polyarthritis. Health Econ. 2007;16(1):37–56.

    Article  PubMed  Google Scholar 

  24. Ghosh P, Albert PS. A Bayesian analysis for longitudinal semicontinuous data with an application to an acupuncture clinical trial. Comput Stat Data Anal. 2009;53(3):699–706.

    Article  PubMed  PubMed Central  Google Scholar 

  25. Neelon B, O’Malley AJ, Normand ST. A Bayesian two-part latent class model for longitudinal medical expenditure data: assessing the impact of mental health and substance abuse parity. Biometrics. 2011;67(1):280–9.

    Article  PubMed  PubMed Central  Google Scholar 

  26. Neelon BH, O’Malley AJ, Normand S-LT. A Bayesian model for repeated measures zero-inflated count data with application to outpatient psychiatric service use. Stat Model. 2010;10(4):421–39.

    Article  Google Scholar 

  27. Zhang M, Strawderman RL, Cowen ME, Wells MT. Bayesian inference for a two-part hierarchical model: An application to profiling providers in managed health care. J Am Stat Assoc. 2006;101(475):934–45.

    Article  CAS  Google Scholar 

  28. Keeping ES. Introduction to statistical inference. Princeton: D. Van Nostrand Company, Inc.; 1962.

  29. McDonald JB. Some generalized functions for the size distribution of income. In: Modeling income distributions and Lorenz curves: Springer; 2008. p. 37–55.

  30. Bourguignon M, Santos-Neto M, de Castro M. A new regression model for positive data. arXiv Prepr arXiv180407734; 2018.

    Google Scholar 

  31. Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. J Appl Stat. 2004;31(7):799–815.

    Article  Google Scholar 

  32. Smithson M, Verkuilen J. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol Methods. 2006;11(1):54.

    Article  PubMed  Google Scholar 

  33. Core Team R. R: a language and environmental for statistical computing. Vienna: R Foundation for Statistical Computing; 2017.

    Google Scholar 

  34. Stasinopoulos DM, Rigby RA. Generalized additive models for location scale and shape (GAMLSS) in R. J Stat Softw. 2008;23:1–46.

    Google Scholar 

  35. Li X, Hedeker D. A three-level mixed-effects location scale model with an application to ecological momentary assessment data. Stat Med. 2012;31(26):3192–210.

    Article  PubMed  PubMed Central  Google Scholar 

  36. Liu L, Ma JZ, Johnson BA. A multi-level two-part random effects model, with application to an alcohol-dependence study. Stat Med. 2008;27(18):3528–39.

    Article  PubMed  Google Scholar 

  37. Rodrigues-Motta M, Forkman J. Bayesian Analysis of Nonnegative Data Using Dependency-Extended Two-Part Models. J Agric Biol Environ Stat. 2022;27(2):201–21.

    Article  Google Scholar 

  38. Davidian M, Giltinan DM. Nonlinear models for repeated measurement data: Routledge; 2017.

  39. Huang Y, Wu H. A Bayesian approach for estimating antiviral efficacy in HIV dynamic models. J Appl Stat. 2006;33(2):155–74.

    Article  Google Scholar 

  40. Sahu SK, Dey DK, Branco MD. A new class of multivariate skew distributions with applications to Bayesian regression models. Can J Stat. 2003;31(2):129–50.

    Article  Google Scholar 

  41. Ntzoufras I. Bayesian Modeling Using Winbugs. Canada: Wiley; 2009.

    Book  Google Scholar 

  42. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002;64(4):583–639.

    Article  Google Scholar 

  43. Akaike H. Information Theory and an Extension of the Maximum Likelihood Principle. In: Selected Papers of Hirotugu Akaike.; 1998. p. 199–213.

    Chapter  Google Scholar 

  44. Carlin BP, Louis TA. Bayesian methods for data analysis: CRC Press; 2008.

  45. Dey DK, Chen M-H, Chang H. Bayesian Approach for Nonlinear Random Effects Models. Biometrics. 1997;53(4):1239 Available from:

    Article  Google Scholar 

  46. Watanabe S, Opper M. Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J Mach Learn Res. 2010;11(12):1–28.

    Google Scholar 

  47. Watanabe S. A widely applicable Bayesian information criterion. J Mach Learn Res. 2013;14(27):867–97.

    Google Scholar 

  48. Gelman A, Hwang J, Vehtari A. Understanding predictive information criteria for Bayesian models. Stat Comput. 2014;24(6):997–1016.

    Article  Google Scholar 

  49. Vehtari A, Gelman A, Gabry J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput. 2017;27(5):1413–32.

    Article  Google Scholar 

  50. Gelfand AE. Model determination using sampling-based methods. Markov Chain Monte Carlo Pract. 1996;4:145–61.

    Google Scholar 

  51. Yong L. LOO and WAIC as model selection methods for polytomous items. arXiv Prepr arXiv180609996; 2018.

    Google Scholar 

  52. Gabry MJ. Package ‘loo’; 2022.

    Google Scholar 

  53. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci. 1992;7(4):457–72.

    Article  Google Scholar 

  54. Hatfield LA, Boye ME, Hackshaw MD, Carlin BP. Multilevel Bayesian models for survival times and longitudinal patient-reported outcomes with many zeros. J Am Stat Assoc. 2012;107(499):875–85.

    Article  CAS  Google Scholar 

  55. Bandyopadhyay D, Lachos VH, Abanto-Valle CA, Ghosh P. Linear mixed models for skew-normal/independent bivariate responses with an application to periodontal disease. Stat Med. 2010;29(25):2643–55.

    Article  PubMed  PubMed Central  Google Scholar 

  56. Figueroa-Zúñiga JI, Arellano-Valle RB, Ferrari SLP. Mixed beta regression: A Bayesian perspective. Comput Stat Data Anal. 2013;61:137–47.

    Article  Google Scholar 

Download references


We are thankful to the National Center for Health Insurance Research (NCHIR), which has provided the data for this article. Authors would like to thank the reviewers for the very helpful comments, which lead to considerable improvement of the paper.


The Abadan University of Medical Sciences, under Grant numbers 1453, supported this work.

Author information

Authors and Affiliations



NK, AS, HM, AM and MS: contributed to the conception, design, and data collection; NK and MS: contributed to the sampling, data gathering, and data assessments. NK, AS, AM, HM and MS: contributed to the statistical analysis and drafting of the manuscript; and AS: supervised the study. All authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Ali Reza Soltanian.

Ethics declarations

Ethics approval and consent to participate

The Research Ethics Committee of Abadan University of Medical Sciences approved this study with a specific code IR.ABADANUMS.REC.1401.048. Written informed consent was obtained from all subjects prior to participation in this project.

Consent for publication

Not applicable.

Competing interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Appendix A.1. OpenBUGS code. Appendix B. Results of the simulations. Table B1. Relative bias, MSE and CP for parameter estimates with different sample sizes for the ZAG-RE and ZABP-RE models.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Kamyari, N., Soltanian, A.R., Mahjub, H. et al. Zero-augmented beta-prime model for multilevel semi-continuous data: a Bayesian inference. BMC Med Res Methodol 22, 283 (2022).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • Bayesian framework
  • Non-negative data
  • Two-part mixed-effects model
  • Skew distributions
  • Pharmaceutical expenditure