Applications of Bayesian shrinkage prior models in clinical research with categorical responses
BMC Medical Research Methodology volume 22, Article number: 126 (2022)
Abstract
Background
Prediction and classification algorithms are commonly used in clinical research for identifying patients susceptible to clinical conditions such as diabetes, colon cancer, and Alzheimer’s disease. Developing accurate prediction and classification methods benefits personalized medicine. Building an excellent predictive model involves selecting the features that are most significantly associated with the outcome. These features can include several biological and demographic characteristics, such as genomic biomarkers and health history. Such variable selection becomes challenging when the number of potential predictors is large. Bayesian shrinkage models have emerged as popular and flexible methods of variable selection in regression settings. This work discusses variable selection with three shrinkage priors and illustrates its application to clinical data such as the Pima Indians Diabetes, Colon cancer, ADNI, and OASIS Alzheimer’s real-world datasets.
Methods
A unified Bayesian hierarchical framework that implements and compares shrinkage priors in binary and multinomial logistic regression models is presented. The key feature is the representation of the likelihood by a Polya-Gamma data augmentation, which admits a natural integration with a family of shrinkage priors, specifically focusing on the Horseshoe, Dirichlet Laplace, and Double Pareto priors. Extensive simulation studies are conducted to assess performance under different data dimensions and parameter settings. Measures of accuracy, AUC, Brier score, L1 error, cross-entropy, and ROC surface plots are used as evaluation criteria, comparing the priors with frequentist methods such as Lasso, ElasticNet, and Ridge regression.
Results
All three priors can be used for robust prediction on significant metrics, irrespective of the categorical response model chosen. Simulation studies achieved a mean prediction accuracy of 91.6% (95% CI: 88.5, 94.7) for logistic regression models and 76.5% (95% CI: 69.3, 83.8) for multinomial logistic models. The model can identify significant variables for disease risk prediction and is computationally efficient.
Conclusions
The models are robust enough to conduct both variable selection and prediction because of their high shrinkage properties and applicability to a broad range of classification problems.
Introduction
Recent innovations and availability of structured electronic health records (EHRs), multi-site longitudinal studies, and high-throughput sequencing (HTS) [1] data have made patient characteristics and genomic information accessible for statistical prediction analysis and clinical research. Medical decisions are modified upon the identification of essential covariates for a specific clinical outcome. Treatment recommendations are also age, population, and comorbidity specific, with HTS bringing a paradigm shift in large-scale personalized medicine. For example, in diabetes prediction, various biological and clinical factors such as sex, age, and obesity are responsible for predicting the future risk of the disease. In oncology, for instance in colorectal cancer [2], breast cancer [3], and non-small cell lung cancer (NSCLC) [4], prognostic and predictive genes such as human epidermal growth factor receptor 2 (HER2), BRAF, and KRAS are essential not only for early detection but also for therapy selection, subgroup stratification, and controlled monitoring of the disease. In Alzheimer’s disease (AD), a combination of traditional risk factors, including age, education, hypertension, obesity, cognitive test scores, and cardiovascular disease [5], along with testing of the ApoE4 gene, can be informative in identifying whether a patient is vulnerable to this incurable disease or other forms of dementia.
A clinical prediction or classification model, followed by a variable selection strategy, plays a vital role in designing preventive measures from adverse outcomes.
Penalized regression
Variable selection methods can mitigate noise imposed by irrelevant variables in the data. Most of these methods are frequentist approaches, such as Lasso [6] and ElasticNet [7], that use an L1 or L2 penalty to shrink the coefficients of irrelevant variables. Bayesian regression methods that incorporate shrinkage properties as prior information are a reasonable alternative approach, and are the ultimate focus of this research work.
Public health researchers often encounter prediction problems with categorical responses, where the primary outcome needs to be predicted from clinical features, such as the classification of a COVID-19 patient as “discharged,” “died,” or “remained in hospital” using individual patient-level data [8]. When the response variable is binary, logistic regression (LR) models enable assessing the association between independent variables and the response variable. When the response variable has more than two categories, generalizations of the logistic model, such as multinomial logistic regression (MLR) models, are common. The above example illustrates categorical response models and their role as a natural and attractive approach in a diverse array of applications, ranging from spam detection and credit card fraud to predicting Alzheimer’s stage and tumor malignancy. The categorical response models’ primary goal is to infer relationships between a categorical response and a set of explanatory variables. As in any form of regression, this is critical to understanding the causal dynamics of a complex system and making accurate predictions on future data. Depending on the specific problem, the explanatory variables or predictors can take the form of demographic profiles, socioeconomic data, or complex biomolecular entities. Modeling becomes particularly challenging when the number of such predictors grows large, a common feature in genomics. Traditionally, frequentist likelihood-based methods have used penalized regression techniques to address this “curse of dimensionality” and to deal with data having a considerable number of predictors in linear and multicategory response regression. In brief, these methods add penalty terms to the negative log-likelihood function and then optimize this penalized likelihood for parameter inference. The penalization leads to a desired “shrinking” behavior of the estimates.
Specifically, it pulls the small coefficients towards zero while minimally affecting the large coefficients, thus selecting only the most relevant variables. For an LR setup, the objective function that needs to be minimized is of the form

\(\ell_{\lambda}(\beta) = -\sum_{i=1}^{n}\left[y_{i}x_{i}^{T}\beta - \log\left(1+\exp(x_{i}^{T}\beta)\right)\right] + \lambda\sum_{j=1}^{p}|\beta_{j}|^{r}.\)

Here, y=(y_{1},y_{2},...,y_{n}) is an n-dimensional vector representing the outcome variable; x_{i}=(x_{i1},x_{i2},…,x_{ip}) are the covariates, with x_{ij} the observed value of the j-th of the p predictors, and β=(β_{1},β_{2},...,β_{p})^{T} is a p-dimensional parameter vector of regression coefficients. λ is the penalty parameter, which modulates the amount of shrinkage: large values of λ lead to more shrinkage, while λ=0 recovers the simple logistic likelihood without shrinkage. The generalized objective function for multiple categories is

\(\ell_{\lambda}(\beta) = -\sum_{i=1}^{n}\sum_{k=1}^{K} I(y_{i}=k)\log p_{ik} + \lambda\sum_{k=1}^{K-1}\sum_{j=1}^{p}|\beta_{jk}|^{r}.\)
Different values of r lead to various penalization techniques. For example, r=1 results in the well-known Least Absolute Shrinkage and Selection Operator (Lasso) [6] solution and r=2 results in the Ridge regression solution [9]. ElasticNet [7] is another method that is a convex combination of Lasso and Ridge.
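As an illustration of the penalized objective, the sketch below (in Python rather than the R used for the paper's computations; the function name is ours) evaluates the logistic negative log-likelihood plus a \(\lambda\sum_j |\beta_j|^r\) penalty, where r=1 gives the Lasso penalty and r=2 the Ridge penalty:

```python
import numpy as np

def penalized_logistic_loss(beta, X, y, lam, r):
    """Penalized negative log-likelihood for logistic regression.

    r=1 corresponds to the Lasso penalty, r=2 to the Ridge penalty.
    """
    eta = X @ beta
    # -sum_i [y_i * eta_i - log(1 + exp(eta_i))], computed stably
    nll = np.sum(np.logaddexp(0.0, eta) - y * eta)
    return nll + lam * np.sum(np.abs(beta) ** r)
```

At beta = 0 the penalty vanishes and the loss reduces to n·log(2), which is a quick sanity check of the implementation.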
Shrinkage priors
Bayesian variable selection models form a natural counterpart to penalization, with some specialized priors assuming the penalty terms’ role. Some notable works in this area include [6, 10,11,12,13,14,15,16,17,18], among many others. A comprehensive overview of shrinkage priors for data applications is presented in [19]. The historical development of shrinkage priors can be traced back to the spike-and-slab approach proposed in [20]. It is a hierarchical mixture model that essentially uses a latent binary indicator to differentiate between the large and small regression coefficients. It does so by assigning Gaussian priors with high and low variances to the coefficients, conditional on these indicators. The resultant marginal prior for a coefficient takes on a shape with a “spike” at 0 and a relatively flat “slab” at nonzero regions, lending the model its name. Stochastic search variable selection (SSVS) [21] is used for identifying a subset of significant covariates with a hierarchical normal mixture model structure, similar to the spike-and-slab prior. However, it uses continuous Gaussian distributions with higher and lower variances to include and exclude variables, respectively. The spike-and-slab prior has a bi-separation effect on the model coefficients, bringing a computational complexity of 2^{p}. SSVS is computationally intensive and cannot tackle a large set of predictors. Inclusions of non-conjugate and conjugate priors are detailed in an extension of this work [22]. A related class of variable selection (VS) models puts positive prior mass at 0. Inference in these models usually relies on Reversible Jump sampling techniques [23]. Though these models have the nice property of explicitly reducing coefficients to zero, they typically incur high computational costs and exhibit poor mixing. These issues necessitated a computationally efficient class of priors that allows sufficient shrinkage in high dimensions. This article focuses solely on such priors.
These priors are called global-local (GL) shrinkage priors [24]. Denoting the parameter set as β=(β_{1},...,β_{p})^{T}, the GL framework can be written hierarchically as follows
\(\beta_{j}\sim N(0,\lambda_{j}\tau), \quad \lambda_{j}\sim f(\cdot), \quad \tau\sim g(\cdot).\)
The global parameter τ controls the overall shrinkage of all the coefficients towards zero, while the local parameters λ=(λ_{1},...,λ_{p}) modulate the coefficient-specific shrinkage. It is desired that the distribution of the local parameter, f(.), has heavy tails and that the distribution of the global parameter, g(.), has substantial mass at zero. The Normal-Gamma prior [25], the Dirichlet Laplace (DL) prior [18], the Horseshoe prior [26], and the Double Pareto (DP) prior [17] are some of the GL priors. A review and comparison of shrinkage methods for VS are detailed in [27]. High-dimensional statistical modelling with Bayesian shrinkage priors has been extended to several arenas, such as longitudinal binary data with informative missingness [28] and joint modelling of clustered mixed outcomes with a uniform shrinkage prior [29]. The second piece of our proposed approach focuses on a convenient data augmentation representation of the logistic regression likelihood. Two data augmentation algorithms have gained popularity for binary responses: one for the probit regression model that uses truncated normal random variables [30], and another for the logistic regression model that uses Polya-Gamma (PG) random variables [31]. Here we focus on the latter scheme.
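To make the GL hierarchy concrete, a prior draw can be simulated by first sampling the global and local scales and then the coefficients. The sketch below (Python, illustrative function name) uses the Horseshoe as an instance, with half-Cauchy distributions for both f and g and coefficient standard deviation λ_j·τ:

```python
import numpy as np

def draw_global_local_prior(p, rng):
    """One draw from a Horseshoe-style global-local hierarchy:
    tau ~ C+(0,1) (global), lambda_j ~ C+(0,1) (local),
    beta_j ~ N(0, lambda_j^2 * tau^2)."""
    tau = abs(rng.standard_cauchy())       # global shrinkage scale
    lam = np.abs(rng.standard_cauchy(p))   # local, coefficient-specific scales
    beta = rng.normal(0.0, lam * tau)      # heavy tails for signals, mass near 0
    return beta, lam, tau
```

The heavy Cauchy tails of the local scales let a few coefficients escape shrinkage, while the global scale pulls the bulk towards zero.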
In the following sections, the PG Data Augmentation is described under LR and MLR models. Next, the dataaugmented likelihood framework’s connection to the Horseshoe, DL, and DP priors is presented, embedding them in a fully Bayesian hierarchical model. Performance metrics evaluate the simulation results with different sample sizes, covariate dimensions, and parameter settings with several data applications. The limitations and contributions are in the Discussion section where the article is concluded with future research directions.
Bayesian binary regression and Polya-Gamma data augmentation
Consider the binary logistic regression model where Y_{1},Y_{2},...,Y_{n} are i.i.d. Bernoulli random variables, \(\left\{x_{i}\in \mathbb{R}^{p}\right\}_{i=1}^{n}\) are the covariates, and \(\beta \in \mathbb {R}^{p}\) denotes the unknown regression coefficients. The corresponding likelihood is given by

\(L(\beta \mid Y) = \prod_{i=1}^{n} \frac{\exp(x_{i}^{T}\beta y_{i})}{1+\exp(x_{i}^{T}\beta)}.\)
In the context of Bayesian analysis, the posterior of β given the data is \(\pi (\beta \mid Y) \propto \pi (\beta)\prod _{i=1}^{n} \frac {\exp (x_{i}^{T}\beta y_{i})}{1+\exp (x_{i}^{T}\beta)}\), where π(β) denotes the prior distribution for β. Unfortunately, this posterior becomes intractable due to the presence of the term \((1+\exp (x_{i}^{T}\beta))\) in its denominator [32]. Therefore, posterior sampling in this setup traditionally relied on the Metropolis-Hastings (MH) algorithm. As an alternative, the Data Augmentation (DA) algorithm utilizes latent variables to circumvent the difficulty. Before introducing the specific Polya-Gamma DA scheme used in this article, we provide the general structure of a DA algorithm. Suppose we need to sample a parameter of interest θ from an intractable density π(θ). The technique requires designing a suitable joint density π(θ,W) that satisfies two criteria: first, \(\int \pi (\theta, W) dW = \pi (\theta)\), and second, the corresponding conditional distributions π(θ∣W) and π(W∣θ) are possible to sample from [33, 34]. In the context of Bayesian analysis, π(θ) typically refers to a posterior density for a parameter of interest θ. A major challenge in designing a DA is to construct an appropriate choice for π(θ,W) (see [30], [31]). A commonly used strategy is to build a conditional distribution π(W∣θ) so that π(θ,W)=π(W∣θ)π(θ) fulfills the requirements. The latent variable W is often termed the augmented random variable. In the current context, our parameter of interest is β with the posterior density π(β∣Y), and we make use of the DA technique designed in [31], where the authors develop and utilize the Polya-Gamma (PG) distribution as the choice for the augmented random variable. As we will require it in later sections, we include the probability density function of the Polya-Gamma distribution (denoted by PG(1,c)) [31]:

\(f(w \mid c) = \cosh\left(\frac{c}{2}\right) e^{-wc^{2}/2} \sum_{k=0}^{\infty} (-1)^{k} \frac{2k+1}{\sqrt{2\pi w^{3}}}\, e^{-(2k+1)^{2}/(8w)}, \quad w>0.\)
To deal with the intractability in Eq. 3, the DA scheme augments independent latent variables \(\{W_{i}\}_{i=1}^{n}\), where \(W_{i} \sim PG(1,x_{i}^{T}\beta )\). The \(\{W_{i}\}_{i=1}^{n}\) are also assumed to be independent of \(\{Y_{i}\}_{i=1}^{n}\) for i=1,…,n. Then, the joint posterior \(\pi (\beta, \{W_{i}\}_{i=1}^{n}\mid \{Y_{i}\}_{i=1}^{n})\) satisfies the two criteria of a generic DA algorithm mentioned above. The integrability criterion trivially holds, whereas the random variables W_{1},…,W_{n} given \(\{Y_{i}\}_{i=1}^{n}, \beta\) are independent and follow PG distributions [31]. The posterior conditional for \(\beta \mid \{Y_{i}\}_{i=1}^{n}, \{W_{i}\}_{i=1}^{n}\) becomes multivariate normal if a multivariate normal prior is used for β [31]. The appearance of the normal distribution as a conditional posterior for β is a key feature that provides a way to utilize more nontrivial priors for β without much difficulty. Specifically, in this manuscript we use the Global-Local priors that we discuss in detail in the next section. As mentioned earlier, Bayesian regression for binary responses has been recognized as a hard problem due to the likelihood’s unwieldy form. There have been several efforts towards an improved version of the DA algorithm [30] for probit regression; notable works include [35] and [36]. However, these algorithms are more intricate versions of [30], with multiple layers of latent variables that make them significantly more difficult to use. In contrast, the DA algorithm of [31] is free of these problems and computationally much less cumbersome.
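One simple (approximate) way to draw the augmented variables is the truncated version of the PG infinite-sum representation of Polson, Scott and Windle: PG(1,c) equals in distribution \((1/(2\pi^{2}))\sum_{k\ge 1} g_{k}/((k-1/2)^{2}+c^{2}/(4\pi^{2}))\) with i.i.d. g_k ~ Gamma(1,1). The sketch below (Python; the function name is ours) is illustrative only, since [31] uses an exact accept-reject sampler in practice; a handy check is the known mean E[PG(1,c)] = tanh(c/2)/(2c).

```python
import numpy as np

def sample_pg_approx(c, rng, trunc=200):
    """Approximate draw from PG(1, c) via the truncated infinite-sum
    representation: (1/(2*pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2/(4*pi^2)),
    g_k ~ Gamma(1, 1) i.i.d.  A sketch: production samplers use an exact
    accept-reject scheme instead of truncation."""
    k = np.arange(1, trunc + 1)
    g = rng.gamma(1.0, 1.0, trunc)
    denom = (k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2)
    return np.sum(g / denom) / (2 * np.pi ** 2)
```

Averaging many such draws for c = 1 should approach tanh(0.5)/2 ≈ 0.231, up to the small truncation bias.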
Logistic regression model with hierarchical prior structures
In this subsection, we include the details of each of the three prior distributions along with the corresponding posterior distributions. The original form of the Horseshoe prior [37] is represented as

\(\beta_{j}\mid\lambda_{j},\tau \sim N(0,\lambda_{j}^{2}\tau^{2}), \quad \lambda_{j}\sim C^{+}(0,1), \quad \tau\sim C^{+}(0,1),\)

where \(C^{+}(0,1)\) denotes the standard half-Cauchy distribution.
Another computationally feasible hierarchical representation of the Horseshoe prior [14] that is used here is

\(\beta\mid\Lambda,\tau \sim N_{p}(0,\Sigma), \quad \Lambda_{j}^{2}\mid\nu_{j} \sim IG\left(\tfrac{1}{2},\tfrac{1}{\nu_{j}}\right), \quad \tau^{2}\mid\xi \sim IG\left(\tfrac{1}{2},\tfrac{1}{\xi}\right), \quad \nu_{1},\ldots,\nu_{p},\xi \sim IG\left(\tfrac{1}{2},1\right).\)
Here, the variance-covariance matrix Σ of the distribution of β is a diagonal matrix with elements \((\Lambda_{1}^{2}\tau^{2},\ldots,\Lambda_{p}^{2}\tau^{2})\). From Eqs. (3), (4), and the above hierarchical prior structure (6), the full posterior distribution is given by:
where, h(w_{i}) is defined in Eq. 4. The conditional distributions required for the analysis are as follows:
The conditional density of β given y, w is
where W_{D} and Σ are diagonal matrices with elements (w_{1},w_{2},...,w_{n}) and \((\Lambda _{1}^{2}\tau ^{2},...,\Lambda _{p}^{2}\tau ^{2})\), respectively, and \(y^{*}=\left (y_{1}-\frac {1}{2},\ldots,y_{n}-\frac {1}{2}\right)\).
In high-dimensional scenarios where sampling of β can be difficult (the p>n case), the fast sampling technique for Gaussian scale-mixture priors [38] is used. The target is the Gaussian distribution \(N_{p}(\mu,\hat {\Sigma })\) with \(\mu =\hat {\Sigma } A^{T}\alpha\) and \(\hat {\Sigma }=(A^{T}A+D^{-1})^{-1}\), where D∈R^{p×p}, A∈R^{n×p}, and α∈R^{n×1}. The algorithm samples u∼N(0,D) and δ∼N(0,I_{n}), sets v=Au+δ, solves (ADA^{T}+I_{n})w=(α−v), and finally obtains θ=u+DA^{T}w, with \(\theta \sim N(\mu,\hat{\Sigma})\). The conditional density of w_{i} given x_{i},β is
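The fast sampler just described can be sketched as follows (Python, hypothetical function name, diagonal D assumed); it produces a draw from \(N_{p}(\mu,\hat{\Sigma})\) while only solving an n×n system instead of inverting a p×p matrix:

```python
import numpy as np

def fast_gaussian_sample(A, D_diag, alpha, rng):
    """Draw theta ~ N(mu, S) with S = (A^T A + D^{-1})^{-1}, mu = S A^T alpha,
    following the algorithm of [38]; D_diag holds the diagonal of D."""
    n, p = A.shape
    u = rng.normal(0.0, np.sqrt(D_diag))     # u ~ N(0, D)
    delta = rng.normal(size=n)               # delta ~ N(0, I_n)
    v = A @ u + delta
    M = (A * D_diag) @ A.T + np.eye(n)       # A D A^T + I_n (n x n only)
    w = np.linalg.solve(M, alpha - v)        # solve instead of inverting
    return u + D_diag * (A.T @ w)            # theta = u + D A^T w
```

For A = I_2, D = I_2, α = (1,1), the target is N((0.5, 0.5), I/2), so the empirical mean of many draws should be close to (0.5, 0.5).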
The conditional densities of the hyperparameters are as follows
Here, all the posterior densities are in closed form and follow simple probability distributions such as the Normal, Polya-Gamma, and Inverse-Gamma, making sampling from them straightforward. The hierarchical structure of the Dirichlet Laplace prior [18] is
The conditional posterior distributions for β∣y and w_{i}∣β remain the same as in Eqs. (8) and (9).
The conditional densities of the hyperparameters, obtained similarly to Theorem 2.2 in [18], are as follows:
To sample π(ϕ∣β), draw T_{j}∼GIG(a−1,1,2|β_{j}|) and set \(\phi _{j} =\frac {T_{j}}{T}\), where \(T=\sum _{j=1}^{p}T_{j}\) and GIG(λ,ρ,χ) is the Generalized Inverse Gaussian distribution with density \(f(x; \lambda, \rho, \chi) \propto x^{\lambda-1} e^{-\frac {1}{2}(\rho x+\frac {\chi}{x})}\).
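The GIG draw in this ϕ update is not available directly under this parameterization in most libraries. A sketch (Python, illustrative function names) maps it onto SciPy's geninvgauss, whose pdf is proportional to \(x^{p-1}e^{-b(x+1/x)/2}\), via the scaling X = \(\sqrt{\chi/\rho}\)·Y with Y ∼ geninvgauss(λ, \(\sqrt{\rho\chi}\)):

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_gig(lam, rho, chi, rng):
    """Draw from GIG(lam, rho, chi) with density
    f(x) ∝ x^(lam-1) exp(-(rho*x + chi/x)/2), by rescaling SciPy's
    geninvgauss (pdf ∝ x^(p-1) exp(-b(x + 1/x)/2))."""
    s = np.sqrt(chi / rho)
    return s * geninvgauss.rvs(lam, np.sqrt(rho * chi), random_state=rng)

def update_phi(beta, a, rng):
    """Dirichlet-Laplace phi update: T_j ~ GIG(a-1, 1, 2|beta_j|),
    phi_j = T_j / sum(T).  Assumes all beta_j are nonzero."""
    T = np.array([sample_gig(a - 1.0, 1.0, 2.0 * abs(b), rng) for b in beta])
    return T / T.sum()
```

The resulting ϕ is a point on the simplex, as required by the Dirichlet Laplace hierarchy.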
The hierarchical structure of the Double Pareto prior [17] is

\(\beta_{j}\mid\tau_{j} \sim N(0,\tau_{j}), \quad \tau_{j}\mid\lambda_{j} \sim \text{Exp}\left(\tfrac{\lambda_{j}^{2}}{2}\right), \quad \lambda_{j} \sim \text{Gamma}(\alpha,\eta).\)
Again, the conditional densities of β∣y and w_{i}∣β remain the same as in Eqs. (8) and (9). Here Σ=D_{τ} is a diagonal matrix with elements (τ_{1},τ_{2},...,τ_{p}). The conditional densities of the rest of the hyperparameters are as follows:
Extending hierarchical models and differential shrinkage
The strength of our methods is in no way limited to a common shrinkage prior across covariates. In fact, they could be applied to several variant models that allow borrowing strength across multiple responses and different (in some cases, user-defined) levels of sparsity among groups of covariates. In particular, they could be used for models where we choose not to shrink some covariates (for instance, demographic covariates), in contrast to the genomic covariates.
In terms of estimation procedures, the hierarchical structure of the models ensures that the form of the posterior conditional of β remains the same across the model variants. Only the diagonal weight matrices appearing in the posterior means and variances would have a differential allocation of λs and constant variances, depending on which variables we choose to shrink. Similarly, the posterior sampling of the shrinkage hyperparameters would involve only the corresponding subset of parameters. Details of the derivation are omitted since they are straightforward.
Bayesian multinomial logistic regression and Polya-Gamma data augmentation
This is the extension of binary regression, where the outcome variable has more than two classes. Let Y_{i}, i=1,2,…,n be a categorical random variable with K categories, where K≥2. The probability for the k-th category is p_{ik}=P(Y_{i}=k∣x_{i}), where \(\sum _{k=1}^{K} p_{ik} = 1\). The multinomial logistic regression (MLR) model is given as
Here, β_{k} are the coefficients associated with the k-th category, and K is the baseline category with its coefficient β_{K} constrained to zero. The Polya-Gamma data augmentation approach can be extended to multicategory response variables. Here, β_{k} is updated conditional on the remaining β_{−k}=(β_{1},β_{2},…,β_{k−1},β_{k+1},…,β_{K}). The full conditional for β_{k} given y and β_{j≠k} can be expressed as a likelihood of the Bernoulli distribution.
where f(β_{k}) is the prior for coefficient β_{k}, and I(Y_{i}=k)=1 when Y_{i}=k, with probability \(p_{ik} = P(Y_{i}=k)= \frac {e^{x_{i}^{T}\beta _{k}}}{\sum _{j=1}^{K} e^{x_{i}^{T}\beta _{j}}}\). It can be rewritten as \(p_{ik} = P(Y_{i}=k) = \frac {e^{x_{i}^{T}\beta _{k}-M_{ik}}}{1+ e^{x_{i}^{T}\beta _{k}-M_{ik}}} = \frac {e^{\psi _{ik}}}{1+e^{\psi _{ik}}}\), where \(M_{ik}= \log \sum _{j \neq k}e^{x_{i}^{T}\beta _{j}}\) and \(\psi _{ik} = x_{i}^{T}\beta _{k}-M_{ik}\). The sum \(\sum _{j \neq k}e^{x_{i}^{T}\beta _{j}}\) includes the reference category K; since β_{K}=0, \(e^{x_{i}^{T}\beta _{K}} =1\), and hence \(M_{ik} = \log \sum _{j \neq k}e^{x_{i}^{T}\beta _{j}} = \log \left (1+\sum _{j \neq {k,K}}e^{x_{i}^{T}\beta _{j}}\right)\). Thus, the full conditional for β_{k} given y and β_{−k} is
which is a logistic regression likelihood; thus PG data augmentation can be implemented to update each of the β_{k}’s based on the binary indicator variable I(y_{i}=k). After including PG data augmentation and the prior β_{k}∼N(0,Σ_{k}), Eq. 17 can be expressed as:
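The offsets M_{ik} and ψ_{ik} defined above can be computed directly. The sketch below (Python, illustrative function name) assumes the K−1 non-reference coefficient vectors are stored as columns of `betas`, with the reference category's coefficients fixed at zero:

```python
import numpy as np

def multinomial_offsets(X, betas, k):
    """For the Polya-Gamma MLR update of beta_k, compute
    M_i  = log(sum_{j != k} exp(x_i' beta_j))  (reference category has beta = 0)
    and  psi_i = x_i' beta_k - M_i, so P(Y_i = k | rest) = expit(psi_i).
    `betas` is p x (K-1); column k is the one being updated."""
    eta = X @ betas                        # n x (K-1) linear predictors
    mask = np.ones(betas.shape[1], dtype=bool)
    mask[k] = False
    # the reference category contributes exp(0) = 1 to the sum
    M = np.log(1.0 + np.exp(eta[:, mask]).sum(axis=1))
    psi = eta[:, k] - M
    return M, psi
```

With all coefficients zero and K = 3, each category has probability 1/3, so expit(psi) should return 1/3, which is a quick correctness check.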
Multinomial logistic regression with hierarchical prior structures
The hierarchical structures with the three priors are derived similarly to LR. Considering a \(N(0,\Lambda _{t}^{2}\tau ^{2})\) prior, t = 1,2,…,p, for each of β_{1},β_{2},…,β_{K−1}, the conditional densities for β_{k} and w_{ik} are given as
where \(\Sigma _{0} =\text{diag}\left (\Lambda _{t}^{2}\tau ^{2}\right)\), \(W_{k} = \text{diag}(w_{ik})\), and \(y_{k}^{*} = \frac {I(y_{i}=k)-0.5}{w_{ik}}+M_{ik}\).
The conditional density of the hyperparameters in the case of the Horseshoe prior is
The conditional density of the hyperparameters in the case of the DL prior is
To sample π(ϕ∣β_{k}), draw T_{tk}∼GIG(a−1,1,2|β_{tk}|) and set \(\phi _{tk} =\frac {T_{tk}}{T}\), where \(T=\sum _{t=1}^{p}T_{tk}\) and GIG(λ,ρ,χ) is the Generalized Inverse Gaussian distribution with density \(f(x; \lambda, \rho, \chi) \propto x^{\lambda-1} e^{-\frac {1}{2}(\rho x+\frac {\chi}{x})}\).
The conditional densities of the hyperparameters are as follows:
Here, β_{k}=(β_{1k},β_{2k},…,β_{pk}) and t=1,2,…,p.
Simulation and results
Data was simulated from the logistic regression model under various settings of sample size (n), dimension (p), and effect sizes β=(β_{1},β_{2},…,β_{p}). The continuous covariates are simulated with mean m and sd = 1. A total of 100 data sets were simulated for each condition. All computations are carried out in RStudio [39]. The length of the Markov chain Monte Carlo (MCMC) simulation is 10,000 iterations, of which 6,000 are discarded as burn-in. The value of the parameter a in the DL prior is chosen as 0.8 for optimal results and convergence.
Post-processing of MCMC samples
After model fitting, variable selection was implemented by computing posterior credible intervals for the coefficients. Posterior credible intervals can be readily obtained in a Bayesian framework via MCMC samples and provide direct uncertainty measures irrespective of model complexity. Specifically, these intervals are constructed from the quantiles of the MCMC samples of the parameters and estimate the probability intervals of the true posterior distribution. A variable is declared significant if and only if its posterior credible interval does not include zero. This yields a binary vector indicating which variables were selected. These binary estimates were then matched against the simulation truth to compute performance metrics.
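The credible-interval selection rule can be sketched as follows (Python, illustrative function name), operating on the matrix of post-burn-in draws:

```python
import numpy as np

def select_variables(beta_samples, level=0.95):
    """Flag a coefficient as significant when its equal-tailed posterior
    credible interval, estimated from MCMC samples, excludes zero.
    beta_samples: (n_iterations, p) array of post-burn-in draws."""
    tail = (1.0 - level) / 2.0
    lo = np.quantile(beta_samples, tail, axis=0)
    hi = np.quantile(beta_samples, 1.0 - tail, axis=0)
    return (lo > 0) | (hi < 0)   # True where the interval excludes zero
```

A coefficient whose draws concentrate away from zero is selected; one whose draws straddle zero is not.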
Beyond performance measures, these credible-interval techniques also provide a convenient measure of uncertainty when needed. The varying sizes and noise content of the datasets are reflected in the varying lengths of the estimated prediction intervals, which in turn allows us to place varying degrees of confidence in our inferred relationships. This could also inform the desired sample size and number of variables for future studies. Such intervals can be readily obtained from MCMC samples, unlike with traditional methods such as Lasso.
Performance measures
Based on the binary estimates, sensitivity, specificity, and overall accuracy were computed for variable selection. In addition, two continuous error measures, MSE and Brier score, were computed. The Brier Score (BS) is defined as \(BS= \frac {1}{N}\sum _{i=1}^{N} \left (P_{i}- Y_{i}\right)^{2}\), where P_{i} is the predicted probability and Y_{i} is the actual outcome for instance i; the best achievable score is 0 and the worst is 1. The Mean Squared Error (MSE) is given by \({\frac {1}{p}\sum _{i=1}^{p}(\beta _{i}-\hat {\beta _{i}})^{2}}\).
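Under their standard definitions, the two continuous error measures can be computed as (Python, illustrative function names):

```python
import numpy as np

def brier_score(p_hat, y):
    """BS = (1/N) * sum (P_i - Y_i)^2; 0 is the best score, 1 the worst."""
    return np.mean((p_hat - y) ** 2)

def coefficient_mse(beta_true, beta_hat):
    """MSE = (1/p) * sum (beta_j - beta_hat_j)^2 over the p coefficients."""
    return np.mean((beta_true - beta_hat) ** 2)
```

Perfectly confident, correct predictions give BS = 0; a constant prediction of 0.5 gives BS = 0.25 regardless of the outcomes.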
Prediction
A prediction module was also added to assess how the model fares in classifying the responses. For this, a cross-validation approach was used, dividing each simulated data set into training and test sets with an 80/20 allocation. The following measures were used to assess the quality of prediction: Accuracy (the proportion of correctly classified responses in the test dataset); Sensitivity (proportion of true positives); Specificity (proportion of true negatives); and AUC (Area Under the Receiver Operating Characteristic (ROC) Curve).
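The threshold-based measures can be sketched as follows (Python, illustrative function name; AUC is omitted since it requires sweeping the threshold):

```python
import numpy as np

def classification_metrics(y_true, p_hat, threshold=0.5):
    """Accuracy, sensitivity (true positive rate) and specificity
    (true negative rate) on a test set, classifying each observation as
    positive when its predicted probability reaches the threshold."""
    y_pred = (p_hat >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}
```

For example, with true labels (1, 1, 0, 0) and predicted probabilities (0.9, 0.2, 0.1, 0.7), one positive and one negative are classified correctly, so all three measures equal 0.5.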
Simulation scenarios and tables
Table 1 summarizes the simulation results and simulation settings for prediction and VS and Table 2 records the same for MLR.
Each measure has three rows corresponding to the three priors. Figures 1 and 2 represent the VS and prediction performance across the simulation scenarios for LR and MLR, respectively. Two simulation scenarios (BS8 and BS9) reflect the extension discussed in the earlier section, namely the differential shrinkage option. In BS8, some demographic variables (gender, age, weight, height) as well as their interactions were included with the remaining covariates. The coefficients for the demographic variables and their interactions were not penalized, i.e., they were given non-shrinkage priors. Another simulation scenario (BS9) included the same four demographic variables along with four binary variables distributed as Bernoulli(0.2). These proxied the SNP variables (sn1, sn2, sn3, sn4) that arise in genomics applications. Interactions between (sn3, weight) and (sn3, age) were considered. Shrinkage was applied to all the main effects of SNPs and the interactions.
The primary takeaway from the results in the tables is that the performance metrics reach increasingly higher levels with a higher N/P ratio and higher magnitudes of the coefficients. In Table 1, for LR, a large part of these numerical measures lie in the range 0.8-0.9, and the trend continues in the simulation settings with differential shrinkage and interaction coefficients (BS8 and BS9). Most of the nonzero interaction coefficients in these settings were correctly identified. Higher correlation among the variables, as in BS5, and a higher percentage of zero β’s (BS6) do drop the performance to some extent. Overall, the proposed method does markedly better on specificity than sensitivity. Even in the p>n scenario, most of the performance metrics stayed above 0.8 for both prediction and variable selection. In Table 2, for MLR, larger sample size, as before, drives better performance. The setting with both large N and large P (MS7), although performing well in variable selection, has the worst predictive performance of all. We intend to explore such settings in more depth in future work with novel shrinkage methods specially geared towards them. Overall, performance scores in the range of 0.85-0.9 or above dominate the variable selection results. The predictive performance, in contrast, is moderate compared to the counterpart in Table 1 for LR, though the prediction accuracies remain above 0.8 in 4 out of the 7 simulation settings. A larger number of classes, as in MS6 (5 classes), pushes the predictive measures down to the range 0.7-0.8. Lastly, the performances are very similar across all three priors for variable selection as well as prediction, so the method is robust to prior choices.
Data application
The method is validated by applying it to standard data examples. For all case studies, the MCMC burn-in size is 6,000, and the number of MCMC iterations is 10,000. The training/test split ratio is 80/20. All data descriptions are in Table 3.
Pima Indians diabetes
The data has been analyzed in the literature related to shrinkage priors, including [44,45,46]. The three priors detect four variables by 95% credible intervals. The credible intervals of the coefficients for the variables “pregnant,” “glucose,” “mass (BMI),” and “pedigree (family history)” do not include zero; hence they are significant. These results are on par with [44]; the “pressure” variable is additionally determined to be significant by the DP prior. The three priors are also compared with Bayesian Lasso (BLasso) [44], Bayesian ElasticNet (BElastic) [47], and frequentist methods such as Lasso, ElasticNet, and Ridge. Here, the three priors’ accuracy is similar to that of the other methods, but their BS is the lowest among all methods. Even though ElasticNet and Ridge have high accuracy, the frequentist methods have substantially higher BS and lower specificity values.
Colon
This data comprises gene expression levels for 40 tumor and 22 normal colon tissues measured for 6500 human genes using Affymetrix oligonucleotide arrays [48]. Samples are obtained from tumor tissue and adjacent unaffected parts of the colon of the same patients [49]. Of these, the 2000 genes with the highest minimal intensity across the tissues were selected for classification purposes [48]. No demographic information, such as race, age, or sex distribution, is available in the public domain for the two groups; otherwise, a study could have been conducted on the benefits of including one or all of these covariates in prediction. Lasso selects 12 genes. The priors are applied to the data with the Lasso estimates as initial values. Here, a k-means approach similar to that applied to the prostate cancer data set in [18] is adopted. The β_{i} are clustered by the k-means algorithm at each MCMC iteration into two clusters. For each iteration, the number of nonzero β’s is estimated by the size of the smaller of the two clusters. A final estimate (F) of the number of nonzero signals is obtained by taking the highest frequency over all the MCMC iterations. The gene IDs are traced back for each iteration, and the F genes with the highest frequency are chosen. The Horseshoe prior selects 36 genes with the highest frequencies, among which one gene (gene ID: 1423) is included in the top 20 genes selected by the t-test and fold change [50]. Also, gene ID: 1325, selected by Lasso [51], appears in the set of 36 genes selected by the Horseshoe. The DL prior selects 164 genes, of which gene ID: 138 is included in the top 2 genes. DP selects 67 genes, of which eight are in the set of genes selected by the Horseshoe. The MCMC Gibbs sampling mixing and convergence were assessed by trace plots and autocorrelation plots, with an average effective sample size of 23095.21 for Horseshoe, 39755.4 for DL, and 40129.35 for DP.
About 95% of the genes conform to the Geweke diagnostic criteria for all three priors. A thinning step at every 15th iteration did not significantly improve the results, so a posterior prediction analysis was performed on 40,000 iterations after burn-in. The MSEs for HS, DL, and DP are 0.001, 0.003, and 0.003, implying that the MCMC has converged well. Here, on the basis of Brier score, the Horseshoe prior performs better than the other priors.
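The k-means post-processing described for this dataset can be sketched as follows (Python, illustrative names; SciPy's kmeans2 stands in for whichever clustering routine was actually used):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def estimate_signal_count(beta_samples, seed=0):
    """Per MCMC iteration, cluster |beta| into two groups by k-means and
    record the smaller group's size; the modal size over iterations gives
    the estimate F of the number of nonzero signals.
    beta_samples: (n_iterations, p) array of post-burn-in draws."""
    counts = []
    for beta in beta_samples:
        mags = np.abs(beta).reshape(-1, 1)
        _, labels = kmeans2(mags, 2, seed=seed, minit="++")
        counts.append(min(np.sum(labels == 0), np.sum(labels == 1)))
    values, freq = np.unique(counts, return_counts=True)
    return int(values[np.argmax(freq)])
```

With well-separated large and small coefficients, the smaller cluster contains the signals at nearly every iteration, so the modal count recovers the true number of nonzero coefficients.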
ADNI
Alzheimer’s disease (AD) is a critical public health concern throughout the world and one of the most widespread neurodegenerative disorders [52]. AD is an irreparable brain disease that impairs thinking and memory. Several machine learning methods have been used to predict AD from Mild Cognitive Impairment (MCI); here, the shrinkage priors are used to predict AD from MCI, and they achieve high prediction measures.
This dataset is obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. Its main mission was to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD).
The data used here are obtained from the R package ADNIMERGE [42], a subset of the “adnimerge” data containing only baseline variable measurements (i.e., the first visit for each participant) with a diagnosis. The “adnimerge” dataset merges several key variables from various case report forms and biomarker labs across all ADNI protocols (ADNI1, ADNI-GO, and ADNI2). The integrated data consist of 113 variables and 14712 observations, including multiple observations per participant, representing multiple visits to the study site for participant evaluation. The participants are divided into five classes: Cognitively Normal (CN = 4428), Early Mild Cognitive Impairment (EMCI = 2687), Late Mild Cognitive Impairment (LMCI = 4993), Subjective Memory Complaint (SMC = 938), and Alzheimer’s Disease (AD = 1654). After preprocessing, 911 observations and 23 variables remained, including the outcome variable. The demographic, cognitive assessment, and clinical assessment attributes included in the analysis are described in [53]. Missing values are discarded, and the data are normalized. The three priors, along with Bayesian Lasso and Bayesian EN, are used to predict AD from MCI, comprising both early and late stages. The three priors achieve 85% accuracy and 93.5% specificity for the prediction of AD from the MCI stage, with results comparable to the other shrinkage priors. The results also indicate that the most distinguishing attributes for AD prediction, selected by all three priors, include ADAS13.bl (13-item AD Cognitive Scale score) and RAVLT.perc.forgetting.bl (Rey’s Auditory Verbal Learning Test percentage forgetting, from scores for immediate-response learning and forgetting). The FAQ.bl (Functional Activities Questionnaire) cognitive test was selected only by the DP prior.
OASIS
This dataset consists of a longitudinal collection of 150 nondemented and demented older adults aged 60 to 96. The variables most frequently included by the three priors are Age, SES, MMSE, gender, ASF, and nWBV. The prediction efficiency of the three priors shows that they outperform the other methods with respect to BS and specificity. Table 4 and the circular bar chart in Fig. 3 give a detailed view of the prediction performance of the LR model. Figure 3 compares all four datasets on a single platform: for each dataset, eight bars correspond to the eight shrinkage methods, and each stacked bar consists of the five measures of prediction. For ADNI and PIMA, all methods perform similarly, though the priors do not prove efficient on the high-dimensional Colon data. For the MLR model on the ADNI data, the three classes considered are EMCI, LMCI, and AD, and the data collected for these classes had dimension 1438 × 113. Tenfold cross-validation was used for the prediction analysis. The variables selected by the priors are “CDRSB.bl”, “MMSE.bl”, “WholeBrain.bl”, “Fusiform.bl”, and “ICV”, which are real-life markers for AD prediction validated by previous literature. The five-category model did not achieve high accuracy; DL and DP provide better metrics than Horseshoe here. For the OASIS data, the categorical response variable is classified into three classes; the mild (1) and moderate (2) CDR ratings are combined since there were only four observations for the moderate class. The variables selected are Age, Education, Socioeconomic Status (SES), Mini-Mental State Exam (MMSE), and normalized Whole Brain Volume (nWBV). Accuracy and AUC for the three priors are presented in parentheses in Table 4. Figure 4, prepared with the R packages HUM [54] and plotly [55], provides details about the behavior of the shrinkage priors. The ROC surface for the Horseshoe prior is presented; all the other priors behave similarly.
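The ROC surface of Fig. 4 has a scalar analogue, the hypervolume under the ROC manifold (HUM), which the HUM package computes. For a single risk score and three ordered classes it can be estimated as the fraction of score triples (one drawn from each class) that the score ranks in the correct order. A hypothetical Python sketch of that estimate (not the HUM package's own algorithm):

```python
from itertools import product

def hum_three_class(s_low, s_mid, s_high):
    """Exhaustive estimate of the hypervolume under the ROC manifold for
    three ordered classes: the fraction of triples (one score per class)
    that the score ranks in the correct order. 1/6 is chance level."""
    correct = sum(a < b < c for a, b, c in product(s_low, s_mid, s_high))
    return correct / (len(s_low) * len(s_mid) * len(s_high))
```

Perfectly separated classes give 1.0; a score with no ordering information gives about 1/6.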
The convergence metrics, trace plots, and autocorrelation (ACF) plots used to assess the sampling procedure for all datasets are presented in the supplementary file. The Geweke diagnostic (Z) compares the mean of samples drawn from the end of an MCMC chain to the mean of samples at the beginning of the chain using a z-test statistic. A cutoff of 2 is arbitrarily chosen for the absolute value of Z, and the percentage of parameters with |Z| ≤ 2 is reported for each of the three priors.
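The Geweke check can be sketched as follows for a single parameter's chain. Note that the textbook diagnostic uses spectral-density variance estimates; plain sample variances serve as a rough stand-in in this illustrative Python version:

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Z-statistic comparing the mean of the first 10% of an MCMC chain
    with the mean of the last 50%. (The full Geweke diagnostic uses
    spectral-density variance estimates; plain sample variances are
    used here for simplicity.)"""
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    a = chain[: int(first * n)]          # early segment
    b = chain[int((1.0 - last) * n):]    # late segment
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return float((a.mean() - b.mean()) / se)
```

A stationary chain gives |Z| near 0, while a chain still drifting toward its target gives a large |Z|, which is why |Z| ≤ 2 is taken as consistent with convergence.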
Overall, on two of the four datasets, the proposed method with the Horseshoe prior outperformed the frequentist methods on specificity by as much as 10 percentage points. On the other two datasets, while it falls behind on specificity, it compares favorably on sensitivity. On the Colon dataset, our method has the second-highest AUC, surpassed only by the Bayesian Lasso. Overall, the method is clearly competitive and shows superiority in at least one measure over the frequentist methods on all datasets.
Discussion
Bayesian shrinkage models can be utilized as a practical and useful alternative classification approach and a plausible way to select genetic markers and risk factors. This is a detailed study, across a wide variety of settings, comparing the three best-known shrinkage priors. All three priors were able to recognize patterns differentiating the binary classes to a highly accurate level, as shown by the excellent predictive performance of the binary LR model on the simulated datasets and the fairly high predictive accuracies on a wide array of public health datasets. Some of these datasets were high dimensional, which exhibits the model’s power to scale up to these challenging scenarios. The algorithm is efficient, and the time taken to execute the simulations is relatively low; for example, a run with n = 500 and p = 50 takes about 673 seconds. This combination of computational power and predictive performance makes it a very reasonable method for practitioners who require quick high-dimensional analysis while retaining the advantages of Bayesian analysis. In future work, we would formally explore comparisons with priors such as spike-and-slab priors and schemes such as the Metropolis-Hastings (MH) algorithm; we posit that features such as indicator selection and the acceptance-rejection step would not compare favorably with the proposed algorithm. We also extended the model to a multinomial logistic model that handles multiple categories. An R package, “ShrinkageBayesGlm”, has been developed and will be publicly available soon. We expect that the coming years will witness its wider dissemination in public health research, where it will be useful for predicting occurrences of common disorders such as dementia, colon cancer, and diabetes. Computational advances, especially in high-dimensional cases [56], will continue expanding the scope of exact methods.
Availability of data and materials
The datasets used and/or analysed during the current study are available, and information about them is included in this article in Table 3.
Abbreviations
SSVS: Stochastic Search Variable Selection
GL: global-local
HS: Horseshoe
DL: Dirichlet Laplace
DP: Double Pareto
PG: Polya-Gamma
EN: Elastic Net
BE: Bayesian Elastic Net
BL: Bayesian Lasso
LR: Logistic Regression
MLR: Multinomial Logistic Regression
DA: Data Augmentation
MCMC: Markov Chain Monte Carlo
AUC: Area Under Curve
ROC: Receiver Operating Characteristic Curve
MSE: Mean Squared Error
BS: Brier Score
VS: variable selection
References
Reuter JA, Spacek DV, Snyder MP. High-throughput sequencing technologies. Mol Cell. 2015; 58(4):586–97.
Newton K, Newman W, Hill J. Review of biomarkers in colorectal cancer. Color Dis. 2012; 14(1):3–17.
Krishnamurti U, Silverman JF. Her2 in breast cancer: a review and update. Adv Anat Pathol. 2014; 21(2):100–07.
Cappuzzo F, Gregorc V, Rossi E, Cancellieri A, Magrini E, Paties CT, Ceresoli G, Lombardo L, Bartolini S, Calandri C, et al. Gefitinib in pretreated non-small-cell lung cancer (NSCLC): analysis of efficacy and correlation with HER2 and epidermal growth factor receptor expression in locally advanced or metastatic NSCLC. J Clin Oncol. 2003; 21(14):2658–63.
Barnes DE, Lee SJ. Predicting Alzheimer’s risk: why and how? Alzheimers Res Ther. 2011; 3(6):33.
Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol. 1996; 58(1):267–88.
Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc Ser B Stat Methodol. 2005; 67(2):301–20.
Li Y, Horowitz MA, Liu J, Chew A, Lan H, Liu Q, Sha D, Yang C. Individual-level fatality prediction of COVID-19 patients using AI methods. Front Public Health. 2020; 8:566.
Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics. 1970; 12(1):55–67.
Bae K, Mallick BK. Gene selection using a two-level hierarchical Bayesian model. Bioinformatics. 2004; 20(18):3423–30.
Bhadra A, Datta J, Polson NG, Willard B, et al. The horseshoe+ estimator of ultra-sparse signals. Bayesian Anal. 2017; 12(4):1105–31.
Griffin J, Brown P, et al. Hierarchical shrinkage priors for regression models. Bayesian Anal. 2017; 12(1):135–59.
Ishwaran H, Rao JS, et al. Spike and slab variable selection: Frequentist and Bayesian strategies. Ann Stat. 2005; 33(2):730–73.
Makalic E, Schmidt DF. A simple sampler for the horseshoe estimator. IEEE Sig Process Lett. 2015; 23(1):179–82.
Park T, Casella G. The Bayesian Lasso. J Am Stat Assoc. 2008; 103(482):681–86.
Piironen J, Vehtari A. On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. In: Artificial Intelligence and Statistics; 2017. p. 905–13. https://proceedings.mlr.press/v54/piironen17a.html.
Armagan A, Dunson DB, Lee J. Generalized Double Pareto shrinkage. Stat Sin. 2013; 23(1):119.
Bhattacharya A, Pati D, Pillai NS, Dunson DB. Dirichlet–Laplace priors for optimal shrinkage. J Am Stat Assoc. 2015; 110(512):1479–90.
Van Erp S, Oberski DL, Mulder J. Shrinkage priors for Bayesian penalized regression. J Am Stat Assoc. 2019; 89:31–50.
Mitchell TJ, Beauchamp JJ. Bayesian variable selection in linear regression. J Am Stat Assoc. 1988; 83(404):1023–32.
George EI, McCulloch RE. Variable selection via Gibbs sampling. J Am Stat Assoc. 1993; 88(423):881–89.
George EI, McCulloch RE. Approaches for Bayesian variable selection. Stat Sin. 1997:339–373. https://www.jstor.org/stable/24306083.
Green PJ, Hastie DI. Reversible jump MCMC. Genetics. 2009; 155(3):1391–403.
Polson NG, Scott JG. Shrink globally, act locally: Sparse Bayesian regularization and prediction. Bayesian Stat. 2010; 9:501–38.
Griffin JE, Brown PJ, et al. Inference with normal-gamma prior distributions in regression problems. Bayesian Anal. 2010; 5(1):171–88.
Carvalho CM, Polson NG, Scott JG. The horseshoe estimator for sparse signals. Biometrika. 2010; 97(2):465–80.
O’Hara RB, Sillanpää MJ, et al. A review of Bayesian variable selection methods: what, how and which. Bayesian Anal. 2009; 4(1):85–117.
Wang C, Daniels M, Scharfstein DO, Land S. A Bayesian shrinkage model for incomplete longitudinal binary data with application to the breast cancer prevention trial. J Am Stat Assoc. 2010; 105(492):1333–46.
Chen H. C., Wehrly TE. Approximate uniform shrinkage prior for a multivariate generalized linear mixed model. J Multivar Anal. 2016; 145:148–61.
Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc. 1993; 88(422):669–79.
Polson NG, Scott JG, Windle J. Bayesian inference for logistic models using Pólya-Gamma latent variables. J Am Stat Assoc. 2013; 108(504):1339–49.
Choi HM, Hobert JP, et al. The Polya-Gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electron J Stat. 2013; 7:2054–64.
Van Dyk DA, Meng XL. The art of data augmentation. J Comput Graph Stat. 2001; 10(1):1–50.
Hobert JP, Marchev D, et al. A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. Ann Stat. 2008; 36(2):532–54.
Holmes CC, Held L, et al. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal. 2006; 1(1):145–68.
Frühwirth-Schnatter S, Frühwirth R. Data augmentation and MCMC for binary and multinomial logit models. In: Statistical Modelling and Regression Structures. Springer: 2010. p. 111–32. https://link.springer.com/chapter/10.1007/9783790824131_7.
Carvalho CM, Polson NG, Scott JG. Handling sparsity via the horseshoe. Proc Twelth Int Conf Artif Intell Stat. 2009; 5:73–80.
Bhattacharya A, Chakraborty A, Mallick BK. Fast sampling with Gaussian scale mixture priors in high-dimensional regression. R software. 2016:042.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2020. https://www.R-project.org/.
Leisch F, Dimitriadou E. mlbench: Machine Learning Benchmark Problems. R package version 2.13. 2021. https://cran.r-project.org/web/packages/mlbench/citation.html.
Silva PD. HiDimDA: An R package for supervised classification of high-dimensional data. In: 1ères Rencontres R; 2012. https://cran.r-project.org/web/packages/HiDimDA/index.html.
the ADNI team. ADNIMERGE: Alzheimer’s Disease Neuroimaging Initiative. 2020. R package version 0.0.1. https://adni.bitbucket.io/.
Marcus DS, Fotenos AF, Csernansky JG, Morris JC, Buckner RL. Open access series of imaging studies: longitudinal MRI data in nondemented and demented older adults. J Cogn Neurosci. 2010; 22(12):2677–84.
Makalic E, Schmidt D. High-Dimensional Bayesian Regularised Regression with the Bayesreg Package. arXiv:1611.06649. https://cran.r-project.org/web/packages/bayesreg/index.html.
Bové DS, Held L, et al. Hyper-g priors for generalized linear models. Bayesian Anal. 2011; 6(3):387–410.
Ghosh J, Li Y, Mitra R, et al. On the use of Cauchy prior distributions for Bayesian logistic regression. Bayesian Anal. 2018; 13(2):359–83.
Huang A, Liu D. EBglmnet: a comprehensive R package for sparse generalized linear regression models. Bioinformatics. 2016.
Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack D, Levine AJ. Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci. 1999; 96(12):6745–50.
Sakellariou A, Sanoudou D, Spyrou G. Combining multiple hypothesis testing and affinity propagation clustering leads to accurate, robust and sample size independent classification on gene expression data. BMC Bioinformatics. 2012; 13(1):270.
Pepe MS, Longton G, Anderson GL, Schummer M. Selecting differentially expressed genes from microarray experiments. Biometrics. 2003; 59(1):133–42.
Algamal Z. An efficient gene selection method for high-dimensional microarray data based on sparse logistic regression. Electron J Appl Stat Anal. 2017; 10(1):242–56. http://siba-ese.unisalento.it/index.php/ejasa/article/view/16346.
Bäckman L, Jones S, Berger AK, Laukka EJ, Small BJ. Cognitive impairment in preclinical Alzheimer’s disease: a meta-analysis. Neuropsychology. 2005; 19(4):520.
Shahbaz M, Ali S, Guergachi A, Niazi A, Umer A. Classification of Alzheimer’s disease using machine learning techniques. In: DATA; 2019. p. 296–303.
Novoselova N, Della Beffa C, Wang J, Li J, Pessler F, Klawonn F. HUM calculator and HUM package for R: easy-to-use software tools for multicategory receiver operating characteristic analysis. Bioinformatics. 2014; 30(11):1635–36.
Sievert C. Interactive Web-Based Data Visualization with R, Plotly, and Shiny: Chapman and Hall/CRC; 2020. https://plotly-r.com.
Lee SY, Pati D, Mallick BK. Continuous shrinkage prior revisited: a collapsing behavior and remedy. arXiv preprint arXiv:2007.02192. 2020.
Acknowledgements
The authors would like to acknowledge Dr. Craig McClain and Dr. S. Srivastava for funding to conduct the research.
ADNI data
Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
*Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
OASIS data
The authors would like to acknowledge the following for the OASIS data:
• OASIS: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382
• OASIS: Longitudinal: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382
• OASIS-3: Principal Investigators: T. Benzinger, D. Marcus, J. Morris; NIH P50AG00561, P30NS09857781, P01AG026276, P01AG003991, R01AG043434, UL1TR000448, R01EB009352. AV-45 doses were provided by Avid Radiopharmaceuticals, a wholly owned subsidiary of Eli Lilly.
Funding
Dr. S. Rai was partly supported by the Wendell Cherry Chair in Clinical Trial Research Fund, multiple National Institutes of Health (NIH) grants (5P20GM113226, PI: McClain; 1P42ES023716, PI: Srivastava; 5P30GM12760702, PI: Jones; 1P20GM12550401, PI: Lamont; 2U54HL120163, PI: Bhatnagar/Robertson; 1P20GM135004, PI: Yan; 1R35ES023837301, PI: Cave; 1R01ES029846, PI: Bhatnagar; 1R01ES02777801A1, PI: States; 1P30ES030283, PI: States), and a Kentucky Council on Postsecondary Education grant (PON2 415 1900002934, PI: Chesney).
Author information
Affiliations
Contributions
A.B. has contributed to the methodology, data collection, analysis, and writing of the manuscript. S.P. and R.M. have contributed equally to methodology and analysis. S.R. has contributed to developing ideas and valuable comments. All authors have contributed to the final preparation of the manuscript.
Corresponding author
Ethics declarations
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1
The additional tables and figures are presented in the Supplementary file.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Bhattacharyya, A., Pal, S., Mitra, R. et al. Applications of Bayesian shrinkage prior models in clinical research with categorical responses. BMC Med Res Methodol 22, 126 (2022). https://doi.org/10.1186/s12874022015606
DOI: https://doi.org/10.1186/s12874022015606
Keywords
 Shrinkage priors
 Logistic regression
 Horseshoe
 Dirichlet Laplace
 MCMC
 Polya-Gamma
 Multinomial
 ADNI
 Pima
 Data augmentation