Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: An example from a vertigo phase III study with longitudinal count data as primary endpoint

Background
A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect the SAP to pre-specify the inferential analyses and other important statistical techniques. Writing a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on, and justification of, many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex, not easily implemented, and subject to several limitations. It is therefore of interest to explore Bayesian alternatives which provide the decision support needed to finalize a SAP.

Methods
We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions, which are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC), the probability integral transform (PIT), and proper scoring rules (e.g. the logarithmic score).

Results
The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial.

Conclusions
The proposed Bayesian methods are not only appealing for inference but also provide sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.


Introduction
The software R-INLA (more precisely: the program inla bundled within an R interface) was created to provide a user-friendly tool for performing approximate Bayesian inference on a range of latent Gaussian models using integrated nested Laplace approximations (INLAs) [1,2,3]. Latent Gaussian models include, e.g., (generalized) linear (mixed) models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatio-temporal models, and geostatistical models. The INLA computational approach combines Laplace approximations and numerical integration in a very efficient manner (using sparse matrix algorithms), without resorting to MCMC techniques. INLA replaces MCMC simulation with very accurate deterministic approximations of the posterior marginal distributions. In addition to its computational speed, latent Gaussian models are treated in a unified way, thereby allowing greater automation of the inference process. For a detailed description of the INLA methodology as well as a thorough comparison with MCMC results please refer to [1].
The output of inla consists of approximate posterior marginal distributions, which can be used to compute summary statistics of interest, such as posterior means, variances or quantiles. Furthermore, the DIC, PIT and CPO values, or the logarithmic scores can be obtained to compare and assess complex Bayesian hierarchical models. The website r-inla.org includes a short documentation describing the class of hierarchical models that can be fitted with the R-INLA library, and for each model a detailed description and an example of usage is provided. The latest release of R-INLA implementing the INLA approach can also be found at this site. Currently, the R library implements many exponential family models, e.g., Gaussian, Poisson, binomial, negative binomial, zero-inflated extensions, etc. Dependence can be modelled using, e.g., random effects, first order auto-regressive processes, first and second order random walks, and much more.
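As a minimal sketch of how these model-assessment quantities are retrieved, the following toy example fits a random intercept Poisson model and extracts the DIC, CPO and PIT values. The data are simulated for illustration only, and the field names (result$dic$dic, result$cpo$cpo, result$cpo$pit) are those of the R-INLA interface; requesting the quantities via control.compute is how current builds expose the cpo and dic options.

```r
library(INLA)

# Hypothetical toy data: 20 subjects with 4 visits each, Poisson counts
set.seed(1)
dat <- data.frame(id = rep(1:20, each = 4),
                  y  = rpois(80, lambda = 3))

# Random intercept Poisson GLMM; request DIC and leave-one-out measures
result <- inla(y ~ 1 + f(id, model = "iid"),
               family = "poisson", data = dat,
               control.compute = list(dic = TRUE, cpo = TRUE))

result$dic$dic             # deviance information criterion
cpo <- result$cpo$cpo      # CPO_i: leave-one-out predictive density at y_i
pit <- result$cpo$pit      # PIT values for calibration checks
mls <- -mean(log(cpo))     # mean logarithmic score (smaller is better)
```

The mean logarithmic score used for model ranking in this paper is obtained directly from the CPO values as shown in the last line.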

Web Appendix: Implementing GLMMs in INLA
In the following it is demonstrated how the generalized linear mixed effects models used in the Results Section (Simulation Study) are coded in R-INLA. In this supplement we focus on random intercept models and provide a description of selected R code required for the implementation of the INLA approach. In R-INLA, a random intercept model is constructed in two steps:
1. Specification of the latent Gaussian field through the formula mechanism, using the f() function to define the type of latent Gaussian field, e.g., random intercepts specified as an 'independent random noise' model (model="iid").
2. A call to the inla() function, which fits the specified model and is described in the next section.
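Step 1 can be sketched as follows. The covariate names (treat, time) and the patient identifier id are hypothetical placeholders; the f() term defines the iid random intercepts.

```r
library(INLA)

# Step 1: formula for a random intercept GLMM.
# 'y' is the longitudinal count response, 'treat' and 'time' are
# hypothetical fixed-effect covariates, and 'id' indexes the patients
# whose random intercepts form the "independent random noise" field.
formula <- y ~ treat + time + f(id, model = "iid")
```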

inla()-call
Within the inla()-call many further options and additional features for the INLA algorithm can be set. In particular, the flags cpo=TRUE or dic=TRUE tell the inla() function to compute "leave-one-out" predictive measures, namely CPO and PIT values, or the DIC. For the computation of CPO and PIT quantities, it is recommended to increase the accuracy of the tails of the marginals by modifying the control.inla argument: for instance, we chose the full Laplace approximation (strategy="laplace"), added more evaluation points (npoints=...) or changed the integration strategy (int.strategy="grid"). The default choice is the simplified Laplace approximation and the so-called central composite design (CCD) integration scheme [1,3].
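Putting these options together, a call for the negative binomial random intercept model might look as follows. Data, covariates and the number of evaluation points are hypothetical; in current builds the cpo and dic flags are passed inside control.compute.

```r
library(INLA)

# Sketch: inla() call requesting CPO/PIT and DIC, with more accurate
# tails of the posterior marginals for the predictive measures.
result <- inla(y ~ treat + time + f(id, model = "iid"),
               family = "nbinomial", data = dat,
               control.compute = list(cpo = TRUE, dic = TRUE),
               control.inla = list(strategy = "laplace",  # full Laplace approximation
                                   int.strategy = "grid", # grid instead of CCD integration
                                   npoints = 21))         # hypothetical number of points
```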
The data frame to be used is specified via the data argument within the inla()-call. The prior for the hyperparameter (i.e. the precision parameter) of a single random effect is specified inside the f() function (using the prior=... and param=... statements to change the default hyperprior distribution and the corresponding hyperprior parameters). If a prior for a hyperparameter of the observational model has to be specified, e.g., for the dispersion parameter k in the case of a negative binomial model or for the precision in the case of a Normal response model, the prior has to be assigned inside the control.data argument.
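The two places where hyperpriors enter can be sketched as below. The log-gamma prior and its parameter values are illustration-only assumptions, and control.data follows the October 2011 interface described here (later builds renamed it control.family).

```r
library(INLA)

# Hyperprior on the log-precision of the random intercept, set inside f();
# the parameter values c(1, 0.01) are hypothetical.
formula <- y ~ treat + time +
  f(id, model = "iid", prior = "loggamma", param = c(1, 0.01))

# Hyperprior for the dispersion parameter k of the negative binomial
# likelihood, set via control.data (observational model hyperparameter).
result <- inla(formula, family = "nbinomial", data = dat,
               control.data = list(prior = "loggamma", param = c(1, 0.01)),
               control.compute = list(cpo = TRUE, dic = TRUE))
```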
Beware that INLA is still a project under active development; hence, parts of the R-INLA code shown in this appendix may have changed in the meantime. For the analyses within this paper we used the R-INLA library built in October 2011.
Simulation scenario for longitudinal negative binomial counts. Example settings: n = 20 patients per group and overdispersion parameter k = 1.
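One dataset from this scenario can be simulated in base R as follows. The number of visits, the regression coefficients and the random-effect standard deviation are hypothetical illustration values; only n = 20 per group and k = 1 come from the scenario. Note that rnbinom() with size = k and mean mu has variance mu + mu^2/k, matching the overdispersion parameterization.

```r
# Sketch: simulating longitudinal negative binomial counts with a
# patient-specific random intercept (base R only).
set.seed(42)
n     <- 20    # patients per group (from the scenario)
m     <- 4     # visits per patient (hypothetical)
k     <- 1     # overdispersion parameter (from the scenario)
beta0 <- 1.0   # baseline log-mean count (hypothetical)
beta1 <- -0.5  # treatment effect on the log scale (hypothetical)
sigma <- 0.5   # SD of the random intercepts (hypothetical)

id    <- rep(1:(2 * n), each = m)          # 40 patients, m visits each
treat <- rep(c(0, 1), each = n * m)        # group indicator
b     <- rep(rnorm(2 * n, 0, sigma), each = m)  # random intercepts
mu    <- exp(beta0 + beta1 * treat + b)    # conditional means
y     <- rnbinom(2 * n * m, size = k, mu = mu)  # NB counts, Var = mu + mu^2/k

dat <- data.frame(y = y, id = id, treat = treat)
```

The resulting data frame dat has one row per visit and is in the long format expected by the inla() calls above.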