Skip to main content

A Bayesian Bernoulli-Exponential joint model for binary longitudinal outcomes and informative time with applications to bladder cancer recurrence data

Abstract

Background

A variety of methods exist for the analysis of longitudinal data, many of which are characterized with the assumption of fixed visit time points for study individuals. This, however is not always a tenable assumption. Phenomenon that alter subject visit patterns such as adverse events due to investigative treatment administered, travel or any other emergencies may result in unbalanced data and varying individual visit time points. Visit times can be considered informative, because subsequent or current subject outcomes can change or be adapted due to previous subject outcomes.

Methods

In this paper, a Bayesian Bernoulli-Exponential model for analyzing joint binary outcomes and exponentially distributed informative visit times is developed. Via statistical simulations, the influence of controlled variations in visit patterns, prior and sample size schemes on model performance is assessed. As an application example, the proposed model is applied to a Bladder Cancer Recurrence data.

Results and conclusions

Results from the simulation analysis indicated that the Bayesian Bernoulli-Exponential joint model converged in stationarity, and performed relatively better for small to medium sample size scenarios with less varying time sequences regardless of the choice of prior. In larger samples, the model performed better for less varying time sequences. This model’s application to the bladder cancer data showed a statistically significant effect of prior tumor recurrence on the probability of subsequent recurrences.

Peer Review reports

Introduction

Longitudinal data entail observations collected repeatedly on subjects over time. In medical research, the collection of correlated, longitudinal data is a common phenomenon. Ranging from the assessment of response changes and trends over time to understanding disease progression, the benefits longitudinal approaches are enormous [1, 2]. A defining feature of longitudinal data is the dependency that characterizes observations extending over time, the type of outcome measured and sometimes, the assumption of fixed time measurements for subjects [3,4,5]. The broad assumption of fixed time measurements, predetermined by study design, however is not always a tenable assumption. For instance, in a clinical trial, there is the potential for different visit mechanisms. Study subjects are likely to miss scheduled visits, and a proportion of them are prone to adverse events from investigative treatments. Also, due to poor health conditions, individuals may self elect to visit the investigative site or hospital more intensely than their study counterparts. These occurrences may result not just in unbalanced data for subjects, but also varying visit profiles. Thus, the time structure adopted for the study can be considered informative. In a broad sense, this indicates that outcomes measured at subsequent time points are influenced or can be adapted based on outcomes measured in current time. This necessitates the use of advanced methods that address the informative time structure rather than standard, traditional approaches, which are limited by the assumption of fixed time. To handle such scenarios, Bronsert [6] developed a classical joint model, involving Gaussian outcomes and exponentially distributed informative time. Later, Alomair [7] extended Bronsert’s model to include time dependent covariates. Classical informative time joint models have also been developed by Seo [8], involving longitudinal outcomes from the exponential families and exponentially distributed informative time. These joint models used the maximum likelihood estimation approach for estimating model parameters, and the authors broadly discussed associated computational complexities.

A Bayesian technique for modeling joint longitudinal outcomes and informative time points has been developed by Zaagan [9] but only for Gaussian distributed outcomes. The objectives of this research paper are twofold. First, we develop a Bayesian joint model for analyzing binary longitudinal outcomes and informative times. Then, via statistical simulations, we examine the influence of controlled variations in subject visit patterns, different prior specifications and sample size schemes on the proposed model. This proceeds with model convergence assessment and model evaluation. The proposed Bayesian-Exponential joint model is applied to a Bladder cancer recurrence data resulting from a clinical trial involving patients with bladder cancer conducted by the Veterans Administration Co-operative Urological Research Group (VACURG) [10, 11].

Data and methods

The Bayesian Bernoulli-Exponential joint model formulation and likelihood specification

The exponential family of distributions covers a broad range of response distributions including Gaussian and Non-Gaussian distributions [12, 13]. For example, the Normal, Gamma, Poisson, Bernoulli, and Beta distributions are a part of the parametric set of distributions included in the family. Suppose the observations \(y_{1},y_{2},y_{3},\cdots , y_{n}\) are independent observations of a response variable, the exponential family of distributions from which the independent observations are sampled, can be specified as

$$\begin{aligned} f\left( y_{i} \mid \theta _{i}, \phi \right) =\exp \left\{ m_{i}^{-1}(\phi )\left( y_{i} \theta _{i}-s\left( \theta _{i}\right) \right) +r\left( y_{i}, \phi \right) \right\} ~~~i=1,\ldots , n. \end{aligned}$$
(1)

Where,

  • \(\theta _{i}\) represents the canonical parameter.

  • \(\phi\) is a scale parameter and \(m_{i}(\cdot ), s(\cdot )\), and \(r(\cdot )\) are known functions which relates to the variances of distributions in the exponential family.

  • \(m_{i}(\phi )\) can be specified as \(m_{i}(\phi )=\frac{\phi }{u_{i}}\), and \(u_{i}\) ’s are predetermined weights.

The canonical or location parameter characterizes a so called canonical link function, and relates to the means of the distributions in the exponential family.

Assume we have a set of n participants enrolled in a clinical trial, have to visit an investigative site over time and are followed over an interval from \((0,\tau ]\). A response observation for the ith participant measured at the kth visit time point can be specified as \(y_{ik}\). We can further specify vectors of individual responses and their associated visit time points as

$$\begin{aligned} \varvec{y_{i}} = \left( \begin{array}{c} y_{i 1}\\ y_{i2}\\ y_{i3}\\ \vdots \\ y_{i n_{i}} \end{array}\right) , \qquad \qquad \quad \varvec{t_{i}} = \left( \begin{array}{c} t_{i 1}\\ t_{i 2}\\ t_{i 3}\\ \vdots \\ t_{i n_{i}}\\ \end{array}\right) . \end{aligned}$$

Here, the subscript \(n_{i}\) allows for varying participant visit times. We can thus specify the joint distribution of recorded responses and time points as

$$\begin{aligned} f_{\Theta }\left( \varvec{y}_{i} , \varvec{t}_{i}\right) =f_{\Theta }\left( \varvec{y}_{i} \mid \varvec{t}_{i}\right) \cdot f_{\Theta }\left( \varvec{t}_{i}\right) , \end{aligned}$$
(2)

where \(\Theta\) is a vector of unknown parameters to be estimated. Using these ideas, and in line with Seo [8] we can further specify a model that incorporates the joint distribution of responses and time points \(\varvec{y}_{ik}\) and \(\varvec{t}_{i_{n}}\) with the underlying assumption that the current response depends on the one-step prior response \(\left( y_{ik-1}\right)\), and current visit time point \(\left( t_{i k}\right)\). It is important to note, however, that subsequent responses, \(y_{i k}\) will not be solely conditioned on observation time, \(t_{i k}\) but also on the most recent prior response, \(y_{ik-1}\) and observation time. This distribution can be specified as;

$$\begin{aligned} f_{\Theta }\left( \varvec{y}_{i}, \varvec{t}_{i}\right) =f_{\Theta }\left( y_{i 1} \mid t_{i 1}\right) \times f_{\Theta }\left( t_{i 1}\right) \times \prod _{k=2}^{n_{i}} f_{\Theta }\left( y_{i k} \mid t_{ik},t_{ik-1}, y_{ik-1}\right) \times f_{\Theta }\left( t_{ik} \mid y_{i k-1}\right) . \end{aligned}$$
(3)

This formulation forms the premise for specifying the joint model with response observations sampled from the Bernoulli distribution. Time is considered informative and assumed to be exponentially distributed. The joint distribution for binary longitudinal outcomes and informative time given the underlying assumption of a one step dependency can be specified as;

$$\begin{aligned} f_{\Theta }\left( \varvec{y}_{i}, \varvec{t}_{i}\right) = f\left( y_{ik} , y_{i k-1}, t_{i k},t_{ik-1}, \varvec{X}\right)= & {} \mu _{i k}^{y_{i k}}\left( 1-\mu _{i k}\right) ^{1-y_{i k}} \nonumber \\= & {} \exp \left( y_{i k} \log \left( \frac{\mu _{i k}}{1-\mu _{i k}}\right) +\log \left( 1-\mu _{i k}\right) \right) . \end{aligned}$$
(4)

Note that, \(\mu _{ik}=E\left( Y_{ik}\right) =P\left( Y_{ik}=1\right)\).

More specifically for the Bernoulli distribution the link function can be specified as a logit link

$$\begin{aligned} \theta _{ik}=\log \left( \frac{\mu _{ik}}{1-\mu _{i k}}\right) =\text {logit}\left( \mu _{ik}\right) . \end{aligned}$$
(5)

which in the context of this study can be expressed as;

$$\begin{aligned} \log \left( \frac{\mu _{i j}}{1-\mu _{ik}}\right) =\varvec{X}_i^{\prime } \varvec{\alpha }+\vartheta t_{ik}+\psi y_{ik-1} . \end{aligned}$$
(6)

Furthermore, the specified mean function for the initial value for the ith participant and that after the initial value can be expressed as

$$\begin{aligned} \mu _{i 1}=\frac{\exp \left( \varvec{X}_i^{\prime } \varvec{\alpha }\right) }{1+\exp \left( \varvec{X}_i^{\prime } \varvec{\alpha }\right) } \quad \text{ and } \quad \mu _{ik}=\frac{\exp \left( \varvec{X}_i^{\prime } \varvec{\alpha }+\vartheta t_{ik}+\psi y_{ik-1}\right) }{1+\exp \left( \varvec{X}_i^{\prime } \varvec{\alpha }+\vartheta t_{ik}+\psi y_{ik-1}\right) .} \end{aligned}$$

respectively. Hence, our final model specification for the parametric joint Bernoulli-Exponential model can be expressed as;

$$\begin{aligned} f_{\Theta }\left( \varvec{y}_{i}, \varvec{t}_{i}\right)&= f\left( y_{i k} \mid y_{ik-1}, t_{ik},t_{ik-1}, \varvec{X}\right) = \exp \left\{ y_{i1} \ln \left( \frac{\mu _{i1}}{1-\mu _{i1}}\right) +\ln (1-\mu _{i1})\right\} \nonumber \\&\quad \times \prod _{k=2}^{n_{i}}\Bigg \{\exp \left( y_{ik} \ln \left( \frac{\mu _{ik}}{1-\mu _{ik}}\right) +\ln (1-\mu _{ik})\right) \nonumber \\&\quad \times \exp \left( \xi +\gamma y_{ik-1}\right) \times \exp \left( -\exp \left( \xi +\gamma y_{ik-1}\right) t_{ik}\right) \Bigg \}, ~~~~~ t_{ik} \in (0, \tau ]. \end{aligned}$$
(7)

Where,

  • \(\varvec{\alpha }\) is a vector of regression parameters denoting the effect of covariates on observed responses.

  • \(\psi\) represents the effect of the prior responses on average current responses.

  • \(\vartheta\) represents the effect of current response time on the mean responses,

  • \(\xi\) is a constant parameter associated with time

  • \(\gamma\) characterizes the effect of previous response on mean time and \(\varvec{X}\) is the design matrix.

The resulting likelihood function, a product of the density functions for s subjects, can be specified as,

$$\begin{aligned} L\big (\Theta , y_{1},y_{2},y_{3} \cdots , y_{s}\big )&=\prod _{i=1}^{s}\Bigg \{\exp \bigg \{y_{i1} \ln \left( \frac{\mu _{i1}}{1-\mu _{i1}}\right) +\ln (1-\mu _{i1})\bigg \}\nonumber \\&\quad \times \prod _{k=2}^{n_{i}} \exp \Big \{y_{ik} \ln \left( \frac{\mu _{ik}}{1-\mu _{ik}}\right) +\ln (1-\mu _{ik})\nonumber \\&\quad \times \exp \left( \xi +\gamma y_{ik-1}\right) \times \exp \left( -\exp \left( \xi +\gamma y_{ik-1}\right) t_{ik}\right) \Big \}\Bigg \}. \end{aligned}$$
(8)

It is further important to clarify, that one key underlying assumption of this model, following Lin and Ying [14],Lin, Scharfstein, and Rosenheck [15], Liang, Wenbin and Zhiliang [16] and Sun, Sun, and Liu [17], is that censoring time, \(Z_i\) in this study is noninformative in the sense that given covariates \(\left( X_i\right) , Z_i\) is independent of the observation times \(\left\{ t_{i k}\right.\), \(k \ge 1\}\) and longitudinal outcomes \(Y_i(\cdot )\). This basically means that given the covariate history up to time k, the distribution of the future covariate path up to any time \(t>k\) is independent of whether or not there is an observation on \(X_i\) at time k.

Specification of priors

After the likelihood function of the Bernoulli-Exponential joint model distribution has been specified, the next step in the Bayesian model specification is the identification of a suitable prior. In this study, informative and non-informative priors are considered. Both priors serve important roles in Bayesian analysis, and the choice between them depends on the specific goals and available information in a given analysis [18]. Non-informative priors, also known as weak,vague or diffuse priors, are designed to have minimal influence on the posterior distribution. They can make Bayesian analysis robust to situations where there is little prior information or when prior beliefs are uncertain. They prevent strong prior assumptions from biasing results when there is limited prior knowledge [19]. One of the primary benefits of informative priors, on the other hand, is that they allow to incorporate expert domain knowledge and prior information into the analysis [20, 21]. This is invaluable when experts have insights that can improve parameter estimation, and, in situations with limited or noisy data, informative priors can lead to more stable and accurate parameter estimates. Finally, informative priors explicitly quantify prior beliefs and uncertainty, which allows to integrate these beliefs with observed data. In this study, for both informative and non-informative prior scenarios, we consider the vector of mean parameters \((\varvec{\alpha })\) as having a multivariate normal distribution [19, 22,23,24]. This is specified as;

$$\begin{aligned} p(\varvec{\alpha } \mid \phi )&\sim N\left( \varvec{\mu }_{\alpha }, \phi \varvec{\Sigma }_{\alpha }\right) \nonumber \\ p(\varvec{\alpha }\mid \phi )&=\bigg (2 \pi \bigg )^{\frac{s}{2}}\bigg |\phi \varvec{\Sigma }_{\alpha }\bigg |^{-\frac{1}{2}} \exp \left( -\frac{1}{2}\left( \varvec{\alpha }-\varvec{\mu }_{\alpha }\right) ^{\prime }\left( \phi \varvec{\Sigma }_{\alpha }\right) ^{-1}\left( \varvec{\alpha }-\varvec{\mu }_{\alpha }\right) \right) . \end{aligned}$$
(9)

Furthermore, we consider the parameters associated with time or visit to similarly follow a Gaussian distribution;

$$\begin{aligned} \gamma&\sim N\left( \mu _{\gamma }, \nu _{\gamma }^{2}\right) \\ \vartheta&\sim N\left( \mu _{\vartheta }, \nu _{\vartheta }^{2}\right) \\ \psi&\sim N\left( \mu _{\psi }, \nu _{\psi }^{2}\right) \\ \xi&\sim N\left( \mu _{\xi }, \nu _{\xi }^{2}\right) .\\ \end{aligned}$$

Note that the prior distributions of our joint model parameters are considered independent and thus,

$$\begin{aligned} p(\varvec{\alpha }, \vartheta ,\psi ,\xi )&=p(\varvec{\alpha }|\phi )\times p(\vartheta )\times p(\psi )\times p(\xi )\nonumber \\&=\bigg (2 \pi \bigg )^{\frac{s}{2}}\bigg |\phi \varvec{\Sigma }_{\alpha }\bigg |^{-\frac{1}{2}} \exp \left( -\frac{1}{2}\left( \varvec{\alpha }-\varvec{\mu }_{\alpha }\right) ^{\prime }\left( \phi \varvec{\Sigma }_{\alpha }\right) ^{-1}\left( \varvec{\alpha }-\varvec{\mu }_{\alpha }\right) \right) \nonumber \\&\quad\times \frac{1}{\sqrt{2 \pi \nu _{\vartheta }^{2}}} \exp \left( -\frac{1}{2}\left( \vartheta -\mu _{\vartheta }\right) ^{2}\right) \times \frac{1}{\sqrt{2 \pi \nu _{\psi }^{2}}} \exp \left( -\frac{1}{2}\left( \psi -\mu _{\psi }\right) ^{2}\right) \nonumber \\&\quad\times \frac{1}{\sqrt{2 \pi \nu _{\xi }^{2}}} \exp \left( -\frac{1}{2}\left( \xi -\mu _{\xi }\right) ^{2}\right) . \end{aligned}$$
(10)

For the informative prior setting, fixed values for the prior means, \((\varvec{\mu }_{\alpha },{\mu }_{\vartheta },{\mu }_{\psi },{\mu }_{\xi },{\mu }_{\omega })\) and their corresponding variances \((\varvec{\Sigma }_{\alpha },\nu _{\vartheta },\nu _{\psi },\nu _{\xi },\nu _{\omega })\) are adopted, since we do not have expert or historical estimates yet for these kind of studies. More specifically, we can denote the mean vector of \(\varvec{\alpha }\), \(\varvec{\mu }_{\alpha }\) with a prior mean vector and corresponding covariance matrix as;

$$\begin{aligned} p(\varvec{\alpha } \mid \phi )\sim N\left( 0.6 \varvec{\text{I}}_{s}, 5\varvec{\text{I}}_{s}\right) . \end{aligned}$$

where \(\text{I}_{s}\) represents an identity matrix whose dimension depends on s individuals and \(\phi\). More broadly, we set predetermined prior mean values for the visit parameters as;

$$\begin{aligned} ({\mu }_{\vartheta },{\mu }_{\psi },{\mu }_{\xi },{\mu }_{\omega })=(0.2,0.3,0.3,1), \end{aligned}$$

and their corresponding prior variances as

$$\begin{aligned} ({\nu }_{\vartheta }^{2},{\nu }_{\psi }^{2},{\nu }_{\xi }^{2},{\nu }_{\omega }^{2})=(0.3,0.4,0.2,1.5). \end{aligned}$$

Regarding the non-informative prior setting, two approaches are considered. First, Gaussian non-informative priors are adopted for all mean and variance parameters of both the response and time parameters. More broadly, to express prior ignorance, the prior means \(({\mu }_{\alpha },{\mu }_{\vartheta },{\mu }_{\psi },{\mu }_{\xi },{\mu }_{\omega })\) are set to zero and the variance-covariance for \(\phi \varvec{\Sigma }_{\alpha }\) can be set as a diagonal matrix with large variance. Similarly the corresponding prior variances for the other parameters are set very large to express prior ignorance. Thus, the non-informative priors are set up as,

$$\begin{aligned} p(\varvec{\alpha })&\sim N\left( \varvec{\text{0}}_{s}, 10^{8}\varvec{\text{I}}_{s}\right) \\ p({\gamma })&\sim N\left( 0, 10^{8}\right) \\ p(\vartheta )&\sim N\left( 0, 10^{8}\right) \\ p(\psi )&\sim N\left( 0, 10^{8}\right) \\ p(\xi )&\sim N\left( 0, 10^{8}\right) \\ p(\omega )&\sim N\left( 0, 10^{8}\right) . \end{aligned}$$

For the second case of non-informative prior, we consider the Jeffreys prior [25] an appealing reference prior widely used in Bayesian inference. This prior is considered for the response/outcome parameters and Gaussian non-informative priors are still considered in this study for visit parameters. The Jeffreys prior is obtained by applying the Jeffreys rule which defines the prior density to be directly proportional to the square root of the determinant of the Fisher information matrix. That is, for a set of parameters \(\varvec{\theta }=\left( \theta _{1}, \ldots , \theta _{n}\right)\), the Jeffreys prior is given by,

$$\begin{aligned} p(\varvec{\theta }) \propto \bigg ({\text {det}( I(\varvec{\theta }))}\bigg )^{\frac{1}{2}}. \end{aligned}$$

The Fisher information matrix is defined by,

$$\begin{aligned} I(\varvec{\theta })=-E\left[ \frac{\partial ^{2} \ln L}{\partial \theta _{i} \partial \theta _{k}}\right] . \end{aligned}$$
(11)

and L is the likelihood function that specifies the probability for data y given the parameters \(\theta\). It is appropriate so far as \(\textbf{I}(\theta )\) is positive definite. Aside its geometric interpretation, one of the appealing reasons for its usage is the concept of parameterization invariance [26]. This means that the prior is invariant with regards to one-to-one transformations. The principle can be extended for multidimensional parameters. To establish ideas for the Jeffreys prior for response parameters, which result from the exponential family of distributions, the likelihood functions of the distributions and associated score vectors need to be specified.

Let \(\phi _{i}\)’s be known and \(\varvec{X}^{\prime }\) assume a rank q. Also let, \(\theta _{i}=z\left( \varvec{x}_{i}^{\prime } \varvec{\alpha }\right)\) and \(m^{-1}\left( \phi _{i}\right) =\phi ^{-1}w\). The likelihood function for Generalized linear models with responses from the exponential family of distributions can generally be specified as;

$$\begin{aligned} L(\varvec{\alpha }) \propto \exp \left[ \sum _{i=1}^{n} m^{-1}\left( \phi _{i}\right) \left\{ y_{i} z\left( \varvec{x}_{i}^{\prime } \varvec{\alpha }\right) -s\left( z\left( \varvec{x}_{i}^{\prime } \varvec{\alpha }\right) \right) \right\} \right] . \end{aligned}$$
(12)

The score vector is represented by;

$$\begin{aligned} \frac{\partial \log L(\varvec{\alpha })}{\partial \varvec{\alpha }}=\sum _{i=1}^{n} m_{i}^{-1}\left( \phi _{i}\right) \left\{ y_{i}-s^{\prime }\left( z\left( \varvec{x}_{i}^{\prime } \varvec{\alpha }\right) \right) \right\} z^{\prime }\left( \varvec{x}_{i}^{\prime } \varvec{\alpha }\right) \varvec{x}_{i}. \end{aligned}$$
(13)

The resulting Fisher information matrix is specified as;

$$\begin{aligned} \varvec{I(\alpha )}=E\left[ -\frac{\partial ^{2} \log L}{\partial \varvec{\alpha } \partial \varvec{\alpha }^{\prime }}\right] =\varvec{X}^{\prime } \varvec{P} \varvec{V}(\varvec{\alpha }) \Delta ^{2}(\varvec{\alpha }) \varvec{X}. \end{aligned}$$
(14)

Here,

  • \(\varvec{P}=\text {Diag}\left( m^{-1}\left( \phi _{i}\right) ,\cdots , m^{-1}\left( \phi _{n}\right) \right)\) which is an \(n \times n\) diagonal matrix of the weights \(w_{i}\).

  • \(\varvec{V}(\varvec{\alpha })=\text {Diag}\left( s^{\prime \prime }\left( \left( \varvec{x}_{1}^{\prime } \varvec{\alpha }\right) \right) , \cdots , s^{\prime \prime }\left( \left( \varvec{x}_{n}^{\prime } \varvec{\alpha }\right) \right) \right)\) which reflects an \(n \times n\) diagonal matrix of \(v_{i}=\) \(\frac{\partial ^{2} s\left( \theta _{i}\right) }{\partial \theta _{i}^{2}}\).

  • \(\varvec{\Delta }(\varvec{\alpha })=\text {Diag}\left( s^{\prime }\left( \varvec{x}_{1}^{T} \varvec{\alpha }\right) , \cdots , s^{\prime }\left( \varvec{x}_{n}^{T} \varvec{\alpha }\right) \right)\) is an a \(n \times n\) diagonal matrix of \(\delta _{i}=\frac{\partial s\left( \theta _{i}\right) }{\partial \eta _{i}}\) and is an adjustment for the link function.

The Jeffreys prior thus for \(\varvec{\alpha }\) assuming \(\phi\) is known, is specified as

$$\begin{aligned} p(\varvec{\alpha }) \propto \left| \varvec{X}^{\prime } \varvec{P} \varvec{V}(\varvec{\alpha }) \Delta ^{2}(\varvec{\alpha }) \varvec{X}\right| ^\frac{1}{2}. \end{aligned}$$
(15)

Based on this derivation, Jeffreys non-informative prior considered for response parameters and Gaussian non-informative priors maintained for the visit parameters can be specified as;

$$\begin{aligned} p(\varvec{\alpha }, \vartheta ,\psi ,\xi )&=p(\varvec{\alpha }|\phi )\times p(\vartheta )\times p(\psi )\times p(\xi )\times p(\omega )\nonumber \\&=\left| \varvec{X}^{\prime } \varvec{P} \varvec{V}(\varvec{\alpha }) \Delta ^{2}(\varvec{\alpha }) \varvec{X}\right| ^\frac{1}{2}\nonumber \\&\quad \times \frac{1}{\sqrt{2 \pi \nu _{\vartheta }^{2}}} \exp \left( -\frac{1}{2}\left( \vartheta -\mu _{\vartheta }\right) ^{2}\right) \times \frac{1}{\sqrt{2 \pi \nu _{\psi }^{2}}} \exp \left( -\frac{1}{2}\left( \psi -\mu _{\psi }\right) ^{2}\right) \nonumber \\&\quad \times \frac{1}{\sqrt{2 \pi \nu _{\xi }^{2}}} \exp \left( -\frac{1}{2}\left( \xi -\mu _{\xi }\right) ^{2}\right) . \end{aligned}$$
(16)

Posterior distribution specification and Bayesian joint parameter estimation

The next step in the Bayesian model development is the specification of the posterior distribution, which has a directly proportional relationship with the model likelihood and the priors specified. For the scenario where Gaussian priors are considered for both the response and visit parameters and also for both informative and non informative settings, the resulting Bayesian Bernoulli-Exponential joint model posterior specification can be obtained as;

$$\begin{aligned}{}&p(\varvec{\alpha }, \vartheta ,\psi ,\xi ,\phi |\varvec{Y}_{i},\varvec{t}_{i},\varvec{X} )=L\left( \Theta , y_{1},y_{2},y_{3} \cdots , y_{s}\right) \times p(\varvec{\alpha }|\phi )\times p(\vartheta )\times p(\psi )\times p(\xi )\nonumber \\&=\prod _{i=1}^{s}\Bigg \{\exp \bigg \{y_{i1} \ln \left( \frac{\mu _{i1}}{1-\mu _{i1}}\right) +\ln (1-\mu _{i1})\bigg \}\times \prod _{k=2}^{n_{i}} \exp \Big \{y_{ik} \ln \left( \frac{\mu _{ik}}{1-\mu _{ik}}\right) \nonumber \\&\quad +\ln (1-\mu _{ik})\times \exp \left( \xi +\gamma y_{ik-1}\right) \times \exp \left( -\exp \left( \xi +\gamma y_{ik-1}\right) t_{ik}\right) \Big \}\Bigg \}\nonumber \\&\quad \times \bigg (2 \pi \bigg )^{\frac{s}{2}}\bigg |\phi \varvec{\Sigma }_{\alpha }\bigg |^{-\frac{1}{2}} \exp \left( -\frac{1}{2}\left( \varvec{\alpha }-\varvec{\mu }_{\alpha }\right) ^{\prime }\left( \phi \varvec{\Sigma }_{\alpha }\right) ^{-1}\left( \varvec{\alpha }-\varvec{\mu }_{\alpha }\right) \right) \times \frac{1}{\sqrt{2 \pi \nu _{\vartheta }^{2}}} \exp \left( -\frac{1}{2}\left( \vartheta -\mu _{\vartheta }\right) ^{2}\right) \nonumber \\&\quad \times \frac{1}{\sqrt{2 \pi \nu _{\psi }^{2}}} \exp \left( -\frac{1}{2}\left( \psi -\mu _{\psi }\right) ^{2}\right) \times \frac{1}{\sqrt{2 \pi \nu _{\xi }^{2}}} \exp \left( -\frac{1}{2}\left( \xi -\mu _{\xi }\right) ^{2}\right) . \end{aligned}$$
(17)

Also for the scenario where Jeffreys priors are considered for the parameters of the Bernoulli response and Gaussian priors for the visit parameters (non informative settings), the resulting Bayesian Bernoulli-Exponential joint model can be parameterized as;

$$\begin{aligned} p(\varvec{\alpha }, \vartheta ,\psi ,\xi ,\phi |\varvec{Y}_{i},\varvec{t}_{i},\varvec{X} )=L\left( \Theta , y_{1},y_{2},y_{3} \cdots , y_{s}\right) \times p(\varvec{\alpha }|\phi )\times p(\vartheta )\times p(\psi )\times p(\xi ). \end{aligned}$$
(18)
$$\begin{aligned}{}&=\prod _{i=1}^{s}\Bigg \{\exp \bigg \{y_{i1} \ln \left( \frac{\mu _{i1}}{1-\mu _{i1}}\right) +\ln (1-\mu _{i1})\bigg \}\times \prod _{k=2}^{n_{i}} \exp \Big \{y_{ik} \ln \left( \frac{\mu _{ik}}{1-\mu _{ik}}\right) \\&\quad +\ln (1-\mu _{ik})\times \exp \left( \xi +\gamma y_{ik-1}\right) \times \exp \left( -\exp \left( \xi +\gamma y_{ik-1}\right) t_{ik}\right) \Big \}\Bigg \}.\\&\quad \times \left| \varvec{X}^{\prime } \varvec{P} \varvec{V}(\varvec{\alpha }) \Delta ^{2}(\varvec{\alpha }) \varvec{X}\right| ^\frac{1}{2}\times \frac{1}{\sqrt{2 \pi \nu _{\vartheta }^{2}}} \exp \left( -\frac{1}{2}\left( \vartheta -\mu _{\vartheta }\right) ^{2}\right) \\&\quad \times \frac{1}{\sqrt{2 \pi \nu _{\psi }^{2}}} \exp \left( -\frac{1}{2}\left( \psi -\mu _{\psi }\right) ^{2}\right) \times \frac{1}{\sqrt{2 \pi \nu _{\xi }^{2}}} \exp \left( -\frac{1}{2}\left( \xi -\mu _{\xi }\right) ^{2}\right) . \end{aligned}$$

Here, \(\varvec{V}(\varvec{\alpha })=\text {diag}\left( v_{1},v_{2} \ldots , v_{n}\right)\) and \(v_{i}=\mu _{ik}(1-\mu _{ik})\). Note that,

$$\begin{aligned} \mu _{ik}=\frac{\exp \left( \varvec{\alpha }^{\prime }\varvec{X}_{i} + \vartheta t_{ik}+\psi y_{i k-1}\right) }{1+\exp \left( \varvec{\alpha }^{\prime }\varvec{X}_{i} + \vartheta t_{ik}+\psi y_{i k-1}\right) }. \end{aligned}$$

The next goal is to obtain posterior summary estimates for inference. Analytical calculations of the posterior distributions are possible, but often untenable due to laborious calculations involving the integration constant. Integral approximation methods can be adopted but only if few parameters are involved [19, 24]. In situations such as this study involving many parameters to be estimated, one can resort to Markov Chain Monte Carlo Methods (MCMC) [27]. The MCMC methods are viable simulation approaches for sampling from posterior distributions and computing posterior summary measures. They are premised on a Markov Chain construction that subsequently converges to a so-called target distribution. The two most popular MCMC methods are the Gibbs sampling and the Metropolis-Hastings algorithm [27,28,29]. In this study, we adopt the Gibbs sampling procedure for generating samples from the joint posterior distributions of the unknown parameters in our model. It is important to clarify, however, that the Gibbs sampler, performs iterative draws from posterior conditional distributions instead of directly sampling from the joint posterior distribution. This approach enhances the utility of the Gibbs Sampler, especially when dealing with complex joint posteriors that can be challenging to handle directly. Then, posterior summaries can be computed. In each step of the algorithm, random values are generated from unidimensional distributions [30]. A brief summary of the Gibbs sampling algorithm is as follows;

  1. (a)

    Predetermined initial values \(\varvec{\theta }^{(0)}\) need to be specified.

  2. (b)

    For \(t=1, \ldots , T\) iterations,

    1. (i)

      Set \(\varvec{\theta }=\varvec{\theta }^{(t-1)}\).

    2. (ii)

      For \(k=1, \ldots , r\), we can update \(\theta _{k}\) from \(\theta _{k} \sim p\left( \theta _{k} \mid \theta _{1}, \ldots , \theta _{k-1}, \theta _{k+1}, \ldots , \theta _{r}\right)\).

Now, if the current state of the chain \(\theta\) is \(\theta ^{(t)}=\left( \theta _{1}^{(t)} \ldots , \theta _{r}\right)\), then we can generate the new parameters by,

$$\begin{aligned} \begin{array}{ccc} \text {Drawing}\ \ \theta _{1}^{(t)} &{} \text {from} &{} p\left( \theta _{1} \mid \theta _{2}^{(t-1)}, \theta _{3}^{(t-1)}, \ldots , \theta _{q}^{(t-1)}, \varvec{y}\right) , \\ \text {Drawing}\ \ \theta _{2}^{(t)} &{} \text {from} &{} p\left( \theta _{2} \mid \theta _{1}^{(t)}, \theta _{3}^{(t-1)}, \ldots , \theta _{q}^{(t-1)}, \varvec{y}\right) , \\ \text {Drawing}\ \ \theta _{3}^{(t)} &{} \text {from} &{} p\left( \theta _{3} \mid \theta _{1}^{(t)}, \theta _{2}^{(t)}, \theta _{4}^{(t-1)}, \ldots , \theta _{q}^{(t-1)}, \varvec{y}\right) ,\\ {} &{} \vdots &{} {}\\ \text {Drawing}\qquad \theta _{q}^{(t)} &{} \text {from} &{} p\left( \theta _{q} \mid \theta _{1}^{(t)}, \theta _{2}^{(t)}, \ldots , \theta _{q-1}^{(t)}, \varvec{y}\right) . \end{array} \end{aligned}$$

The distributions, \(p\left( \theta _{k} \mid \theta _{1}^{(t)}, \theta _{2}^{(t)}, \ldots , \theta _{k-1}^{(t)}, \theta _{k+1}^{(t-1)}, \ldots , \theta _{q}^{(t-1)}, \varvec{y}\right)\) are known as the full, complete or conditional distributions. Summarily, the Gibbs sampling algorithm helps to iteratively generate samples from our posterior distribution based on prespecified starting values. Initial portions of the Markov chains are discarded in an attempt to mask the influence of initial values. This is called the burn-in part. Resulting posterior summary measures such as the posterior mean, posterior standard deviation and Bayesian credible intervals are obtained from the MCMC output. Furthermore, we assess convergence of the Markov chains via the diagnosis of ergodic mean plots of estimated parameters and the Heidelberger and Welch diagnostic test which is a more formal convergence diagnostic method [31].

Model evaluation

To assess the Bayesian Bernoulli-Exponential joint model, the Bayesian model evaluation criteria called the Deviance Information Criterion (DIC) is used [32]. The DIC measure comprises a “goodness of fit” and “complexity” term and is obtained as;

$$\begin{aligned} \text{DIC}&=-2 \ln L [\varvec{y} \mid \text{E}(\varvec{\theta } \mid \varvec{y})]+2 p_{\text{D}},\\&=\hat{D}(\varvec{\theta })+2 p_{\text{D}}. \end{aligned}$$

where \(\hat{D}(\varvec{\theta })\) is the deviance calculated at the posterior mean of the parameters and \(p_{\text{D}}\) characterizes the “effective” number of parameters relating the complexity of the models. \(p_{\text{D}}\) is the difference between the posterior mean deviance, \(\overline{D(\varvec{\theta })}\) and deviance calculated at the posterior mean of the parameters, \(\hat{D}(\varvec{\theta })\). Smaller values of DIC justify a better fit of the model. In line with this derivation, the DIC measure for the Bayesian Bernoulli-Exponential model is specified as;

$$\begin{aligned} \text {DIC}&=-2\Bigg \{\sum _{i=1}^{s} y_{i 1} \left( \varvec{\hat{\alpha }}^{\prime }\varvec{X}_{i}\right) +\ln \left( \frac{1}{1+\exp (\hat{\varvec{\alpha }}^{\prime }\varvec{X}_{i})}\right) \\&\quad +\sum _{i=1}^{s}\sum _{k=2}^{n_{i}} \left( y_{ik} \left( \varvec{\hat{\alpha }}^{\prime }\varvec{X}_{i} + \hat{\vartheta } t_{ik}+\hat{\psi } y_{i k-1}\right) +\ln \left( \frac{1}{1+\left( \varvec{\hat{\alpha }}^{\prime }\varvec{X}_{i} + \hat{\vartheta } t_{ik}+\hat{\psi } y_{i k-1}\right) }\right) \right) \\&\quad +\sum _{i=1}^{s}\sum _{k=2}^{n_{i}} \bigg ((\hat{\xi }+\hat{\gamma } y_{ik-1})-\exp \left( \hat{\xi }+\hat{\gamma } y_{ik-1}\right) t_{ik}\bigg )\Bigg \} +2 p_{\text{D}}. \end{aligned}$$

Results

Simulation study

In order to assess the Bayesian Bernoulli-Exponential model in terms of how it can be influenced by controlled variations in sample size, visit schema and types of prior distributions on the parameter estimates we present in this subsection, a simulation study. More precisely, the simulation study helps establish the validity of the joint model in random scenarios via data generation and parameter estimation. It is important to clarify, however, that this present study is an extension of the studies of Bronsert [6], Lin [33], Seo [8] and Zaagan [9] and thus for computational convenience, an abundant level of consistency is maintained in terms of simulation conditions. All simulations are performed in R software via the R2OpenBugs package. This package provides a means to program Bayesian models in R via an OpenBugs software [34, 35]. To develop the Bayesian joint model, the structure of the data to be simulated is clearly defined. We simulate data involving two categorical variables, each having three levels, and two continuous variables. The longitudinal responses are simulated from a Bernoulli distribution. The first response is simulated from the distribution, and then the subsequent response is computed based on the relationship between the prior outcome and the prior time for predicting the average response based on starting parameter values in Table 1. It is important to clarify, however that during the simulation exercise, only “plausible” starting values from the range of starting values in Table 1 are utilized. It is not the intent of this study to analyze the impact of all four range of starting values. The visit times for each of the corresponding responses are simulated from an exponential distribution.

Table 1 Parameter initial value scheme for simulations

Furthermore, we simulate design structures that consider varying visit schemes and sample sizes to effectively study trends or patterns associated with the model. In this study, three varying sample sizes with four sub design visit structures entailing both balanced and unbalanced visit structures are considered and shown in Table 2. Also, three prior schemes are considered, that is Gaussian informative, Gaussian non-informative and Jeffreys non-informative priors.

Table 2 Simulation design scheme

Thus, the simulation matrix involves three varying sample size designs, three varying prior schemes and three visit design structures. To further clarify the visit structure, as an example to signal an unbalanced visit pattern, when the sample size is 180 and the number of observations is 20 & 6 , this exemplifies 90 participants having 20 recorded observations and another 90 subjects have 6 measured outcomes each. This simulation design scheme results in 27 differing designs for the simulation analysis of the Bayesian Bernoulli-Exponential joint model.

After data generation, the simulation analysis involves estimating the joint model parameters via the package R2Openbugs in R software. It commences by first “sinking” in generated parameter values which that serve as initial values for the MCMC estimation process. Then, the likelihood of the Bayesian joint model is calculated based on the design structures and priors specified. Parameter estimation proceeds with the Gibbs Sampling approach, which has earlier been discussed. This generates dependent Markov chains for our model parameters by drawing samples from the posterior distribution using initial parameter values that were embedded in the simulation design. Markov chains are run iteratively 30,000 times, and the first 10,000 iterations are discarded to serve as burn-in, effectively mitigating the influence of the initial values. Thinning intervals of three iterations are considered to monitor autocorrelations of the generated values. Subsequently, to monitor convergence of Markov chains and their associated posterior parameters, the Heidelberger and Welch convergence tests are computed. Then, posterior summaries such as the mean, standard deviation, and credible interval limits are presented. It is instructive to note that the simulations were replicated a 1000 times and inferences were premised on the averaged estimates and associated credible intervals. Finally, inferences via comparisons for different specification of the prior distribution and their sample size and visit design schemes for the model are made along with Deviance Information Criterion measures.

Simulation results: model convergence assessment of the Bayesian Bernoulli-Exponential joint model

To evaluate convergence of the Markov chains of the model parameters, a formal diagnostic test, called the Heidelberger and Welch test [31] is used. It is expected that after the burn-in period, the Gibbs Sampling algorithm produces samples from the posterior distribution that attains a stationary distribution. The Heidelberger and Welch test constitutes a stationary and half-width test and calculates a test statistic to accept or reject the null hypothesis that the Markov chains are from a stationary distribution. The half-width test is based on a computed \(95\%\) confidence interval for the mean, using the chain that earlier passed the stationarity test. The resulting ratio of the interval half-width and the mean compared with a threshold ( \(\varepsilon =0.1\)) determines whether the half-width test is passed or not. More precisely, the test passes if the ratio between the half-width and the mean is lesser than \(\varepsilon\). Selected convergence results based on the Heidelberger and Welch test are presented for the Bayesian Bernoulli-Exponential joint model across select scenarios and shown in the Tables 3, 4 and 5. These results cut across all prior scenarios (informative, non-informative, Jeffreys non-informative Prior), sample sizes (18, 54, 180) and visit patterns (10, balanced), (5 &3, Unbalanced), (20 & 6 ,Unbalanced). Inferring from the Heidelberger and Welch tests conducted across the broad range of scenarios selected, no issues were observed with the convergence of the MCMC chains for the Bayesian Bernoulli-Exponential Joint model. More precisely, the p-values resulting from the stationarity test for all estimated model parameters, regardless of prior, sample size or visit schemes were statistically insignificant. This suggests that the sampled values for parameters are from a stationary process. A further indication is that our model parameter estimation can be implemented with precision because MCMC chains are in a stationary distribution.

Table 3 Heidelberger and welch test for the Bayesian Bernoulli-Exponential model and for the Gaussian informative prior
Table 4 Heidelberger and welch test for the bayesian bernoulli-exponential model and for the gaussian non-informative prior
Table 5 Heidelberger and welch test for the Bayesian Bernoulli-Exponential model and for the Jeffreys non-informative prior

Simulation results: parameter estimation and evaluation of the Bayesian Bernoulli-Exponential model

In this section, the influence of controlled variations in sample size, visit sequences and type of prior distributions on the estimated parameters of the Bayesian Bernoulli-Exponential model are examined. Consistency in the direction of these estimates and their associated credible intervals are checked. For ease of reporting, we present a select number of results from the various simulation scenarios. Posterior means, standard deviations and credible intervals of select scenarios are presented in Tables 6, 7, 8, 9 and 10.

Table 6 Table of parameter estimates for the Bayesian Bernoulli-Exponential joint model and for the Gaussian informative prior scheme
Table 7 Table of parameter estimates for the Bayesian Bernoulli-Exponential joint model and for the Gaussian non-informative prior scheme
Table 8 Table of parameter estimates for the Bayesian Bernoulli-Exponential joint model and for the Jeffreys non-informative prior scheme
Table 9 Credible interval widths for selected scenarios for the Bernoulli-Exponential model
Table 10 Table of parameter estimates for the Bayesian Bernoulli-Exponential joint model and for the Jeffreys non-informative prior scheme

Fixing sample sizes and priors across scenarios and examining the effect of varying sequences on parameter estimates, a consistent trend in magnitude and direction of the estimates and their log-transformation were observed across all scenarios. For example, the parameter estimates of results obtained from the model when sample size and time sequence 54(10) and \(54(20 \& 6)\), 18(10) and \(18(5 \& 3)\), 180(10) and \(180(5 \& 3)\) under informative prior scheme were not markedly different in terms of their magnitude and direction. As an example, the posterior means and standard deviations obtained for the model scenario, sample size and visit scheme 180(10) under informative prior scheme from were \(\alpha _{1}:0.100(0.183)\), \(\alpha _{2}:0.065(0.194)\), \(\alpha _{3}:0.106(0.069)\), \(\alpha _{4}:-0.036(0.137)\), \(\alpha _{5}:0.226(0.138)\), \(\alpha _{6}:0.348(0.0.058)\), \(\alpha _{7}:0.724(0.068)\), \(~\gamma :-0.023(0.0560)\), \(\psi :-0.916(0.120)\), \(\vartheta :-0.116(0.026)\), \(\xi :-0.972(0.0.048)\). These estimates are not markedly different in magnitude and direction from when the time sequence changed to \(20 \& 6\) under the same scenario where the resulting estimates obtained were \(\alpha _{1}:0.200(0.133),\alpha _{2}:0.122(0.124),\alpha _{3}:0.335(0.125),\alpha _{4}:0.156(0.128),\alpha _{5}:0.149 (0.128), \alpha _{6}:0.302 (0.055), \alpha _{7}:0.779 (0.063), \gamma :-0.058 (0.050), \psi :-1.052 (0.114), \vartheta :-0.105 (0.024),\xi :-0.957 (0.044)\). This pattern was similarly observed across the other scenarios, fixing sample sizes, priors and varying the time-sequences and broadly demonstrates a consistency in estimation performance. This further indicates that varying time sequences do not considerably affect the resulting estimates. Examining the credible interval(CI) widths under the different schemes reveal an interesting trend. As the sample sizes across all scenarios increased, albeit keeping priors and time sequences constant, the CI widths were increasingly narrow, implying that when our proposed model is applied to datasets of increasing sample sizes, the resulting estimates are obtained with higher precision. For instance, as an example, we compare parameter estimates and their CI widths under a select Gaussian non-informative prior scenario for these model scenarios 18(10), 54(10) and 180(10) (see Table 11). The trend observed from the presented estimates are quite obvious; increasing sample sizes applied to the proposed Bayesian Bernoulli-Exponential model increases precision of the model estimates. This broadly cuts across all scenarios.

Table 11 Credible interval widths for selected scenarios for the Bernoulli-Exponential model

Simulation results: evaluation of the Bayesian Bernoulli-Exponential model

Finally, model performance is evaluated under the various simulation scenarios via the Deviance Information Criterion (DIC). Since there are a lot of DIC values computed for varying scenarios, they are presented graphically for ease of evaluation and clarity. The DIC plots of the selected simulation scenarios applied to the model are presented in Figs. 1, 2, 3, 4, 5, 6, 7 and 8. First, we fix sample sizes and compare how the model performs across the type of prior and visit sequence. Regardless of the kind of prior chosen for the model parameters, it is observed in Fig. 1 that in the smallest sample considered, 18, the model performs better for the time sequence \(5 \& 3\), reflected by lower DIC values across all prior scenarios. This is followed by the balanced time sequence, 10. In fact, there’s no marked difference between the DIC value of the time sequence \(5 \& 3 (599.8)\) and 10(628.4) when considering the Jeffreys prior and fixing the sample size at 18. This trend is consistently observed, even when the sample sizes are fixed at 54 and 180 (see Figs. 2 and 3). The model still performs better for the time sequence \(5 \& 3\) followed by 10. The next step in the model evaluation process involved fixing priors and comparing the models across competing sample sizes and sequences. For both Gaussian informative and non-informative priors, the DIC’s are very large for the time sequence \(20 \& 6\) and sample size 180 signaling that the model may not be robust for scenarios where visit sequences of individuals vary significantly. When Jeffreys prior is considered, yet again DIC’s obtained for the model in small sample size 18 and sequence type \(5 \& 3\) are very low indicating better performance followed closely by sample size 54, time sequence \(5 \& 3\). This model scenario performs better across all samples and sequences than the considered Informative and Non-Informative Prior Scenario. The DIC values were at par in samples 54 and 180 for time sequence 10 and \(5 \& 3\) when the Jeffreys prior was considered. Finally, an observation of model performance across sample size and prior schemes while keeping the time sequence fixed is made. Across time sequence \(5 \& 3\), the model performs better overall for sample size 18 and 54 regardless of prior chosen. No marked differences are observed however when the Jeffreys prior is used for sample size 18 and 54 as evidenced by Fig. 5. Furthermore, model performance does not broadly vary for the sample size 180, regardless of the prior chosen for sequence \(5 \& 3\) and \(20 \& 6\). The results for the visit sequence 10 are quite consistent with \(5 \& 3\) when compared. Models perform better in small sample size 18 scenarios as reflected by their lower DIC values, followed by 54.

The DIC values for sample size 54 and 180, however are close when the Jeffreys prior is considered for time sequence 10. Overall, model evaluation of the Bayesian Bernoulli-Exponential Model suggest a relatively better fit for small and medium sample size scenarios (18 and 54) with less varying time sequences (5 &3) and (10), regardless of prior choice. For larger samples (180), the models performs fairly well for less varying time sequences (5 &3) but not significantly so for time sequences (20& 6) regardless of the choice of prior.

Fig. 1
figure 1

Deviance information criterion plot for keeping sample sizes fixed at 18 and examining influence across priors and design schemes

Fig. 2
figure 2

Deviance information criterion plot for keeping sample sizes fixed at 54 and examining influence across priors and design schemes

Fig. 3
figure 3

Deviance information criterion plot for keeping sample sizes fixed at 180 and examining influence across priors and design schemes

Fig. 4
figure 4

DIC plot for keeping prior fixed at gaussian non-informative and examining influence across sample size and design schemes

Fig. 5
figure 5

Deviance information criterion plot for keeping visit sequence fixed at \(5 \& 3\) and examining influence across sample size and prior schemes

Fig. 6
figure 6

DIC plot for keeping visit sequence fixed at 10 and examining influence across sample size and prior schemes

Fig. 7
figure 7

Deviance information criterion plot for keeping prior fixed at gaussian informative and examining influence across sample size and design schemes

Fig. 8
figure 8

DIC plot for keeping prior fixed at jeffreys non-informative and examining influence across sample size and design schemes

A model application to bladder cancer recurrence data

In this section, the proposed Bayesian Joint Bernoulli-Exponential model is applied to a real-world dataset, called the Bladder Cancer Data. This data is openly available in \(\text{R}\) software, specifically in the “Survival” package [36] and results from a clinical trial on patients with bladder cancer conducted by the Veterans Administration Co-operative Urological Research Group (VACURG) [10, 11]. The bladder cancer dataset in \(\text{R}\) software comprises information on 85 subjects, measured four times, with randomly assigned treatments of only thiotepa or a placebo. 38 patients are assigned to the placebo group and 47 to the treatment(thiotepa) group. Data on patient experienced number of recurrences are collected including the number of initial tumours present pre-trial randomization. Other variables include “stop”, which measures the time interval in months since the last visit. The next scheduled visit is dependent on bladder tumor recurrence at the time of measurement, indicating that time can be considered informative, and that subsequent visits are likely be influenced by previous visits. Also, the intensity of visits depend on tumor recurrences. Furthermore, there is an “event” variable, which is a binary variable representing the recurrence of tumor(1) or (0) for non-recurrence attributable to reasons like death. The variables along with their description are given in Table 12 below.

Table 12 The bladder cancer data (called bladder) in R software

This data is analyzed with the following objectives in mind. Is there an effect of treatment type, size in centimeters(cm) of the largest initial tumor, initial number of tumors on the likelihood of tumor recurrence? Furthermore, is there an effect of prior recurrences(outcomes) on the likelihood of current recurrence? To answer these research questions, our proposed Bayesian Bernoulli-Exponential Joint model is fitted to the data. The binary “event” variable is used as the response and the predictors included in the model are treatment type, size in cm of the largest initial tumor, initial number of tumors and other time variables. Just as previously discussed in the Data and methods section, the Bayesian model involves the specification of a joint likelihood, priors and then the posterior distribution.

Here, three types of priors are considered and compared across the models. In this regard, the non-informative Gaussian priors considered for this model is,

$$\begin{aligned} p(\varvec{\alpha })&\sim N\left( \varvec{\text{0}}_{s}, 10^{8}\varvec{\text{I}}_{s}\right) \\ p({\gamma })&\sim N\left( 0, 10^{8}\right) \\ p(\vartheta )&\sim N\left( 0, 10^{8}\right) \\ p(\psi )&\sim N\left( 0, 10^{8}\right) \\ p(\xi )&\sim N\left( 0, 10^{8}\right) \end{aligned}$$

The Gaussian Informative priors considered for this model is,

$$\begin{aligned} p(\varvec{\alpha })&\sim N\left( \varvec{0.4}_{s}, 4.0\varvec{\text{I}}_{s}\right) \\ p({\gamma })&\sim N\left( 0.2, 0.1\right) \\ p(\vartheta )&\sim N\left( 0.5, 0.5\right) \\ p(\psi )&\sim N\left( 0.2, 0.2\right) \\ p(\xi )&\sim N\left( 2.0, 0.2\right) \end{aligned}$$

Furthermore, we consider Jeffreys non-informative priors for the \(\alpha\) parameters and Gaussian non-informative priors for the visit parameters. The resulting posterior distribution of the Bayesian Bernoulli-Exponential Joint model for the bladder cancer data, for the instance where the Jeffreys prior considered for the parameters of the Bernoulli response process and Gaussian priors for the visit parameters in non informative settings is considered is;

$$\begin{aligned} p(\varvec{\alpha }, \vartheta ,\psi ,\xi ,\phi |\varvec{Y}_{i},\varvec{t}_{i},\varvec{X} )=L\left( \Theta , y_{1},y_{2},y_{3} \cdots , y_{s}\right) \times p(\varvec{\alpha }|\phi )\times p(\vartheta )\times p(\psi )\times p(\xi ) \end{aligned}$$
(19)
$$\begin{aligned}{}&=\prod _{i=1}^{s}\Bigg \{\exp \bigg \{y_{i1} \ln \left( \frac{\mu _{i1}}{1-\mu _{i1}}\right) +\ln (1-\mu _{i1})\bigg \}\times \prod _{k=2}^{n_{i}} \exp \Big \{y_{ik} \ln \left( \frac{\mu _{ik}}{1-\mu _{ik}}\right) \\ {} &\quad+\ln (1-\mu _{ik})\times \exp \left( \xi +\gamma y_{ik-1}\right) \times \exp \left( -\exp \left( \xi +\gamma y_{ik-1}\right) t_{ik}\right) \Big \}\Bigg \}\\&\quad\times \left| \varvec{X}^{\prime } \varvec{P} \varvec{V}(\varvec{\alpha }) \Delta ^{2}(\varvec{\alpha }) \varvec{X}\right| ^\frac{1}{2}\times \frac{1}{\sqrt{2 \pi \nu _{\vartheta }^{2}}} \exp \left( -\frac{1}{2}\left( \vartheta -\mu _{\vartheta }\right) ^{2}\right) \times \frac{1}{\sqrt{2 \pi \nu _{\psi }^{2}}} \exp \left( -\frac{1}{2}\left( \psi -\mu _{\psi }\right) ^{2}\right) \\&\quad\times \frac{1}{\sqrt{2 \pi \nu _{\xi }^{2}}} \exp \left( -\frac{1}{2}\left( \xi -\mu _{\xi }\right) ^{2}\right) \end{aligned}$$

Here, \(\varvec{V}(\varvec{\alpha })=\text {diag}\left( v_{1},v_{2} \ldots , v_{n}\right)\) and \(v_{i}=\mu _{ik}(1-\mu _{ik})\). and,

  • \(\alpha _{s}\) are regression parameters representing the effect of the predictors; treatment type(\(x_{2}\)), initial number of tumors, (\(x_{3}\)) and size in (cm)(\(x_{4}\)) of the largest initial tumor on the likelihood of tumor recurrence.

  • \(\psi\) represents the effect of the prior recurrence on the mean response of the current recurrence and \(\vartheta\) characterizes the effect of current recurrence time on the mean recurrence,

  • \(\xi\) is a constant parameter associated with time and \(\gamma\) is the effect of the previous recurrence on the mean time.

Other components are already explained thoroughly in the Data and methods section. Note that the posterior distribution changes when the priors change in the Gaussian and non-Gaussian settings considered for all parameters. Then, after the posterior specification, we proceed with the joint parameter estimation with the Gibbs sampling approach in R software. For each of the three prior scenarios considered, the Markov chains are run iteratively 30,000 times, and the first 10,000 iterations are discarded to serve as burn-in. Convergence of the markov chains and associated posterior parameters are monitored via the Heidelberger and Welch tests. Then, posterior summaries are computed. Parameter significance is inferred via credible intervals and the models are compared with the Deviance Information Criteria Measure. Results of the Heidelberg and Welch convergence tests from the application to the bladder cancer data with the different prior scenarios are presented in Table 13. Inferring from the tests conducted, no issues were observed with the convergence of the MCMC chains. Overall, we can proceed with posterior summary inference with precision since the MCMC chains are in a stationary distribution.

Table 13 Heidelberger and welch test for the Bayesian Bernoulli-Exponential model for the bladder cancer data including three prior scenarios

After convergence assessment of the model, inference based on the posterior summary measures is the next step. Posterior means, standard deviations and associated credible intervals of the prior scenarios are presented in Table 14 along with their corresponding DIC’s. The best model is chosen based on the least DIC value. Observing the results, the model under the Jeffreys non-informative prior, yielded the least DIC (1108) value. Ergo, parameter inference is based on the Bayesian Bernoulli-Exponential model with Jeffreys prior specified. The results demonstrate that the effect of treatment type is statistically significant on the likelihood of cancer recurrence inferring from its credible interval \(\alpha _{2}=0.216\) (0.232, 0.411). The initial number of tumors have a significant effect \(\alpha _{3}=0.036\) (0.001, 0.108) on the likelihood of cancer recurrence and hence a significant prognostic factor. Furthermore, the size in cm of the largest tumor has a significant marker on the likelihood of cancer recurrence. Afterwards, the time parameters are observed. The effect of prior tumor recurrence on the mean response of current tumor recurrence, represented by \(\psi\) is statistically significant \(-0.408(-1.009,-0.135)\), indicating that previous tumor recurrences influence the probability of subsequent recurrences. Additionally, the effect of current recurrence time(\(\vartheta\)) is significant on average recurrence, reflected by the estimated probability (0.157)(0.018, 0.337).

Table 14 Results of the Bayesian Bernoulli-Exponential model applied to the bladder cancer data with different prior scenarios considered

Discussions and conclusions

Broad assumptions underlie the usage of longitudinal analysis approaches, ranging from univariate designs to the even the most complex conditional and marginal modeling approaches. One of the common assumptions, albeit implausible in certain scenarios, is the supposition that time is always fixed and predetermined by statistical design. Phenomenons may alter the time trajectory of study subjects, like sickness or adverse events in clinical trials, which may result in not only irregular time points for subjects, but also imbalanced data and differing visit intensities. This implies current visit outcomes being informative to subsequent ones. It is also important to emphasize that the issue of informative censoring may be less problematic in the context of an informative time/schedule designs, given the assumed observation schedule protocols. In simpler terms, individuals with more severe conditions requiring early interventions or treatments, which could lead to informative censoring, would also have shorter observation schedules and, consequently, more “frequent” measurements. This assumption underlies the simulation design for this study. In this article, we have developed a Bayesian joint model for longitudinal outcomes from the exponential family of distributions with particular emphasis on Bernoulli distributed longitudinal outcomes and exponentially distributed informative time points. An assessment of the influence of controlled sample size scenarios, visit and prior specification schemes on the estimated parameters of the proposed Bayesian Bernoulli-Exponential joint model was performed via simulations and was evaluated based on Deviance Information Criteria.

The methods commenced with specifying likelihoods for the joint outcome and time distributions, specification of priors, and then a discussion on the Markov Chain Monte Carlo Approach for estimating posterior parameters. The priors considered were Gaussian informative priors, Gaussian non-informative priors and Jeffreys non-informative priors. Convergence analysis was performed with the Heilderberg and Welch Test. Once the models converged, posterior inference followed and models were evaluated based on Deviance Information Criteria. Inference from the Heidelberger and Welch Tests conducted across selected simulation scenarios for the Bayesian Bernoulli-Exponential broadly suggested no pertinent issues with the convergence or stationarity of MCMC chains for estimated parameters irrespective of prior specified, sample size or visit schemes. Fixing sample sizes and priors across selected scenarios of the model and examining effect of varying sequences on parameter estimates, a consistent trend in magnitude and direction of the estimates and their transformations were observed.

As sample sizes increased, albeit keeping priors and time sequences constant, credible interval widths were increasingly narrow, indicating that when the proposed model is applied to datasets of increasing sample sizes, resulting estimates are obtained with higher precision. Overall, evaluation made for the Bayesian Bernoulli-Exponential model indicated better performance for the less intense visit sequence \(5 \& 3\) scenario, reflected by lower DIC values, followed by the balanced visit sequence 10 regardless of sample size or prior type. Sample sizes across various simulation scenarios performed similarly well, only that the difference in performance was largely attributable to the sequence of individual visits. Finally, the proposed model has been applied to a bladder cancer recurrence data to serve as an application example.

Availability of data and materials

Beyond the simulation analysis, the data that support the findings of this study and for the model application is openly available in R software, called ‘bladder’ specifically in the “Survival” package [36] and results from a clinical trial on patients with bladder cancer conducted by the Veterans Administration Co-operative Urological Research Group (VACURG) [10, 11].

References

  1. Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. Wiley; 2012.

  2. Molenberghs G, Verbeke G. Models for discrete longitudinal data. 2005.

  3. Rizopoulos D. Joint models for longitudinal and time-to-event data: With applications in R. CRC Press; 2012.

  4. Twisk JW. Applied longitudinal data analysis for epidemiology: a practical guide. Cambridge University Press; 2013.

  5. Verbeke G. Linear mixed models for longitudinal data. In: Linear mixed models in practice. Springer; 1997. p. 63–153.

  6. Bronsert MR. A joint model of a longitudinal process and informative time schedule data. University of Northern Colorado; 2009.

  7. Alomair MA. A Joint Model of Longitudinal Data and Informative Time with Time-Dependent Covariate. University of Northern Colorado; 2017.

  8. Seo J. Joint Models of Longitudinal Outcomes and Informative Time. 2015.

  9. Zaagan AAA. Bayesian Approach of Joint Models of Longitudinal Outcomes and Informative Time. University of Northern Colorado; 2020.

  10. Byar D. The veterans administration study of chemoprophylaxis for recurrent stage i bladder tumours: comparisons of placebo, pyridoxine and topical thiotepa. In: Bladder tumors and other topics in urological oncology. Springer; 1980. p. 363–370.

  11. Byar D, Blackard C, Veterans Administration Cooperative Urological Research Group, et al. Comparisons of placebo, pyridoxine, and topical thiotepa in preventing recurrence of stage I bladder cancer. Urology. 1977;10(6):556–561.

  12. Nelder JA, Wedderburn RW. Generalized linear models. J R Stat Soc: Ser A (General). 1972;135(3):370–84.

    Google Scholar 

  13. McCullagh P, Nelder J. Generalized linear models. 2nd ed. London: Chapman & Hall/CRC; 1989.

    Book  Google Scholar 

  14. Lin, Ying Z. Semiparametric and nonparametric regression analysis of longitudinal data. J Am Stat Assoc. 2001;96(453):103–126.

  15. Lin, Scharfstein DO, Rosenheck RA. Analysis of longitudinal data with irregular, outcome-dependent follow-up. J R Stat Soc Ser B (Stat Methodol). 2004;66(3):791–813.

  16. Liang, Lu W, Ying Z. Joint modeling and analysis of longitudinal data with informative observation times. Biometrics. 2009;65(2):377–384.

  17. Sun J, Park DH, Sun L, Zhao X. Semiparametric regression analysis of longitudinal data with informative observation times. J Am Stat Assoc. 2005;100(471):882–9.

    Article  MathSciNet  CAS  Google Scholar 

  18. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis Chapman & Hall. CRC Texts in Statistical Science; 2004.

  19. Lesaffre E, Lawson AB. Bayesian biostatistics. Wiley; 2012.

  20. Murray JV, Goldizen AW, O’Leary RA, McAlpine CA, Possingham HP, Choy SL. How useful is expert opinion for predicting the distribution of a species within and beyond the region of expertise? A case study using brush-tailed rock-wallabies Petrogale penicillata. J Appl Ecol. 2009;46(4):842–51.

    Article  Google Scholar 

  21. Martin TG, Burgman MA, Fidler F, Kuhnert PM, Low-Choy S, McBride M, et al. Eliciting expert knowledge in conservation science. Conserv Biol. 2012;26(1):29–38.

    Article  PubMed  Google Scholar 

  22. Congdon P. Bayesian models for categorical data. Wiley; 2005.

  23. Wu L. Mixed effects models for complex data. CRC Press; 2009.

  24. Ntzoufras I. Bayesian modeling using WinBUGS. vol. 698. Wiley; 2011.

  25. Jeffreys H. An invariant form for the prior probability in estimation problems. Proc R Soc Lond Ser A Math Phys Sci. 1946;186(1007):453–61.

    ADS  MathSciNet  CAS  Google Scholar 

  26. Kass RE. The geometry of asymptotic inference. Stat Sci. 1989;188–219.

  27. Gelfand AE. Gibbs sampling. J Am Stat Assoc. 2000;95(452):1300–4.

    Article  MathSciNet  Google Scholar 

  28. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell. 1984;6:721–41.

    Article  CAS  PubMed  Google Scholar 

  29. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys. 1953;21(6):1087–92.

    Article  ADS  CAS  Google Scholar 

  30. Gilks WR, Best NG, Tan KK. Adaptive rejection Metropolis sampling within Gibbs sampling. J R Stat Soc: Ser C (Appl Stat). 1995;44(4):455–72.

    Google Scholar 

  31. Heidelberger P, Welch PD. Simulation run length control in the presence of an initial transient. Oper Res. 1983;31(6):1109–44.

    Article  Google Scholar 

  32. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc: Ser B (Stat Methodol). 2002;64(4):583–639.

    Article  MathSciNet  Google Scholar 

  33. Lin YK. Hypothesis testing for the Gaussian-Exponential longitudinal model. 2011.

  34. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2017. https://www.R-project.org/. Accessed Feb 2021.

  35. Sturtz S, Ligges U, Gelman A. R2OpenBUGS: a package for running OpenBUGS from R. J Stat Softw. 2019;31:1109–44.

  36. Therneau T, Lumley T. R survival package. 2013.

Download references

Acknowledgements

The author acknowledges the support of Dr. Khalil Shafie, Dr. Han Yu and Dr. Khaledi Bahaedin at the University of Northern Colorado for their insightful comments on this project during the author’s Ph.D.’s dissertation.

Funding

This project was self-funded.

Author information

Authors and Affiliations

Authors

Contributions

This manuscript was sole-authored by Michael Safo Oduro.

Corresponding author

Correspondence to Michael Safo Oduro.

Ethics declarations

Ethics approval and consent to participate

Ethical approval was not required for this study since the data used for this study are secondary data and openly/publicly available in R software.

Consent for publication

Not applicable.

Competing interests

The author declares no competing interests. Also, the author currently works for Pfizer Inc. Results, views or research findings expressed in this manuscript do not necessarily reflect those of Pfizer, Inc.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Oduro, M. A Bayesian Bernoulli-Exponential joint model for binary longitudinal outcomes and informative time with applications to bladder cancer recurrence data. BMC Med Res Methodol 24, 54 (2024). https://doi.org/10.1186/s12874-024-02160-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12874-024-02160-2

Keywords