 Research
 Open access
 Published:
Spatialtemporal Bayesian accelerated failure time models for survival endpoints with applications to prostate cancer registry data
BMC Medical Research Methodology volume 24, Article number: 86 (2024)
Abstract
Prostate cancer is the most common cancer after nonmelanoma skin cancer and the second leading cause of cancer deaths in US men. Its incidence and mortality rates vary substantially across geographical regions and over time, with large disparities by race, geographic regions (i.e., Appalachia), among others. The widely used Cox proportional hazards model is usually not applicable in such scenarios owing to the violation of the proportional hazards assumption. In this paper, we fit Bayesian accelerated failure time models for the analysis of prostate cancer survival and take dependent spatial structures and temporal information into account by incorporating random effects with multivariate conditional autoregressive priors. In particular, we relax the proportional hazards assumption, consider flexible frailty structures in space and time, and also explore strategies for handling the temporal variable. The parameter estimation and inference are based on a Monte Carlo Markov chain technique under a Bayesian framework. The deviance information criterion is used to check goodness of fit and to select the best candidate model. Extensive simulations are performed to examine and compare the performances of models in different contexts. Finally, we illustrate our approach by using the 20042014 Pennsylvania Prostate Cancer Registry data to explore spatialtemporal heterogeneity in overall survival and identify significant risk factors.
Introduction
Prostate cancer (PC) is the most common cancer after nonmelanoma skin cancer and the second leading cause of cancer deaths in US men, with 31,620 deaths estimated in 2019, a 7% increase compared with 2018 [1]. In recent years, PC care and outcomes have substantially improved, with a 5year survival rate of up to 100% if the cancer is diagnosed at an early stage; however, these improvements are not equally shared across geographic regions, and elevated mortality has been observed among patients in some specific areas (i.e., rural or Appalachian regions) [2]. One important potential factor driving this geographic disparity is access to highquality cancer care. Other disparities have also been explored, including black race, older age, and family history of PC [2]. Despite the high 5year survival rate with early diagnosis, the diagnosis of PC is likely to be delayed owing to the aforementioned factors. Thus, improving PC survival outcomes is still crucial and challenging, and there is a need to better understand the spatialtemporal heterogeneity of PC and identify highrisk populations to enable more effective implementation of screening policies and intervention strategies.
To achieve these goals, existing populationbased cancer registry data provide fruitful resources and platforms for analysis; however, there are several specific data issues: 1) substantially multimodal risk factors with various data types; 2) spatialtemporal variation in cancer mortality, with adjacent neighborhoods or temporal cohorts more alike than those from distant regions or years owing to similar environmental and social factors [3,4,5,6]; and 3) the availability of individuallevel data for analysis. In this article, we used populationbased Pennsylvania (PA) cancer registry data (PCR) from the PA Department of Health to examine the spatialtemporal pattern of survival in patients with a primary clinical diagnosis of PC in PA between 2004 and 2014 [5,6,7,8]. The PCR is annually collected, including demographic (e.g., age at diagnosis, race, insurance) and clinical information (e.g., serum prostatespecific antigen [PSA], Gleason score, tumor stage, firstcourse treatment) from hospitals, clinics, and other medical facilities, as well as geospatial information [4,5,6,7,8]. Note that in PA, there are around 78,000 newly diagnosed cancer cases each year, with a mortality rate of 169.1 per 100,000 men in 2004–2014 (ageadjusted to the 2000 US standard population) according to a PA Department of Health report on the burden of cancer in PA in 2019; however, there have been limited studies on PA survival on prostate cancer taking spatial heterogeneity and temporal trend into account.
To analyze such registry data more efficiently for valid inference, advanced statistical methods for cancer survival analysis are needed. There are two widely used methods for timetoevent analysis: the Cox proportional hazards (PH) model and the accelerated failure time (AFT) model, both with extensive extensions [9,10,11,12]. The PH assumption is highly likely to be violated in cancer registry data owing to the multimodality (i.e., demographic and clinical information) and hierarchical structure (i.e., individuallevel, countylevel) of risk factors [3, 8]. We performed a preliminary check of PCR data and realized that the plots of the Schoenfeld residuals indicated that the PH assumption was violated for race in several counties, as shown in Fig. 1. Owing to these specific data features and issues, Cox PH regression may lead to biased estimates and invalid inference. Additionally, failing to account for spatialtemporal heterogeneity could lead to biased inference; although substantial work has been done to address this issue, it has mostly focused on aggregate data analysis [13, 14]; individuallevel spatialtemporal analysis is yet to be explored. Therefore, to overcome these barriers, advanced statistical models are needed for rigorous exploration. Here, we propose advanced spatialtemporal models under the AFT framework to fill this gap. Among the existing literature on cancer registry survival analysis, there has been some work performed in the realm of spatial or spatiotemporal survival analysis. Carlin and Banerjee (2003) [15] considered hierarchical spatial process models for multivariate survival datasets that are spatiotemporally arranged and used Cox PH modeling approaches with spatial and temporal effects; Banerjee and Carlin (2003) [16] later proposed a semiparametric (i.e., Cox PH) hierarchical Bayesian frailty model to capture spatiotemporal heterogeneity for discrete survival times; Zhang and Lawson (2011) [3] proposed a spatial AFT model (with only spatial random effects considered); Zhou et al. (2017) [17] proposed a generalized AFT model with spatial frailty and considered informative censoring data; Onicescu et al. (2017) [18] developed a geographically augmented survival model with a complex spatiotemporal structure; however, their spatial and temporal components were not easily separated for interpretation; and Carroll et al. (2017) [19] proposed Bayesian AFT models with only spatial frailty terms to investigate spatial differences in breast cancer mortality following cancer diagnosis using the 2000–2013 Louisiana SEER data. Later, Wang et al. (2020) [8] used this approach to investigate the effects of risk factors on overall survival in newly diagnosed PC patients, and Carroll et al. (2019) [20] extended Bayesian AFT models to explore spatial and temporal options for structuring frailties that follow a random walk process, but they considered the AFT models with a standard logistic distribution for the error term and did not clearly mention how to better handle the variable of diagnosis year in regression analysis. Some other work includes Wang et al. (2012) [21], Hurtado et al. (2016) [22], Sharmin and Khan (2017) [23], Carroll and Zhao (2019) [24], among others. Comparing to the previous work, our proposed model has the following advantages. First, Bayesian AFT models are employed instead of Cox PH models and relax the PH assumption, and also our program possess different choices of distributions (e.g., Weibull, loglogistic); Second, the fixed effects including linear predictors of interest and the random effects with flexible frailty structures in space and time are incorporated. In particular, we explore different ways to handle the time variable (e.g., the year of diagnosis) which are understudied in the literature, and evaluate their empirical performance of effectiveness on survival inference under a variety of settings; Third, we implement our estimation procedures and model diagnosis in R with callable C functions for computing efficiency, which are available in the GitHub for public use by cancer researchers and other research purposes (refer to https://github.com/zli141/sptime).
The rest of the paper is organized as follows. In the “Methods” section, we provide notation and describe our proposed spatialtemporal AFT models, as well as the Bayesian algorithm for parameter estimation and inference. The “Simulation study” section details the extensive simulations conducted to evaluate our proposal. The results from the motivating example of the PCR of PC are presented in the “Data application” section. Finally, the “Discussion” section offers discussion with concluding remarks and potential future work.
Methods
Accelerated failure time model
For the \(k^{th}\) (\(k=1,\ldots , K_{ij}\)) subject from the \(i^{th}\) (\(i=1,\ldots ,I\)) county in the \(j^{th}\) (\(j=1,\ldots ,J\)) temporal cohort, let \(T_{ijk}\) denote the time to death after diagnosis of PC, and let \(C_{ijk}\) represent the corresponding censoring time. Thus, \(Y_{ijk}=\min (T_{ijk},C_{ijk})\) is the observed followup time, with \(\delta _{ijk}=I(T_{ijk} \le C_{ijk})\) as the death indicator. In addition, \(\varvec{x}_{ijk}\) is a \(p \times 1\) vector of covariates for survival regression, which could include timedependent or timevarying factors of interest.
The AFT model can be expressed in a linear form with the log link function of \(T_{ijk}\),
where \(\mu\) is the populationlevel mean and \(\sigma\) is a shape parameter that controls the shape of the survival curve; \(\varvec{\beta }\) is a \(p \times 1\) vector of regression coefficients associated with the covariates \(\varvec{x}_{ijk}\); and \(\epsilon _{ijk}\) is the residual, which follows a distribution function \(F_\epsilon (\cdot )\). For instance, we can consider an \(\epsilon _{ijk}\) that follows a standard extreme value distribution; thus, \(f_{\epsilon }(\varepsilon )=\exp (\varepsilon e^\varepsilon )\) and \(F_\epsilon (\varepsilon )=1\exp (e^{\varepsilon })\), where \(\epsilon\) follows a standard extreme value distribution and so \(T_{ijk}\) follows a Weibull distribution. Other choices for \(\epsilon _{ijk}\) include a standard normal distribution and a standard logistic distribution.
The conditional autoregressive (CAR) prior
In order to account for the county level spatial heterogeneity, we include a spatial random effect in the AFT model, which is given by
Borrowing an idea from linear mixedeffect models, we could assume that the random effect \(\omega _i\) follows a normal distribution \(N(0,\tau ^{2}).\) However, unlike traditional mixedeffect models, the \(\omega _i\) \((i=1,\ldots , I)\) are not independently distributed; this is because the adjacent counties tend to be correlated owing to the potential of sharing similar environmental or social factors [25]. Besag (1974) [26] proposed a conditional autoregressive (CAR) distribution for \(\omega _i\) and assumed
where \(\varvec{\omega }_{i}\)=\(\{\omega _1,\ldots ,\omega _{i1},\omega _{i+1},\ldots ,\omega _{I}\}\). Note that \(m_{ii'}\) is defined as 1 if county i and county \(i'\) are adjacent; otherwise \(m_{ii'}=0.\) This can be interpreted as the arithmetic mean of \(\omega _i\) being the arithmetic mean of the \(\omega _{i}\) values of those counties adjacent to it. The joint density function of \(\varvec{\omega }=(\omega _1,\dots ,\omega _I)^{T}\) can be expressed by
where \(\varvec{D}_\omega =\text {diag}\left\{ \sum _{i' \ne 1}m_{1i'},\dots , \sum _{i'\ne I}m_{Ii'}\right\}\) is a diagonal matrix, with the \(i^{th}\) diagonal element as the total number of the counties adjacent to county i. \(\varvec{C}=(m_{ii'})_{I\times I}\) is the adjacent matrix of all the counties in the study.
Given the observed data \(D =\{(Y_{ijk},\delta _{ijk}), i=1,\ldots ,I, j=1,\ldots ,J, k=1,\ldots , K_{ij}\}\) and spatial random effects \(\varvec{\omega }\), the conditional likelihood function can be derived as
where \(f(t_{ijk}\omega _i)\) and \(S(t_{ijk}\omega _i)\) are the conditional density function and survival function, respectively, for the \(k^{th}\) subject from the \(i^{th}\) county in the \(j^{th}\) temporal cohort. Denoting \(\mu (\varvec{x}_{ijk})=\mu +\varvec{x}_{ijk}^T \varvec{\beta }+\omega _i\), we have
and
The multivariate conditional autoregressive (MCAR) prior
The CAR prior only accounts for the correlations between different counties; however, the PCR for PC also includes patients enrolled in different years (i.e., temporal cohorts). It is natural to anticipate that there might be some degree of correlation between either temporal cohorts or counties. Banerjee and Carlin (2003) [16] proposed a Cox PH model with the MCAR prior to address the spatialtemporal dependency. Owing to the violation of the PH assumption (Fig. 1), we extended the application of the MCAR prior to the AFT model for further survival analysis [27, 28].
Here, we induce another random effect vector regarding temporal cohorts of the \(i^{th}\) county, \(\varvec{\gamma }_i=(\gamma _{i1},\dots , \gamma _{iJ})^{T}\), into the AFT model. The spatialtemporal model can be expressed as
where \(\varvec{z_{ik}}=(z_{i1k},\dots ,z_{iJk})^{T}\) is a \(J \times 1\) with \(z_{ijk}\) as a dichotomous temporal cohort indicator (1=yes, 0=no) for the \(k^{th}\) patient from the \(i^{th}\) county; \(\varvec{\xi }= (\xi _1,...,\xi _J)^T\) is the fixed temporal effect; \(\varvec{\eta _{i}}=(\eta _{i1},\dots ,\eta _{iJ})^{T}\) with \(\eta _{ij}\) as a binary indicator covariate associated with the \(j^{th}\) yearspecific random effect \(\gamma _{ij}\).
Let \(\varvec{\Delta }_i=(\omega _i, \varvec{\gamma }_i^{T})^{T}\), and we assume
where \(\varvec{\Lambda }\) represents the hyperparameters in the MCAR prior. Thus, the joint density function of \(\varvec{\Delta }=\text {vec}\left\{ \varvec{\Delta }_{1},\dots , \varvec{\Delta }_I\right\}\) is
Note that the MCAR prior may be improper, because the variancecovariance matrix of the normal distribution could be singular. Similarly, the conditional likelihood of the observed data \(D=\{(Y_{ijk},\delta _{ijk}), i=1,\ldots ,I, j=1,\ldots ,J, k=1,\ldots ,K_{ij}\}\) is
where \(f(t_{ijk}\varvec{\Delta }_i)\) and \(S(t_{ijk}\varvec{\Delta }_i)\) are the conditional density function and survival function, respectively, for the \(k^{th}\) subject from the \(i^{th}\) county in the \(j^{th}\) temporal cohort. Denote \(\mu '(\varvec{x}_{ijk})=\mu +\varvec{x}_{ijk}^{T}\varvec{\beta }+\varvec{z}_{ik}^{T}\varvec{\xi }+\varvec{\eta }_{i}^{T}\varvec{\gamma }_i+\omega _i\), we have
and
Bayesian inference with the MCAR prior
When \(\varvec{D}_\omega \varvec{C}\) is a nonsingular matrix, the density function of \(\varvec{\Delta }\) is proper, and we can find a unique solution based on the likelihood approach. However, when \(\varvec{D}_\omega \varvec{C}\) is a singular matrix, the density function is not proper; thus, Bayesian methods are preferred for parameter estimation and inference. Here, we consider the noninformative priors for \(\varvec{\beta }\), \(\varvec{\xi }\), \(\mu\), and \(\sigma ^2\), where
Also, with regard to random effects \(\varvec{\Delta }_i\), we select a conjugated prior for \(\varvec{\Lambda }\) with \(\varvec{\Lambda }\sim \text {Wishart} \left( p,\varvec{R}\right) .\) In order to let the prior be vague, p could be the dimension of \(\varvec{\Lambda },\) and \(\varvec{R}\) could be arbitrarily set as a diagonal matrix \(Diag\{100,...,100\}_{I\times I}.\) Let \(\varvec{X}=\{\varvec{x}_{ijk}\}\), \(\varvec{Z}=\{z_{ijk}\}\); \(i=1,\ldots ,I\); \(j=1,\ldots ,J\); \(k=1,\ldots , K_{ij}\). Then, we can derive the posterior distribution of \(\varvec{\Lambda }\) as follows:
where the element in the \(i^{th}\) row and \(j^{th}\) column of \(\varvec{V}\) is \(V_{ij}=\varvec{\Delta }_i^{*T}(\varvec{D}_w\varvec{C})\varvec{\Delta }^{*}_j\), with \(\varvec{\Delta }_i^{*}=(\Delta _{1i},\dots ,\Delta _{Ii})^T\) and \(\varvec{\Delta }_j^{*}=(\Delta _{1j},\dots ,\Delta _{Ij})^T\). The Gibbs sampler algorithm is used to generate the posterior samples of \(\varvec{\beta }\), \(\varvec{\xi }\), \(\mu\), \(\sigma\), and \(\varvec{\Lambda }\) [29]. In particular, for the \(t^{th}\) iteration, \(t=1,\ldots , M\) (where M is the total number of samples we will draw from the posterior distribution), we have the following:

Step 1:
Sample \(\varvec{\beta }^{(t)}\) from \(\textrm{Pr}(\varvec{\beta }^{(t)}D,\varvec{X,Z},\varvec{\xi }^{(t1)},\varvec{\Delta }^{(t1)}, \sigma ^{(t1)},\varvec{\Lambda }^{(t1)})\);

Step 2:
Sample \(\varvec{\xi }^{(t)}\) from \(\textrm{Pr}(\varvec{\xi }^{(t)}D,\varvec{X,Z},\varvec{\beta }^{(t)},\varvec{\Delta }^{(t1)}, \sigma ^{(t1)},\varvec{\Lambda }^{(t1)})\);

Step 3:
Sample \(\varvec{\omega }_i^{(t)}\) from \(\textrm{Pr}(\varvec{\Delta }_i^{(t)}D,\varvec{X,Z},\varvec{\beta }^{(t)},\varvec{\omega }_{(i)}^{(t1)}, \sigma ^{(t1)},\varvec{\Lambda }^{t1})\), for \(i=1,\dots ,I\);

Step 4:
Sample \(\sigma ^{(t)}\) from \(\textrm{Pr}(\sigma ^{(t)}D,\varvec{X,Z},\varvec{\beta }^{(t)},\varvec{\xi }^{(t)},\varvec{\Delta }^{(t)}, \varvec{\Lambda }^{(t1)})\);

Step 5:
Sample \(\varvec{\Lambda }^{(t)}\) from \(\textrm{Pr}(\varvec{\Lambda }^{(t)}\varvec{D,X,Z},\varvec{\beta }^{(t)},\varvec{\xi }^{(t)},\varvec{\Delta }^{(t)}, \sigma ^{(t)})\).
Of note, there is no closed form for the full conditional posterior distribution except in step 5, which is a Wishart distribution, \(\text {Wishart}(p+I, (\varvec{R}^{1}+\varvec{V}^{1})^{1})\). The Metropolis–Hastings algorithm was used to sample the parameters from their full conditional distribution [30]. Taking \(\textrm{Pr}(\varvec{\beta }^{(t)}D,\varvec{X,Z},\varvec{\xi }^{(t1)},\varvec{\Delta }^{(t1)}, \sigma ^{(t1)},\varvec{\Lambda }^{(t1)})\) as an example, we have the following procedures:

1
Generate \(U\sim Unif(0,1)\) and \(W\sim N(0,1)\);

2
Generate \(\varvec{\beta }^{New}\) from \(\varvec{\beta }^{New}=\varvec{\beta }^{(t1)}+sW\), where s is the step size of a random walk process;

3
Calculate
$$\begin{aligned} LR=\frac{L(D\varvec{\beta }^{New},\varvec{\Delta })\textrm{Pr}(\varvec{\beta }^{New})}{L(D\varvec{\beta }^{(t1)},\varvec{\Delta })\textrm{Pr}(\varvec{\beta }^{(t1)})}; \end{aligned}$$ 
4
If \(LR>U,\) \(\beta ^{(t)}=\beta ^{New}\), otherwise \(\beta ^{(t)}=\beta ^{(t1)}\).
Here, s is chosen to ensure the rejection rate is not extremely high, given that an optimal acceptance rate would be between 10% and 60%. In our simulation, our rejection rate is around 44%; we repeatedly sample \(\varvec{\beta },\varvec{\xi }, \mu\), \(\sigma\) and \(\varvec{\Lambda }\) from step 1 to step 5 until we have enough posterior samples. Suppose we have a total of M posterior samples; then, we can estimate the parameter by calculating their posterior means and also perform subsequent inference. To ensure the parameters in the model identifiable, \(\Delta _{ij}\) will be centralized such that \(\sum _{ij}\Delta _{ij}=0.\)
Model selection and goodnessoffit check
With respect to model selection, we used the deviance information criteria (DIC), a Bayesian analog of the AIC, to choose the best candidate model that achieves the optimal balance between model fit and model complexity [31]. Let \(\varvec{\theta }\) denote the whole parameter space of the model. Spiegelhalter et al. (2002) [31] proposed the following DIC under the Bayesian framework, which combines the likelihood and the posterior distributions:
where \(\bar{\varvec{\theta }}\) is the posterior mean of \(\varvec{\theta }\). Noting that \(\zeta(\bar{\varvec{\theta }})=2\log L(\bar{\varvec{\theta }})+C\) is the deviance of the model under our posterior estimates, where C is a constant on the DIC; and \(p_D=\bar{\zeta}\zeta(\bar{\varvec{\theta }})\) is the difference of the mean of the deviance under the posterior distribution and the deviance under the posterior mean, reflecting the effective number of parameters to indicate the model complexity or degrees of freedom. Notably, DIC serves as a decent approximation of AIC when working with negligible prior information. Additionally, graphical and secondary assessments such as CoxSnell or martingale residuals are valuable tools for gaining further insights into the goodnessoffit.
Motivated by the analysis using the PCR of PC, we consider the following three candidate models:

Model 1 (M1): \(\log (T_{ijk})=\mu +\varvec{x}_{ijk}^{T}\varvec{\beta }+\xi z_{ijk}+\omega _{i}+\sigma \epsilon _{ijk}\);

Model 2 (M2): \(\log (T_{ijk})=\mu +\varvec{x}_{ijk}^{T}\varvec{\beta }+\xi z_{ijk}+ \gamma _i z_{ijk}+\omega _{i}+\sigma \epsilon _{ijk}\);

Model 3 (M3): \(\log (T_{ijk})=\mu +\varvec{x}_{ijk}^{T}\varvec{\beta }+\varvec{z}_{ik}^{T}\varvec{\xi }+\varvec{\eta }_{i}^{T}\varvec{\gamma }_i+\omega _i+\sigma \epsilon _{ijk}.\)
In M1 and M2, \(z_{ijk}\) is the year when the patient was enrolled in the study. These models treat the year as a continuous variable. In M3, \(\varvec{z}_{ik}\) is the vector of the year indicators for the year when the patient was enrolled in the study, treating year as a categorical variable. M1 is a spatial model that does not account for the temporal cohort effect variation. In M2, a random slope is added to account for the temporal effect, where \(\varvec{\Delta }_i=\{\omega _i,\gamma _i\}\). For M3, we have a spatialtemporal intercept for each cohort and each county \(\varvec{\Delta }_i=\{\omega _{i}, \gamma _{ij}, i=1,\dots ,I; j=1,\dots ,J\}.\) We chose the optimal model for survival analysis of the PCR of PC depending on the DIC, where the smaller the DIC, the better the model. The procedures for parameter estimation and inference and for model diagnosis were programmed in R and invoke C functions; these are available upon request from the authors.
Simulation study
Simulation setups
In order to evaluate the performance of our proposed method and the selection accuracy of the DIC, we mimicked the PCR data structure and conducted extensive simulation studies under different scenarios. The data were generated using the following AFT models:

Scenario 1 (S1): \(\log T_{ijk}=\mu + x_{1,ijk}+0.5x_{2,ijk}+\omega _i+0.5(j1)+\epsilon _{ijk}\);

Scenario 2 (S2): \(\log T_{ijk}=\mu +x_{1,ijk}+0.5x_{2,ijk}+\omega _i+0.5(j1)^{0.5}+\epsilon _{ijk}\);

Scenario 3 (S3): \(\log T_{ijk}=\mu + x_{1,ijk}+0.5x_{2,ijk}+\omega _{i}+\gamma _{ij}+\epsilon _{ijk}\);

Scenario 4 (S4): \(\log T_{ijk}=\mu +x_{1,ijk}+0.5x_{2,ijk}+\omega _{i}+\varvec{z}^T_{ik}\varvec{\xi }+\epsilon _{ijk}\), where \(\varvec{\xi }=(0, 0.5, 0.5, 0.6, 0.8)\).
Here, \(x_{1,ijk}\) is a continuous variable generated from a standard normal distribution N(0, 1); \(x_{2,ijk}\) is a binary variable following the Bernoulli distribution, Bernoulli(0.5); \(x_{1,ijk}\) and \(x_{2,ijk}\) vary between different subjects, cohorts, and counties; \(\varvec{z}_{ik}=(z_{i1k}, \ldots , z_{iJk})\) with \(z_{ijk}\) as the indicator for the \(j^{th}\) temporal cohort. In S3, \(\omega _{ij}=\omega _{i}+\gamma _{ij}\) is generated iteratively considering the dependency between adjacent cohorts with \(\omega _{ij}=2\omega _{i,j1}+\zeta\), where \(\zeta\) is generated by a standard normal distribution N(0, 1). We assume that the error term \(\epsilon _{ijk}\) follows a standard extreme value distribution.
To evaluate the influence of the censoring rate on the performance of our proposed candidate models, the censoring time was generated from a uniform distribution, Unif(0, 1) for a rough average censoring rate of 20% across Monte Carlo data, and Unif(0, 80) for a censoring rate of around 80%. Additionally, to mimic the PCR and generate similar data, we also considered the case with \(x_{1,ijk}\) as a countylevel risk factor to evaluate our methods; in other words, \(x_{1,ijk}\) varies only across counties but not across subjects and temporal cohorts, thus \(x_{1,ijk}=x_{1,i}\).
As there are 67 counties in PA, we considered the same number of counties in our setup, thus \(i=1,\ldots , 67\). In addition, in order to reduce the computing burden, we assumed there were five temporal cohorts for each county \((J=5)\). In each county and temporal cohort, the number of patients, \(K_{ij}\), was determined by the percentage of cancer cases in the real PC data from the PCR. Thus, we ensured that there were at least five patients in every county each year. For results summary, we generated 1,000 Monte Carlo data for each scenario, and for each data simulation, we fitted the following Weibull AFT models (M1, M2, and M3) under the Bayesian framework:

M1: \(\log (T_{ijk})=\mu +\beta _1 x_{1,ijk}+\beta _2 x_{2,ijk}+\omega _{i}+\sigma \epsilon _{ijk}\);

M2: \(\log (T_{ijk})=\mu +\beta _1 x_{1,ijk}+\beta _2 x_{2,ijk}+\xi z_{ijk}+\gamma _i z_{ijk}+\omega _{i}+\sigma \epsilon _{ijk}\);

M3: \(\log (T_{ijk})=\mu +\beta _1 x_{1,ijk}+\beta _2 x_{2,ijk}+\varvec{z}_{ik}^{T}\varvec{\xi }+\varvec{\eta }_{i}^{T}\varvec{\gamma }_i+\omega _{i}+\sigma \epsilon _{ijk}.\)
Note that \(\varvec{\eta }_i\) and \(\varvec{\gamma }_i\) are defined the same as in “Model selection and goodnessoffit check” section, and the matrices \(\varvec{D}_\omega\) and \(\varvec{C}\) (the adjacency matrix) in the MCAR and CAR priors are generated for the PA counties. For parameter estimation, the posterior mean was calculated for each parameter using 1,000 MCMC samples, generating a total of 2,000 posterior samples with the first 1,000 samples discarded during the burnin period. It should be noted that more MCMC samples might be necessary depending on model convergence diagnosis; however, in our numerical studies, satisfactory results were achieved under these settings (see results below). In summary, for the Monte Carlo replications, the bias (Bias) and standard deviation (SD) of parameter estimates, mean squared error (MSE), and the selection probabilities of different models based on the DIC are reported.
Simulation results
The summary statistics for the estimates of primary parameters \(\hat{\beta }_1\) and \(\hat{\beta }_2\) under different scenarios are presented here. The results for all scenarios with a censoring rate of 20% are shown in Table 1. The bias values \(\hat{\beta }_1\) and \(\hat{\beta }_2\) under M2 were the smallest among all candidate models, whereas M1 had the worst performance with the largest bias, especially for S3, followed by S1. With regard to MSE, M2 still showed the best performance, having the smallest value across different scenarios. Notably, the performance of M3 was comparable with that of M2 in terms of having negligible bias and variability under the S4 scenario.
Figures 2 and 3 show that in this algorithm, \(\beta _1\) and \(\beta _2\) quickly converge to the true value. The posterior samples are not quite correlated in the autocorrelation function (ACF) plot. Also, in the trace plot, the posterior samples vibrate around the true value of \(\beta _1\) and \(\beta _2\).
Furthermore, when the censoring rate increased to 80% (results shown in Table 2), the majority of bias and MSE values across different scenarios increased compared with those in Table 1; however, M2 still showed satisfactory results in terms of having the smallest bias and MSE among all candidate models, except in the case of S4, under which the bias values (\(\hat{\beta }_1\) and \(\hat{\beta }_2\)) were relatively large, although the MSEs remained satisfactory (0.002 and 0.005 for M2; 0.003 and 0.005 for M3). In addition, we checked the model fitting for the scenarios with \(x_1\) as the only risk factor varying across counties; the results are presented in Table 3. Compared with the results shown in Table 1, the bias of \(\hat{\beta }_1\) in M2 tended to be much larger; under S4 in particular, a substantial bias was detected compared with the M3 case, although the MSEs seemed to be comparable. With regard to model selection under different scenarios, the selection probabilities based on the DIC criterion are provided in Table 4. Model M2 was predominantly selected regardless of the censoring rate; however, when underlying risk factors were hierarchical under scenario S4, which has spatial and temporal heterogeneity but the same linear temporal trend withincounty, M3 had the highest selection rate. Noting that we also performed posthoc analysis by adopting loglogistic AFT for model fitting and comparing the performance to the other existing approaches, and the results of these additional analyses are presented in the Supplementary Material.
Data application
We applied our proposed models to the PCR PC data for the years 2004–2014, with men aged \(\ge 40\) years with a primary diagnosis of PC and Gleason score (GS) of \(\ge 6\) included for analysis. PC cases with unknown GS were also considered if the tumor stage was \(\ge\) T3. The survival outcome of interest was the time to allcause mortality (months). Several important risk factors were considered: 1) serum PSA; 2) age at diagnosis with a threshold of 65 years old (1 if age \(<65\), 0 otherwise); 3) insurance status (0=yes; 1=no); 4) Appalachian region, a geopolitical designation defined by the Appalachia Regional Commission that roughly follows the spine of the Appalachian mountains (https://www.arc.gov/appalachian_region); 5) disease aggressiveness: less aggressive PC (GS 6 or GS 7 \([3+4]\), tumor stage T1T2, and no distant metastasis), or more aggressive PC (GS \(\ge 7\) \([4+3]\), tumor stage \(\ge\)T3, or distant metastasis); 6) treatment at diagnosis: primary site surgery only, radiation only, primary site surgery and radiation, or other/unknown; 7) race: white, black, or other/unknown; and 8) year of diagnosis. Regarding the age variable, we dichotomized age using a threshold of 65 years due to clinical interest and significance [32,33,34,35], and also the violation of the PH assumption (Table S.3 in the Supplementary Material). The PA Department of Health and the Institutional Review Board of the PA State University College of Medicine approved the protected data and the study. Based on the empirical performance of the candidate models in the simulation studies, we only fitted two models, M2 and M3, for data analysis and comparison of results.
Of the 143,499 PC cases reported with a primary diagnosis in the 2004–2014 PCR, 94,274 eligible men, aged 40 to 105 years, were identified for the final analysis. These data were analyzed in our previous work, where more details of the summary statistics for demographic and clinical characteristics can be found [6, 8]; however, these previous studies did not consider temporal heterogeneity together with spatial information in the modeling, nor were any model diagnoses or comparisons conducted. Here, we performed a secondary analysis of the PCR to further explore the distribution of newly diagnosed PC cases and their associated risk factors in PA by using advanced statistical methods via AFT regression models. As shown in Fig. 4, there were significant differences in the survival curves stratified by several risk factors (\(p<0.001\)). For example, patients who were black, had more aggressive disease, were aged 65\(+\), or did not receive either surgery or radiation had higher risk of mortality. Based on an empirical check of the different candidate models in the simulations, we considered spatialtemporal AFT models M2 and M3 with adjustment of both spatial and temporal heterogeneity for this PCR data application; two different distribution assumptions, the Weibull and loglogistic distributions, were also considered. For each model, 2000 samples were drawn in the burnin period and another 20000 samples were drawn from the posterior distribution.
The results for fixedeffect parameters are summarized in Table 5. Note that the estimates were directly associated with the natural logarithm of time, with a negative value indicating a decrease in survival time and a positive value indicating an increase in survival time. There were some differences in the magnitude of parameter estimates and the associated significance among different candidate models. Based on the DIC criterion, M3 with the loglogistic distribution assumption was the best candidate model. In addition, per reviewers’ suggestion, we conducted sensitivity analysis with varied values of hyperparameters for M3. Specifically, several combinations of priors for \(\sigma ^2\) and \({\textbf {R}}\) were considered (with more details in the Supplementary Material). Based on the DIC, we added the best candidate model M3 with the priors \(\sigma ^2\sim IG(0.001, 0.001)\) and \({\textbf {R}}=Diag{10,\dots ,10}\), denoted by M3 into Table 5. Overall, a significantly lower PCspecific survival time was observed for patients who were black, aged 65 and above, not insured, and had higher serum PSA or more aggressive PC at the time of diagnosis. A longer PCspecific survival time was observed for patients with any definitive PC treatment compared with those without either primary site surgery or radiation treatment. For instance, according to M3\(^{*}\) (the best candidate), the average survival time of PC cases who received both surgery and radiation was 3.773=exp(1.328) (95% CI 3.6954.536) times higher than that of those who did not receive either. When comparing models M3\(^{*}\) to M3 under the loglogistic distribution assumption, one major difference is observed in the effect of the countylevel risk factor, Appalachia. Notably, along with relatively informative priors, Appalachian regions exhibit longer survival times compared to nonAppalachian regions, which aligns with some evidence from our prior work (i.e., men residing in rural Appalachia demonstrated the lowest rates of aggressive prostate cancer and mortality) [8]. Nevertheless, this finding remains contentious and necessitates further exploration. Additionally, we conducted further goodnessoffit checks using CoxSnell residuals, with more detailed information available in the Supplementary Material (Fig. S.2). This analysis demonstrates superior performance after adjustment for spatialtemporal dependency, although we acknowledge some deviations that merit further investigation.
Discussion
In this work, we utilized PC data from the PCR to identify the best candidate model for inference. Owing to the unique features of individuallevel cancer registry data, with spatial and temporal dependency and a hierarchical structure of risk factors, advanced statistical approaches via spatialtemporal hierarchical modeling were necessary. Although spatial and temporal models for survival analysis are described in the literature, they have limitations or were not directly applicable or accurate in this context: for instance, some approaches can only be used for aggregate data analysis such as countylevel or any other administrative unitlevel data; some only consider spatial correlations but ignore temporal information; and some use Cox PH regression regardless of the violation of the PH assumption. Based on our extensive simulation studies, mimicking the PCR and considering different candidate models to incorporate spatial and/or temporal heterogeneity, we identified an optimal model for cancer registry survival analysis with individuallevel data under different scenarios. We found that in most cases, model M2 with spatial and temporal random effects and year of diagnosis as a continuous variable can achieve satisfactory performance, with the smallest bias and variability in parameter estimates. However, when there was substantial variation in spacetime and also a hierarchical structure of risk factors across individuals and geographical clusters (i.e., county), model M3 was preferred. For goodnessoffit check and model selection, the DIC is the most prevalent metric for evaluation. However, incorporating more graphical assessments like CoxSnell residuals could provide additional insights. Moreover, based on sensitivity analysis, we anticipate that more informative priors might lead to further improvements. However, further exploration is required to ensure the justification of these priors.
These proposed models and the associated recommendation guidelines for model fitting under this context will benefit not only PC research but also studies of other cancers or diseases in other populationbased registry data with similar data structures, including the SEER data, and the National Program of Cancer Registries. Furthermore, the Bayesian framework adopted here can be easily implemented in statistical software, and more interpretable results can be obtained from posterior summaries. Noting that if the model and data generating mechanism is unknown, the MCMC algorithm might take longer to converge depending on the model complexity and the data. Thus, it is important to perform convergence diagnosis and choose appropriate numbers of MCMC and burnin samples. Nonetheless, there are several limitations to the proposed approach. Currently, we primarily focus on the CAR distribution, which incorporates spatial dependencies based on county contiguity. However, alternative weighting schemes such as distancebased or graphicbased weights could potentially offer greater insight. Additionally, enhancing the predictive accuracy of our approach could involve selecting more informative priors or other distribution assumptions tailored to specific data applications. Moreover, our current model only incorporates baseline risk factors except the diagnosis year. However, if additional (timevarying) risk factors are available for analysis, our modeling framework should be capable of accommodating them, albeit with additional effort required for algorithm extension and model diagnosis.
There are several topics for future work. Here, we mainly focused on parametric AFT models; however, the distribution assumption could be violated in practice, and thus more robust models such as the semiparametric AFT model (i.e., rankbased estimation) would be of research interest. Besides, here we mainly analyzed overall survival; however, if there is clinical interest in cancerspecific survival, the competing risks due to other causes need to be taken into account to obtain unbiased estimates. Models extended from the Fine and Gray model or based on a joint approach with shared random effects modeling could be developed in future studies. In addition, with regard to parameter estimation and inference based on the Bayesian technique, advanced algorithms (i.e., blocked Gibbs sampling, slice sampling) could be adopted to further reduce the computing burden.
Conclusion
This study presents several advanced spatialtemporal models for identifying risk factors for allcause mortality in newly diagnosed PC patients from the PCR 20042014, where heterogeneity between subjects and the structure of dependency in geographic regions (county) and time (data collection year) are simultaneously considered. Simulation data indicated that under such context, the model with spatial and temporal random effects and the year of diagnosis as a continuous variable performs the best among candidate models, with model diagnosis and assessment based on the DIC. Additionally, the application of the model on our motivation data on PC was also evaluated, leading to more valid inference on risk factors’ effects and identifying substantial spatialtemporal variation.
Availability of data and materials
The Pennsylvania Cancer Registry (PCR) data: The PCR data is a populationbased dataset including all newly diagnosed prostate cancer patients recorded by the Department of Health, Pennsylvania. The PCR data follows standardized data acquisition protocols to ensure that the individual reports include the same information in the same format, which can then be pooled in centralized databases. The PCR data can be made publicly available upon obtaining approval from the Pennsylvania Department of Health by signing a data use agreement for the data access application. The direct persistent weblink for the PCR data request is https://www.health.pa.gov/topics/ReportingRegistries/CancerRegistry/Pages/Cancer%20Registry.aspx.
Computing package: The computing package containing the code to perform spatialtemporal Bayesian accelerated failure time models described in the article. The package is written in R with callable C functions for computing efficiency, and it is available on GitHub for public use. Please refer to the website https://github.com/zli141/sptime.
Abbreviations
 PH:

Proportional hazards
 PC:

Prostate cancer
 AFT:

Accelerated failure time
 SEER:

Surveillance, Epidemiology, and End Results Program
 PA:

Pennsylvania
 PCR:

Pennsylvania Cancer Registry
 CAR:

Conditional Autoregressive
 MCAR:

Multivariate Conditional Autoregressive
 DIC:

Deviance information criteria
 SD:

Standard deviation
 MSE:

Mean squared error
 CR:

Censoring rate
 ACF:

Autocorrelation function
 GS:

Gleason score
References
Munjal A, Leslie SW. Gleason score. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2020.
Yao N, Alcalá HE, Anderson R, Balkrishnan R. Cancer disparities in rural Appalachia: incidence, early detection, and survivorship. J Rural Health. 2017;33(4):375–81.
Zhang J, Lawson AB. Bayesian parametric accelerated failure time spatial model and its application to prostate cancer. J Appl Stat. 2011;38(2):591–603.
Wang M, Matthews SA, Iskandarani K, Li Y, Chinchilli VM, Zhang L. Spatialtemporal analysis of prostate cancer incidence in Pennsylvania for years 2000–2011. Geospatial Health. 2017;12(2):611.
Bluethmann SM, Wang M, Wasserman E, Chen C, Zaorsky NG, Hohl RJ, et al. Prostate cancer in Pennsylvania: The role of older age at diagnosis, aggressiveness, and environmental risk factors on treatment and mortality using data from the Pennsylvania Cancer Registry. Cancer Med. 2020;9(10):3623–33.
McDonald A, Wasserman E, Lengerich EJ, Raman J, Geyer N, Hohl R, et al. Prostate Cancer Incidence and Aggressiveness in Appalachia versus NonAppalachia Populations in Pennsylvania by UrbanRural Regions, 2004–2014. J Cancer Epidemiol Biomarkers Prev. 2020;29(7):1365–73.
Wang M, Chi G, Bodovski Y, Holder SL, Lengerich EJ, Wasserman E, et al. Temporal and spatial trends and determinants of aggressive prostate cancer among Black and White men with prostate cancer. J Cancer Causes Control. 2019;31(1):63–71.
Wang M, Wasserman E, Geyer N, Carroll R, Zhao S, Hohl R, et al. Spatial patterns in prostate cancerspecific mortality in Pennsylvania and its catchment area using Pennsylvania cancer registry data, 2004–2014. BMC Cancer. 2020;20(1):394.
Cox DR. Regression models and lifetables. J R Stat Soc Ser B Methodol. 1992;34(2):187–220.
Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med. 1992;11(14–15):1871–9.
Ewees AA, Algamal ZY, Abualigah L, Alqaness M, Yousri D, Ghoniem RM, et al. A Cox ProportionalHazards Model Based on an Improved Aquila Optimizer with Whale Optimization Algorithm Operators. Mathematics. 2022;10(8):1273.
Bibani AA, Algamal ZY. Survival Function Estimation for Fuzzy Gompertz Distribution with neutrosophic data. J Int J Neutrosophic Sci. 2023;21(3):137–42.
Schrodle B, Held L. Spatiotemporal disease mapping using INLA. Environmetrics. 2010;22:725–34.
NietoBarajas LE. Bayesian regression with spatiotemporal varying coefficients. Biom J. 2020;62(5):1245–63.
Carlin BP, Banerjee S. Hierarchical multivariate CAR models for spatiotemporally correlated survival data. Bayesian Stat. 2003;7:45–63.
Banerjee S, Carlin BP. Semiparametric spatiotemporal frailty modeling. Environmetrics Off J Int Environmetrics Soc. 2003;14(5):523–35.
Zhou H, Hanson T, Zhang J. Generalized accelerated failure time spatial frailty model for arbitrarily censored data. Lifetime Data Anal. 2017;23(3):495–515.
Onicescu G, Lawson AB, Zhang J, Gebregziabher M, Wallace K, Eberth JM. Bayesian accelerated failure time model for spacetime dependency in a geographically augmented survival time. Stat Methods Med Res. 2017;26(5):2244–56.
Carroll R, Lawson AB, Jackson CL, Zhao S. Assessment of spatial variation in breast cancerspecific mortality using Louisiana SEER data. Soc Sci Med. 2017;193:1–7.
Carroll R, Lawson AB, Zhao S. Temporally dependent accelerated failure time model for capturing the impact of events that alter survival in disease mapping. Biostatistics. 2019;20(4):666–80.
Wang S, Zhang J, Lawson AB. A Bayesian normal mixture accelerated failure time spatial model and its application to prostate cancer. Stat Methods Med Res. 2012;25:793–806.
Hurtado R, Sandra M, Dipak KD. A transformation class for spatiotemporal survival data with a cure fraction. Stat Methods Med Res. 2016;25(1):167–87.
Sharmin S, Khan HR. Analysis of unobserved heterogeneity via accelerated failure time models under Bayesian and classic approaches. 2017;26(5):2244–56. arXiv preprint arXiv:1709.02831
Carroll R, Zhao S. Trends in colorectal cancer incidence and survival in Iowa SEER data: the timing of it all. Clin Colorectal Cancer. 2019;18(2):e261–74.
Cressie N, Chan NH. Spatial modeling of regional variables. J Am Stat Assoc. 1989;84(406):393–401.
Besag J. Spatial interaction and the statistical analysis of lattice systems. J R Stat Soc Ser B Methodol. 1974;36(2):192–236.
Hanson TE, Jara A, Zhao L. A Bayesian semiparametric temporallystratified proportional hazards model with spatial frailties. Bayesian Anal. 2011;6(4):1–48.
Cai B, Lawson AB, Hossain M, Choi J, Kirby RS, Liu J. Bayesian semiparametric model with spatiallytemporally varying coefficients selection. Stat Med. 2013;32(21):3670–85.
Casella G, George EI. Explaining the Gibbs sampler. Am Stat. 1992;46(3):167–74.
Chib S, Greenberg E. Understanding the MetropolisHastings algorithm. Am Stat. 1995;49(4):327–35.
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002;64(4):583–639.
L RS, FernándezNavarro P, LópezAbente G, Nuñez O, Fernández de LarreaBaz N, JimenezMoleón JJ, et al. Different spatial pattern of municipal prostate cancer mortality in younger men in Spain. PLoS ONE. 2019;14(1):e0210980.
Loeb S, Bjurlin MA, Nicholson J, Tammela TL, Penson DF, Carter HB, et al. Overdiagnosis and overtreatment of prostate cancer. Eur Urol. 2014;65(6):1046–55.
Moyer VA. Screening for prostate cancer: US Preventive Services Task Force recommendation statement. Ann Intern Med. 2012;157(2):120–34.
Wolf AM, Wender RC, Etzioni RB, Thompson IM, D’Amico AV, Volk RJl, et al. American Cancer Society guideline for the early detection of prostate cancer: update 2010. CA Cancer J Clin. 2010;60(2):70–98.
Acknowledgements
We are grateful for the Department of Health at Pennsylvania to provide the access to the Pennsylvania cancer registry data.
Funding
The corresponding author, Ming Wang, gratefully acknowledge the Startup support from Department of Population and Quantitative Health Sciences at Case Western Reserve University.
Author information
Authors and Affiliations
Contributions
MW and ZL developed the models, had full access to the data and analyzed the data and drafted the manuscript. MW conceptualized the study. JL and LJZ implemented the algorithm and assisted in simulation studies. YL and LLZ reviewed the method development and provided comments on the simulation studies and data application. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, M., Li, Z., Lu, J. et al. Spatialtemporal Bayesian accelerated failure time models for survival endpoints with applications to prostate cancer registry data. BMC Med Res Methodol 24, 86 (2024). https://doi.org/10.1186/s1287402402201w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1287402402201w