Towards integrated surveillance of zoonoses: spatiotemporal joint modeling of rodent population data and human tularemia cases in Finland

Background There are an increasing number of geo-coded information streams available which could improve public health surveillance accuracy and efficiency when properly integrated. Specifically, for zoonotic diseases, knowledge of spatial and temporal patterns of animal host distribution can be used to raise awareness of human risk and enhance early prediction accuracy of human incidence. Methods To this end, we develop a spatiotemporal joint modeling framework to integrate human case data and animal host data to offer a modeling alternative for combining multiple surveillance data streams in a novel way. A case study is provided of spatiotemporal modeling of human tularemia incidence and rodent population data from Finnish health care districts during years 1995–2012. Results Spatial and temporal information of rodent abundance was shown to be useful in predicting human cases and in improving tularemia risk estimates in 40 and 75% of health care districts, respectively. The human relative risk estimates’ standard deviation with rodent’s information incorporated are smaller than those from the model that has only human incidence. Conclusions These results support the integration of rodent population variables to reduce the uncertainty of tularemia risk estimates. However, more information on several covariates such as environmental, behavioral, and socio-economic factors can be investigated further to deeper understand the zoonotic relationship.


Background
Disease risk mapping is important for the understanding of the spatial epidemiology of infectious diseases. In most cases and even for multi-host diseases such as zoonoses, risk estimation has been conducted in a univariate fashion based on human case data alone. Modeling of multivariate health data, informed by multiple streams of geo-coded information, allows observation of concurrent patterns among data streams and conditioning on one another. As a result, multivariate methods can deliver greater statistical power and lead to more precise risk estimation and enhanced event detection. Specifically, for zoonotic diseases, knowledge of spatial and temporal patterns of the animal host could inform incidence in humans.
Integration of data and analyses, whether of population or health related variables, has been suggested to improve zoonoses surveillance accuracy and efficiency [1,2]. Integration appears more feasible for endemic zoonoses, and for those with domesticated animals as source, given the likely greater availability of animal health data. For zoonoses with a non-domesticated animal source (e.g. sylvatic yellow fever, tularemia), availability of animal health data is likely to be a limiting factor towards integration and alternative animal data sources must be sought.
Tularemia is an infectious disease caused by an intracellular bacterium, Francisella tularensis. The disease is endemic in North America and parts of Europe, with recurrent outbreaks in Sweden and Finland [3,4]. Francisella tularensis has a wide range of hosts with transmission most commonly via arthropod vectors [5]. Rodents could play a role in the zoonotic transmission of the disease after findings of a relationship between vole population cycles and human tularemia incidence in Finland [6] and Sweden [7]. Specifically in Finland, rodent population dynamics displayed a spatiotemporal relationship with human tularemia cases, such that human case numbers peaked one year after peak rodent densities [6]. Similar findings, from studies of tularemia outbreaks, indicate that high rodent densities might relate to occurrences in humans [8][9][10][11].
This work explores the application of spatiotemporal joint models to concurrent animal and human geo-referenced data sources in an effort to explain possible patterns between the distribution and/or the abundance of the animal host and human disease.. The proposed methodology is evaluated on its performance in predicting human disease risk and improving risk estimation in a case study of tularemia human incidence and rodent population data in Finland. Not only our method contains methodological novelty with potential applications in spatial epidemiology, it also helps to reveal a disease pattern in the case study which was not considered in previous studies.

Data sources
A complete description of the rodent population and human tularemia incidence data is available from earlier reports [6]. Briefly, data on rodent population levels, predominantly bank voles (Myodes glareolus) and field voles (Microtus agrestis), were collected across Finland by the Natural Resources Institute Finland and categorized into three population levels: decline, increase, and peak [12]. Human tularemia cases were reported as laboratory-confirmed to the National Infectious Disease Register, kept by the National Institute for Health and Welfare of Finland. Both human cases and rodent data were aggregated into 20 Finnish healthcare districts over the period 1995-2012 [6]. An indicator to quantify the influence of rodent population levels on human incidence is developed for each health district. Plots of human cases and binary rodent status for the 20 Finnish health districts in the period 1995-2012 show a one-year lagged increase in the number of human tularemia cases after rodent population peaks for certain districts and years (Fig. 1). A similar pattern was found between human cases and the categorical rodent population status (Fig. 2). These support the choice of spatiotemporal models which will be developed in the next section.

Statistical methodology
We propose a Bayesian framework to jointly analyze rodent population status and human case incidence. We assume that the human cases are associated with rodents' status through a latent structure. Let human cases (counts), h it , at health district i and time t follow a Poisson distribution with mean = e i θ it where e i is the expected number of human cases in the ith health district (presumably constant across the years) and θ it is the relative risk at the ith health district and year t There are a number of ways to calculate the expected rate. In this paper, the expected rates, e i , are calculated as the average case count at area i over the time period as e i ¼  (T = 8 years). Human population data was obtained from the Finnish Population Register Centre [13]. However, the population variation between regions was found to be limited and combined with the low overall rate of the disease, it was decided that a time averaged rate would be appropriate in this case.
A simple approach to jointly model human incidence and rodent population data is to consider a 2-level rodent population indicator as in [6]. As a binary variable, we denote r it = 0 if the number of rodents declined and r it = 1 if the number of rodents is at peak or increased. The 2-class rodent status is then assumed to follow a Bernoulli distribution with parameter p it being the probability of rodent for the i th health district and year t. To specify the parameters in the joint likelihoods, θ it for humans and p it for rodents, linear predictors are decomposed additively into spatial, and space-time interaction random effects as follows α r , α h are the overall mean levels for rodent and human respectively, and assumed to have zero-mean Gaussian prior distributions. The latent variables u h i ; u r i ; v h i ; v r i are included to model non-temporal background variation with spatial and non-spatial prior distributions. The spatial structure for u h i ; u r i follow an intrinsic conditional autoregressive (ICAR) [14] model and the non-spatial distribution for v h i ; v r i is assumed to be zero-mean Gaussian prior distribution. A Gaussian distribution with zero mean is assumed for λ h t ; λ r t at t = 1 and an autoregressive prior distribution is assumed for λ h t ; λ r t at t > 1 which allows for a type of nonparametric temporal effect. δ h it ; δ r it represent the temporal trend for each health district at year t. We also assume that the space-time random effects of human and rodent are proportional with one-year lag. This is supported by the finding suggested in [6] that 1-year temporal lag effect can be beneficial to predict human tularemia outbreaks.
To model unobserved ecological effects associated with vole cycles at the five boreal zones in Finland (Southern Finland, Southwestern and Inland Finland, Eastern Finland, Northern Finland, and Lapland) [13], two additional parameters u sr k ; v sr k are included as the spatial and non-spatial contextual variables for regional state level k. δ r it ; β p i ; β n i are assumed to follow a zero-mean Gaussian prior distribution and the uniform distribution on (0,10) is used to model all standard deviation parameters [15].
Although there is some evidence that a 2-level rodent status could have a lagged predictive ability on human tularemia [6], we also want to extend our consideration to include the original three rodent levels in the joint modeling and assume a categorical likelihood for rodent population status. A specification for multiple categories of rodent status can be defined as where j = 1 indicates the declining rodent population level, j = 2 is the increasing level, and j = 3 is the rodent's population at peak, and δ r j;it $ Nð0; τ −1 δ r Þ. The other prior distributions for the random effects terms are assumed the same as the 2-level model.
We further assume that the space-time random effects of human and rodent are proportional with one-year lag. That is δ h it α δ r r it−1 ;it−1 , i.e. δ h it ¼ β i;r it−1 δ r r it−1 ;it−1 where β i;r it−1 is the proportional parameter. This specification is developed to examine the lagged effect of rodent status, δ r r it−1 ;it−1 , at level j = r it-1 in health district i at time t-1on the number of human cases through the space-time interaction term,δ h it .

Model evaluation
To evaluate the models, we use two goodness of fit measures: the Deviance Information Criterion (DIC) [16,17] where the effective number of parameters is estimated in terms of deviance's variance, and the Watanabe-Akaike information criterion (WAIC) [18][19][20]. Both measures are computed under the likelihood of human data to compare the models' fit with and without the contextual variables. We also compare the posterior standard deviations of θ it from the two models (binary and categorical) with the rodent data, with those from a model based only human data, to examine the benefits of incorporating animal data in surveillance. Results are obtained from 10,000 posterior samples using WinBUGS software after a burn-in period of 10,000 draws. To assess the mixing of posterior samplers, we adopt Gelman's R b statistics proposed in [18,21] for multiple chain convergence and converged chains should have the value of R b approximately 1.

Results
The histograms of R b estimates of θ it from posterior sampler under the models are displayed in Fig. 3. The R b estimates under all the models are approximately or less than 1.005 which indicates the chains converge to a posterior distribution. ing the best fit to the data. To provide evidence of the benefits from incorporating rodent information into the model, the posterior estimates of standard deviation (SD) of θ it from both binary and polytomous models are compared and presented in Table 2 for each area over the time period. The SD estimates of θ it with rodent's information incorporated are smaller than those from the model that has only human incidence. This result supports the integration of rodent population variables to reduce the uncertainty of tularemia risk estimates.
To help assess the predictive performance of rodent abundance data on the occurrence of human cases, we develop a metric on the 0 to 1 scale, namely the degree of positive indicator (DP) for each health district as The DP indicator is derived from the binary model without the contextual factors as this was the model with the lowest DIC and WAIC. High values of DP (close to 1) would indicate health districts with a high number of human tularemia cases in the current year, given increasing or at peak rodent populations in the past year. DP can be interpreted in a similar fashion to the sensitivity of a diagnostic test. High values of DP would suggest good predictive value of the rodent data on the occurrence of human tularemia cases. Year-specific modeling was not considered due to insufficient data. Thus positive dependence values represent an averaged effect of rodent populations on human incidence across all the years (Table 3). There are 8 (40%) health districts, mostly on the west and south of the country, with mean DP values larger than 0.8 and the lower 95% credible interval above 0.5 (Fig. 4).

Discussion
Other modifications are possible. For example, in joint modeling human and rodent data we could consider human cases being dependent on rodent population through p it as a categorical covariate. For spatial unit, smaller or administrative areas different from health districts may have led to more discriminatory findings (or not given increased noise from smaller units). Similarly, more informative data on rodent population levels, e.g. rodent densities as in other studies [11] might have resulted in more descriptive models. Rodent populations fluctuate with a highly varying amplitude, which means that abundances may vary substantially from one peak to the next, even within the same regions [12]. If tularemia transmission to humans is a phase and density dependent process, as it most likely is [6], variation in vole abundance during successive peaks may reduce the predictive value of models employed here. Subsequent studies must explore the incorporation of data on the precise location of the rodent trapping sites through the use of some form of interpolation [22]. In addition to host population changes, environmental factors also seem to impact on the occurrence of tularemia outbreaks [23,24]. Incorporation of evidence on mosquito distribution (not purposely captured at this moment in Finland), rainfall and water bodies into our models would be straightforward.

Conclusions
Space-time proximity and contact patterns between humans and animals play a central role in infection risk. In this research, we attempted to assess the surveillance relevance of regularly collected rodent population data to i) improve tularemia risk estimates and ii) inform  To that effect, we developed binary and polytomous architectures to jointly model human incidence and rodent status with one-year lag. Our results returned a heterogeneous picture but for many health districts rodent population status was relevant to the occurrence of human tularemia cases. We have shown that the incorporation of rodent population data led to an improvement in the accuracy of human risk estimates in 15 (75%) health districts, compared to models only considering human tularemia cases. Furthermore, our purposely built indicator (DP) showed that in 8 (40%) of the health districts, increasing and at-peak rodent populations were robust predictors of human tularemia cases in the following year. However, for few districts (e.g. Länsi--Pohja and Kainuu) where the model based only on human data leads to more precise estimates of incidence, these areas also have the low values of positive indicator (DP) with the corresponding credible intervals crossing 0.5. This suggests that the distribution of zoonotic pathogens in animal and human populations spatially varies as we assumed according to local biotic and abiotic determinants. To conduct further investigation, we need more information on several covariates such as environmental, behavioral, and socio-economic factors. However, the platform proposed in this research can facilitate in identification of geographical areas that are potentially suitable for transmission. Our objective was to develop different models to combine multiple data sources already available, on animals and humans, to better inform the occurrence of zoonoses. The present work shows the utilization of animal population data (in the absence of animal health-related data on rodents) to inform human risk. Although different model parameterizations and, in particular, evidence on other putative predictors, as reported elsewhere [25], could contribute further  evidence to better inform human risk, our proposed methodology demonstrates its ability to quantify the association between the rodent status and human incidence. This is potentially useful in prediction of human outbreak when put in the health-policy perspective.