An R package for an integrated evaluation of statistical approaches to cancer incidence projection
BMC Medical Research Methodology volume 20, Article number: 257 (2020)
Abstract
Background
Projection of future cancer incidence is an important task in cancer epidemiology. The results are also of interest for biomedical research and public health policy. Age-Period-Cohort (APC) models, usually based on long-term cancer registry data (> 20 yrs), are established for such projections. In many countries (including Germany), however, nationwide long-term data are not yet available. General guidance on statistical approaches for projections using rather short-term data is challenging to give, and software that enables researchers to easily compare approaches is lacking.
Methods
To enable a comparative analysis of the performance of statistical approaches to cancer incidence projection, we developed an R package (incAnalysis), supporting in particular Bayesian models fitted by Integrated Nested Laplace Approximations (INLA). Its use is demonstrated by an extensive empirical evaluation of operating characteristics (bias, coverage and precision) of potentially applicable models differing in complexity. Observed long-term data from three cancer registries (SEER9, NORDCAN, Saarland) were used for benchmarking.
Results
Overall, coverage was high (mostly > 90%) for Bayesian APC models (BAPC), whereas less complex models showed differences in coverage depending on the projection period. Intercept-only models yielded values below 20% for coverage. Bias increased and precision decreased for longer projection periods (> 15 years) for all except intercept-only models. Precision was lowest for complex models such as BAPC models, generalized additive models with multivariate smoothers and generalized linear models with age × period interaction effects.
Conclusion
The incAnalysis R package allows a straightforward comparison of cancer incidence rate projection approaches. Further detailed and targeted investigations into model performance in addition to the presented empirical results are recommended to derive guidance on appropriate statistical projection methods in a given setting.
Background
Projection of future cancer incidence is an important task in cancer epidemiology. The results are also of interest for biomedical research and public health policy. In particular, cancer prevention and screening programs require reliable estimates of future cancer incidence to allow informed decisions on their design and to facilitate evaluations [1, 2]. Projections are often performed using long-term data (> 20 yrs) from population-based cancer registries [3]. For short-term data, there appears to be a lack of guidance on which statistical approach to choose. The need to base projection models on relatively short-term data is relevant e.g. for Germany, where aggregated cancer incidence data on a national level are only available from 1999 on, as well as for many countries with newly established cancer registries. Even though it might be challenging to give general guidance on which projection approach to choose, software enabling comparison of multiple competing methods for a given research question might prove useful, but flexible, extensive and easy-to-use tools are missing.
A selection of previously applied projection models is outlined in [4]. Relatively simple approaches assuming constant rates were utilized [5, 6], as well as more complex age-period (AP) models formulated as generalized linear models (GLMs) with or without interaction effects [7,8,9]. Clements et al. use generalized additive models (GAMs) [10]. GAMs can include uni- or multivariate smoothers in their linear predictors. An established model class for incidence projections based on long-term observation data are age-period-cohort (APC) models, which additionally incorporate a cohort effect [11, 12]. Even though projections from APC models usually yield robust results, the APC identification problem impairs direct interpretability of single effects [13, 14].
Projection models are often fitted within a classical maximum likelihood (ML) or restricted maximum likelihood (REML) framework [15,16,17]. Alternatively, a Bayesian framework may be used [18, 19]. Bayesian model estimation can be implemented using Markov chain Monte Carlo (MCMC) methods, which are computationally intensive. A recently developed and computationally far less demanding alternative is Integrated Nested Laplace Approximation (INLA) [20, 21].
GAMs usually incorporate splines to fit univariate trends or tensor product smoothers for multivariate trends (i.e. interactions between functions of continuous variables). In the classical frequentist framework, such models can be fit e.g. using the mgcv package in R [22]. Uni- and multivariate smoothers can directly be incorporated into the model formula, e.g. as splines or tensor product smoothers.
Recently, a highly flexible Bayesian APC (BAPC) model based on the INLA approach has been proposed for future cancer incidence projections which assumes a Poisson distribution of incidence counts [19]. Havulinna et al. demonstrate that interactions between effects can be modeled by specifying appropriate priors [18].
Given the lack of guidance on statistical modeling approaches to cancer incidence projection and the increasing understanding across sciences that neutral comparisons of statistical methods are needed [23,24,25], we developed an R package which allows an integrated comparison of model performance metrics in the above-described context. We thereby aim to facilitate an informed choice of statistical models and the development of methodological guidance. Due to the desirable flexibility in modeling options and the probabilistic interpretation of results in a Bayesian framework as well as the computationally efficient implementation, we emphasize the INLA approach. To demonstrate the functionality of the new package, we provide an extensive empirical benchmarking analysis of a selection of potentially applicable modeling approaches using observed long-term data from three population-based cancer registries.
Methods
Cancer registry data
Three low-incidence tumor sites/entities (brain tumors, kidney cancer, melanoma) and four high-incidence entities (lung, breast, colorectal, prostate) were selected from three population-based cancer registries: SEER9 [26], NORDCAN [27] and Saarland [28]. Incidence data of patients younger than 20 years and older than 84 years (available only as aggregated data) were excluded from analysis. Specific selection criteria are shown in Table 1. Data were analyzed separately for males and females with few exceptions (prostate cancer: only male; breast cancer: only female in the Saarland data, males and females in SEER9 and NORDCAN data). A representative data structure of incidence and population data, as also used in the incAnalysis package, is shown in Suppl. Tbl. 1. Cancer cases (incidence) for a given year are stored in rows (most recent year in the bottom row), and each row is separated by age (group) in columns in increasing order from left to right.
From the Surveillance, Epidemiology, and End Results (SEER) Program in the United States, SEER9 cancer incidence data (1973–2014) were accompanied by population data, available in 1-year age groups.
NORDCAN data (1960–2015) comprise cancer incidence data from Denmark, Finland, Iceland, Norway, Sweden, Faroe Islands and Greenland. The data were retrieved from the NORDCAN website on 2018-08-01. Incidence data were available in 5-year age groups. Population matrices were calculated from the person-years at risk information.
Cancer incidence data from Saarland (1970–2014), a German federal state with a long-established cancer registry, were obtained from the Saarland cancer registry website on 2018-08-01 (5-yr age groups). Population data were retrieved from the health report system of the federal government (up to 2012) and from the website of Saarland for the years 2013/14 [29, 30].
Projection models
The incAnalysis R package (see details below in the section 'R package incAnalysis') was used to evaluate a number of increasingly complex models (GLMs, GAMs, BAPC) using the INLA framework. To describe the evaluated models, we introduce the following notation: Y denotes observed cancer incidence counts, N denotes population size, AGE and PERIOD are the respective covariates. The notation also corresponds to variable names used in the R package. Age or age group, respectively, is indexed by i. Selected projections are shown in Suppl. Figs. 1 and 2.
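The notation above can be illustrated with a minimal R sketch that reshapes year-by-age matrices into one row per cell with the columns Y, N, AGE and PERIOD. All values below are hypothetical toy data, and the long layout is an assumption made for illustration; it is not taken from the internals of the incAnalysis package.

```r
# Hypothetical toy data: 3 years (rows) x 4 age groups (columns),
# earliest year in the first row, ages increasing from left to right.
years <- 2010:2012
ages  <- c(20, 25, 30, 35)

inc <- matrix(c(5,  8, 12, 20,
                6,  9, 13, 22,
                4, 10, 15, 21),
              nrow = length(years), byrow = TRUE,
              dimnames = list(years, ages))
pop <- matrix(10000, nrow = length(years), ncol = length(ages),
              dimnames = list(years, ages))

# as.vector() reads a matrix column by column, so PERIOD cycles fastest
# and AGE is repeated once per year.
long <- data.frame(
  PERIOD = rep(years, times = length(ages)),
  AGE    = rep(ages,  each  = length(years)),
  Y      = as.vector(inc),
  N      = as.vector(pop)
)

long$rate <- long$Y / long$N  # crude incidence rate per cell
head(long)
```

Each row of the resulting data frame then carries exactly the quantities that appear in the model formulas of the following subsections.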
Generalized linear models (GLMs)
GLMs are formulated using three components: (1) a probability distribution from the exponential family, (2) a linear predictor η = Xβ and (3) a link function g with E(y) = μ = g^{−1}(η). In all models except BAPC, negative binomially distributed counts of tumor cases were assumed.
The most simply structured GLM includes only an intercept, η = β_{0}. In R, this intercept-only model was formulated as Y ~ offset(log(N)) (equivalent to Y ~ 1 + offset(log(N))).
Next, a GLM with age and period as covariates together with their interaction term was assessed: η = β_{0} + β_{1}age + β_{2}period + β_{3}age:period, corresponding to the R formula Y ~ offset(log(N)) + AGE*PERIOD.
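As a hedged illustration of the two GLM formulas, the following sketch fits them by maximum likelihood with MASS::glm.nb() on simulated data. This is a classical ML analogue chosen for self-containedness, not the INLA fit used in the article; the simulated rates, the population size and the object names (d, fit0, fit1, new) are assumptions.

```r
library(MASS)  # glm.nb() for negative binomial GLMs

set.seed(1)
# Simulated (hypothetical) data: 15 years x 13 five-year age groups.
d <- expand.grid(PERIOD = 1998:2012, AGE = seq(20, 80, by = 5))
d$N <- 50000
true_rate <- exp(-9 + 0.05 * d$AGE + 0.01 * (d$PERIOD - 1998))
d$Y <- rnbinom(nrow(d), mu = d$N * true_rate, size = 20)

# Intercept-only model: one common rate for all ages and years.
fit0 <- glm.nb(Y ~ 1 + offset(log(N)), data = d)

# Age-period model with linear main effects and their interaction.
fit1 <- glm.nb(Y ~ offset(log(N)) + AGE * PERIOD, data = d)

# Project expected counts for a year beyond the observation period.
new <- expand.grid(PERIOD = 2014, AGE = seq(20, 80, by = 5))
new$N <- 50000
new$pred0 <- predict(fit0, newdata = new, type = "response")
new$pred1 <- predict(fit1, newdata = new, type = "response")
new
```

Because the offset log(N) is part of the formula, predict() returns expected counts for the population sizes supplied in new; the intercept-only model therefore projects the same count for every age group when N is constant.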
Generalized additive models (GAMs)
GAMs have a structure similar to GLMs, with the difference that smooth functions f of covariates can be included in the linear predictor (A: model matrix, θ: parameter vector): g(μ) = A θ + f_{1}(x_{1}) + f_{2}(x_{2}) + ….
Splines might be used as smooth functions, or in the case of INLA, specific Gaussian Markov Random Fields. In the present analysis, B-splines were used as univariate smoother for the age covariate, and bs() from the splines package can directly be included in the model formula: Y ~ offset(log(N)) + PERIOD + bs(AGE). Alternatively, a random walk of order 2 (rw2) model might be specified as Y ~ offset(log(N)) + PERIOD + f(AGE, model = 'rw2').
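The B-spline formula can be tried out with the following sketch, again as a classical ML fit with MASS::glm.nb() rather than with INLA. The B-spline here is an unpenalized regression spline; the simulated non-linear age effect, the choice df = 5 and all object names are assumptions for illustration only.

```r
library(MASS)     # glm.nb()
library(splines)  # bs()

set.seed(2)
# Simulated (hypothetical) data with single years of age, as in SEER9.
d <- expand.grid(PERIOD = 1998:2012, AGE = 20:84)
d$N <- 10000
rate <- exp(-10 + 0.12 * d$AGE - 0.0006 * d$AGE^2 +
            0.01 * (d$PERIOD - 1998))
d$Y <- rnbinom(nrow(d), mu = d$N * rate, size = 30)

# Linear period effect plus a cubic B-spline smoother for age.
fit <- glm.nb(Y ~ offset(log(N)) + PERIOD + bs(AGE, df = 5), data = d)

# Projection for a future year; the spline basis is rebuilt for the new
# ages from the knots stored in the fitted model.
new <- data.frame(PERIOD = 2014, AGE = 20:84, N = 10000)
new$pred <- predict(fit, newdata = new, type = "response")
head(new)
```

Only PERIOD is extrapolated in this projection, since the projected ages stay within the range on which the spline basis was built.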
To allow evaluation of models with multivariate tensor product smoothers for age and period with INLA, we used an ad hoc solution applying a z model (we acknowledge that this is a non-standard approach and that a more detailed outline than is possible in the scope of this article would be useful before more widespread application). Tensor spline interactions can be specified, e.g. by using the function mgcv::te() for the classical model fitting approach (Y ~ offset(log(N)) + te(AGE, PERIOD)). In R-INLA, te() is not directly usable in model formulas. The z model we used instead is an implementation of the classical random effects part of a mixed model (η = … + Z z). The random effects design matrix is \( \boldsymbol{Z} = \begin{pmatrix} \boldsymbol{Z}_{1} & \cdots & \mathbf{0} \\ \vdots & \ddots & \vdots \\ \mathbf{0} & \cdots & \boldsymbol{Z}_{i} \end{pmatrix} \) for each cluster i, which has q ∈ ℕ^{+} random effects. Z was calculated as the tensor product smooth model matrix for marginal bases for the age and period model matrices A and P using mgcv::tensor.prod.model.matrix() [31]. The i-th row of the resulting tensor product model matrix is calculated as the Kronecker product of the i-th rows of A and P. Marginal bases were calculated as M-splines, using splines2::mSpline() [32]. M-splines are non-negative splines, which can be considered a normalized version of B-splines. A log-gamma prior was specified for this model, with parameter values (a = 1, b = 0.005), the same values as used in [33]. The corresponding R code is shown in the package vignette vignette('incidence').
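The row-wise Kronecker construction of the tensor product model matrix can be sketched in base R as follows. B-spline bases from splines::bs() stand in for the M-splines of the article so that the example runs without additional packages; the grid, the basis dimensions and the object names are assumptions.

```r
library(splines)  # bs()

# Hypothetical grid: 15 periods x 13 age groups, one row per cell.
g <- expand.grid(PERIOD = 1:15, AGE = seq(20, 80, by = 5))

# Marginal bases (B-splines as a stand-in for M-splines).
A <- unclass(bs(g$AGE,    df = 5, intercept = TRUE))
P <- unclass(bs(g$PERIOD, df = 4, intercept = TRUE))

# Row-wise Kronecker product: column (a - 1) * ncol(P) + p of Z holds
# the elementwise product A[, a] * P[, p].
ia <- rep(seq_len(ncol(A)), each  = ncol(P))
ip <- rep(seq_len(ncol(P)), times = ncol(A))
Z  <- A[, ia] * P[, ip]

dim(Z)  # one row per cell, ncol(A) * ncol(P) columns
```

Each row of Z equals the Kronecker product of the corresponding rows of A and P, which is the construction described for mgcv::tensor.prod.model.matrix() above.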
Bayesian age-period-cohort models (BAPC)
APC models estimate the effects of the individuals' age, their birth cohort and the period in which the event occurred [19]: η_{ij} = log(λ_{ij}) = μ + α_{i} + β_{j} + γ_{k} with intercept μ, and age, period and cohort effects α_{i}, β_{j} and γ_{k}. i (1 ≤ i ≤ I) denotes the age group at time point j (1 ≤ j ≤ J); the cohort index k depends on the age and period index as well as on the age group and period interval width: k = j + M (I − i). M encodes the width of age groups as compared to period intervals, e.g. for 5-yr age groups and yearly data, M is 5. The model implemented in the BAPC package assumes Poisson distributed data, includes the three random effects age, period and cohort (second-order random walk, rw2) and an additional random effect (independent and identically distributed, iid) to adjust for overdispersion. Separate age, period and cohort effects are not identifiable due to the exact linear dependence of effects [19].
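The cohort index can be made concrete with a few lines of R. The function name cohort_index() and the dimensions (13 five-year age groups, 15 yearly periods) are hypothetical choices for illustration.

```r
# Cohort index for age group i and period j: k = M * (I - i) + j.
cohort_index <- function(i, j, I, M) {
  M * (I - i) + j
}

I <- 13  # hypothetical: 5-year age groups from 20-24 to 80-84
J <- 15  # 15 yearly periods
M <- 5   # age groups are five times as wide as period intervals

grid <- expand.grid(i = seq_len(I), j = seq_len(J))
grid$k <- cohort_index(grid$i, grid$j, I, M)

range(grid$k)           # oldest group, first period up to youngest group, last period
length(unique(grid$k))  # number of distinct cohorts, M * (I - 1) + J
```

Since k is a deterministic linear function of i and j, the exact linear dependence between the three time scales is directly visible in this construction.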
Performance metrics
Model performance was evaluated using three metrics: coverage, bias and precision. Metrics were calculated per age/age group, sex and entity, and averaged (arithmetic mean), yielding one aggregated value per entity, sex, projection interval and projection model as a summary statistic.
Coverage was calculated as the fraction of projections lying within the 95% (equal-tailed) credibility band. Bias was set to 0 if the observed incidence count was equal to the predicted count; otherwise, the ratio (observed − predicted)/observed was computed. Posterior standard deviations were used as a measure of precision.
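A minimal sketch of the three metrics is given below. It reads coverage as the share of observed counts that fall inside the 95% interval of the projection, which is an interpretation rather than a statement from the package; the function name evaluate_projection() and all numbers are hypothetical.

```r
# obs: observed counts; pred: point projections;
# lower/upper: bounds of the 95% equal-tailed interval;
# post_sd: posterior standard deviations of the projections.
evaluate_projection <- function(obs, pred, lower, upper, post_sd) {
  inside <- obs >= lower & obs <= upper
  # Relative bias; note that obs == 0 with pred != 0 yields -Inf.
  bias <- ifelse(obs == pred, 0, (obs - pred) / obs)
  c(coverage  = mean(inside),
    bias      = mean(bias),
    precision = mean(post_sd))
}

# Hypothetical values for four age groups of one sex and entity.
obs     <- c(10, 20, 40, 80)
pred    <- c(12, 20, 50, 60)
lower   <- c( 8, 15, 42, 50)
upper   <- c(16, 25, 58, 70)
post_sd <- c( 2, 2.5, 4, 5)

evaluate_projection(obs, pred, lower, upper, post_sd)
```

With this sign convention, a negative bias indicates that the projection exceeds the observed count.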
Model performance
Evaluation of the predictive performance of models with increasing complexity was performed as follows (see also Fig. 1): the most current observed incidence data were predicted, with the projection period starting n years prior to this timepoint (n ∈ {2, 5, 10, 15, 20}). The observation period for model training preceded this timepoint. In the presented analysis, 15 yrs. were chosen as observation period. For the evaluation of a 2-yr projection, e.g. in the SEER9 dataset, data of the year 2014 would be predicted, using data from the 15 yrs. prior to 2012 for model fitting.
Data were available in different aggregation types: as age groups for NORDCAN and Saarland data and for each single year of age for the SEER9 data. In the latter case, individual age-years were used, i.e. no further aggregation was applied.
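The split into observation and projection period can be written down as a small helper. The sketch assumes that the training window ends in the year (last year − n) inclusive; the text could also be read as ending one year earlier, so the exact boundary is an assumption, as is the function name projection_window().

```r
# Years used for model fitting and for projection, given the most
# recent observed year, the projection horizon n and the length of
# the observation period.
projection_window <- function(last_year, horizon, obs_years = 15) {
  train_end <- last_year - horizon  # assumed inclusive end of training
  list(train   = seq(train_end - obs_years + 1, train_end),
       project = seq(train_end + 1, last_year),
       target  = last_year)
}

# SEER9-like example: most recent year 2014, all evaluated horizons.
windows <- lapply(c(2, 5, 10, 15, 20), projection_window, last_year = 2014)
names(windows) <- paste0("n", c(2, 5, 10, 15, 20))

windows$n2$train    # 15 training years ending two years before 2014
windows$n2$project  # years bridged by the projection
```

Every horizon uses a training window of the same length, so differences between horizons reflect the projection distance rather than the amount of training data.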
R package incAnalysis
To facilitate further application and reproducibility, the R package 'incAnalysis' was developed. It is publicly available on http://github.com/mknoll/incAnalysis. The package mainly builds on methods in the R packages BAPC [19], mgcv [22] and R-INLA [ref: http://www.r-inla.org/]. Representative analyses with stepwise explanations on how to use the package are outlined in more detail in the accompanying vignette: vignette('incidence') in R. An overview of the functionality and structure of the package is given in Fig. 2.
A wide variety of approaches to project future cancer incidence can be comparatively assessed using this package. Constant rates or counts simply projected into the future, as well as GLMs and GAMs (both in the INLA and ML/REML framework, selected via the method parameter) and BAPC models might be specified.
The package provides a class called incClass which is instantiated with population and incidence data (data.frame with years in rows, the earliest available year in the first row and age/age group as columns with increasing values from left to right) as well as the period used for model training and the fitting period of interest (and additional parameters). Different models are then added to the newly created object with the following functions, which usually expect additional parameters, e.g. model formulas and the respective class object: runFwProj() for forward projection of constant rates or constant counts, runGLM() for generalized linear models (using INLA or an ML approach, specified by the method parameter), runGAM() for GAMs, runInla() for any INLA model and runBAPC() to run the BAPC model [19]. evaluate() calculates the performance metrics, which can be extracted as data.frame via metrics(); additionally, projections are plotted. pitHist() plots Probability Integral Transform (PIT) histograms for all INLA fitted models.
Results
Coverage
Coverages for the evaluated models are shown in Fig. 3 for an observation period of 15 yrs. and projection periods of 2, 5, 10, 15 and 20 yrs.
Importantly, most models yielded coverages below 95%, with smallest (< 25%) coverages for intercept-only models and highest coverages (> 75%) for BAPC models, irrespective of the projection period. Variability of coverages of BAPC projections is smaller in the SEER9 dataset as compared to NORDCAN and Saarland data.
Coverage increased for AP models with linear age, period and interaction effect for longer projection intervals in all datasets. Models incorporating a univariate smoother for age showed no clear increase in median coverage for longer periods; variability, however, increased.
Multivariate smoother models showed a decrease of median coverages for longer projection intervals in the SEER9 data, an increase in the Saarland data and high variability with no clear trend in the NORDCAN data.
Bias
Results of bias analyses are shown in Fig. 4. Negative values correspond to higher predicted than observed incidence counts (overestimation). For visualization purposes, values < −200 were set to −200.
Several models show negative values. Absolute bias increases with longer projection intervals for most models in the SEER9 and Saarland datasets. Intercept-only models mostly show median bias values below −100, except for 15- and 20-yr projections in the Saarland data. Univariate smoother models show in most cases lower absolute bias than GLMs with linear age, period and interaction effects. Median absolute bias is smallest for the multivariate smoother models in SEER9 data for longer projection intervals. Differences in median absolute bias between all except intercept-only models are highest in the SEER9 dataset.
Precision
Precision is depicted in Fig. 5; median model values range mostly between 0.5 and 5 for the SEER9 data, 2 and 6 for the NORDCAN data and 0 and 4 in the Saarland dataset. Longer projection intervals yield lower precision for all but the intercept-only model. Univariate smoother models show higher precision compared with most of the other evaluated models. Variability in precision increases for longer projection intervals for the BAPC models and, for the SEER9 data, for univariate smoother GAMs. For the other models, no clear trend can be observed.
Discussion
Population-based cancer registry data are routinely used to monitor cancer incidence at the population level, to evaluate screening and prevention programs, and to identify areas where intensified medical research is needed [4]. However, no consensus appears to exist on which models to use for projections based on short-term observational rate data in cancer epidemiology. Systematic empirical evaluations of potentially applicable approaches using existing cancer registry data for benchmarking appear sensible to obtain a better understanding of their operating characteristics and to ultimately make informed methodological recommendations. To facilitate this idea, we introduced an R package (incAnalysis) for an integrated evaluation of the adequacy of different statistical approaches in this context. We note that the package could in principle also be used for projections of other types of rates than incidence rates. In an extensive and systematic evaluation we demonstrated its use. While the presented results may already be informative for methodological guidance, we believe that further detailed and targeted applications would be helpful for the derivation of methodological guidance by expert panels. Consensus on desirable (or acceptable) operating characteristics would be a sensible prerequisite for the appraisal of individual statistical modeling options.
In the reported empirical analysis, only ages (age groups) between 20 and 84 were analyzed, as childhood tumors constitute a biologically distinct group, are in general rare and require reliable projections of birth rates. Including them might impair the ability of models to obtain reliable projections; nevertheless, it has been reported [34] that excluding younger age groups might decrease accuracy. Cancers in the age group ≥ 85 were excluded to assure comparability between cancer registries (fixed age group width required for BAPC [gridFactor]).
Model performance was assessed by evaluating coverage, bias and precision of projections. Alternative metrics for model evaluation are e.g. the Continuous Ranked Probability Score (CRPS), as used e.g. in [19], or the evaluation of PIT histograms. The latter can be easily obtained from INLA fitted objects, and further metrics such as the CRPS can be easily calculated using the data provided by the incAnalysis package.
As the least complex model, intercept-only models were evaluated. As expected, only small coverages (< 25%) were observed, as cancer occurrence is usually highly dependent on age. An intercept-only model does not take age into account (change in the distribution of age over calendar time), and thus, these models can hardly be recommended for cancer incidence projection, especially over a longer period.
GLMs with linear age, period and their interaction effect were evaluated as the next, more complex model type. Performance, however, was generally poor. To achieve a potentially even better fit, a model with a univariate smoother for age was analyzed, as the latter is a biologically highly relevant covariate for cancer incidence. B-splines, created with splines::bs(), were incorporated into the model formula. An alternative would be the specification of a Gaussian Markov Random Field structure for smoothing, e.g. a second-order random walk.
Next, multivariate smoothers (tensor product smoothers) for age and period were included in the model, using a z model in INLA. For classical ML/REML models, such effects can easily be included in the models by using the mgcv::te() function. The latter cannot be directly fit with INLA::inla(). Even though the mgcv::ginla() function was made available recently (which allows posterior distributions of effects to be obtained directly from GAMs fitted with mgcv), the INLA package is not directly utilized by mgcv, and thus projections are not as straightforward as with the z model. Coverage is higher as compared to univariate smoother models, but is less stable for long-term projections as compared to BAPC models.
Finally, the BAPC model was evaluated and turned out to be among the best performing for all evaluated parameter combinations. The additional two effects (cohort and overdispersion adjustment effect) seem to be especially important for short-term projections, as differences to most other models except multivariate smoother models decrease for longer intervals.
Conclusions
The incAnalysis R package allows a straightforward comparison of key operating characteristics of statistical approaches to cancer incidence projection. Our empirical analyses of a selection of potentially applicable approaches suggest that (i) projections of rate data using short-term data yield robust high coverage at the cost of low precision for BAPC, (ii) age-period GLMs with interaction term mostly yield better results for longer projection intervals (> 10 yrs), (iii) GAMs using tensor product smooth models (age, period) constitute a reasonable alternative to classical GLMs, and (iv) intercept-only models may at best be useful only for short-term projections (< 5 yrs). Further detailed and targeted investigations into model performance seem advisable to make recommendations about appropriate statistical projection methods in a given setting.
Availability of data and materials
The datasets generated and/or analysed during the current study are included in the incAnalysis github package, https://github.com/mknoll/incAnalysis.
Abbreviations
APC: Age-Period-Cohort
BAPC: Bayesian APC models
CRPS: Continuous Ranked Probability Score
GAM: Generalized Additive Model
GLM: Generalized Linear Model
INLA: Integrated Nested Laplace Approximations
MCMC: Markov chain Monte Carlo
ML: Maximum Likelihood
PIT: Probability Integral Transform
REML: Restricted Maximum Likelihood
References
 1.
Brown LD, Cai TT, DasGupta A, Agresti A, Coull BA, Casella G, Corcoran C, Mehta C, Ghosh M, Santner TJ, et al. Interval estimation for a binomial proportion - comment - rejoinder. Stat Sci. 2001;16(2):101–33.
 2.
Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019;69(1):7–34.
 3.
Moller B, Fekjaer H, Hakulinen T, Sigvaldason H, Storm HH, Talback M, Haldorsen T. Prediction of cancer incidence in the Nordic countries: empirical comparison of different approaches. Stat Med. 2003;22(17):2751–66.
 4.
Bray F, Moller B. Predicting the future burden of cancer. Nat Rev Cancer. 2006;6(1):63–74.
 5.
Moller H, Fairley L, Coupland V, Okello C, Green M, Forman D, Moller B, Bray F. The future burden of cancer in England: incidence and numbers of new patients in 2020. Br J Cancer. 2007;96(9):1484–8.
 6.
Nowatzki J, Moller B, Demers A. Projection of future cancer incidence and new cancer cases in Manitoba, 2006-2025. Chronic Dis Can. 2011;31(2):71–8.
 7.
Dyba T, Hakulinen T, Paivarinta L. A simple nonlinear model in incidence prediction. Stat Med. 1997;16(20):2297–309.
 8.
Hakulinen T, Dyba T. Precision of incidence predictions based on Poisson distributed observations. Stat Med. 1994;13(15):1513–23.
 9.
Stock C, Mons U, Brenner H. Projection of cancer incidence rates and case numbers until 2030: A probabilistic approach applied to German cancer registry data (1999-2013). Cancer Epidemiol. 2018;57:110–9.
 10.
Clements MS, Armstrong BK, Moolgavkar SH. Lung cancer rate predictions using generalized additive models. Biostatistics. 2005;6(4):576–89.
 11.
Engeland A, Haldorsen T, Tretli S, Hakulinen T, Horte LG, Luostarinen T, Schou G, Sigvaldason H, Storm HH, Tulinius H, et al. Prediction of cancer mortality in the Nordic countries up to the years 2000 and 2010, on the basis of relative survival analysis. A collaborative study of the five Nordic Cancer registries. APMIS Suppl. 1995;49:1–161.
 12.
Smith TR, Wakefield J. A review and comparison of age-period-cohort models for cancer incidence. Stat Sci. 2016;31(4):591–610.
 13.
Kupper LL, Janis JM, Salama IA, Yoshizawa CN, Greenberg BG. Age-period-cohort analysis - an illustration of the problems in assessing interaction in one observation per cell data. Commun Stat-Theor M. 1983;12(23):2779–807.
 14.
O’Brien RM. Constrained estimators and age-period-cohort models. Sociol Methods Res. 2011;40(3):419–52.
 15.
Mistry M, Parkin DM, Ahmad AS, Sasieni P. Cancer incidence in the United Kingdom: projections to the year 2030. Br J Cancer. 2011;105(11):1795–803.
 16.
Moller B, Fekjaer H, Hakulinen T, Tryggvadottir L, Storm HH, Talback M, Haldorsen T. Prediction of cancer incidence in the Nordic countries up to the year 2020. Eur J Cancer Prev. 2002;11(Suppl 1):S1–96.
 17.
Whiteman DC, Green AC, Olsen CM. The growing burden of invasive melanoma: projections of incidence rates and numbers of new cases in six susceptible populations through 2031. J Invest Dermatol. 2016;136(6):1161–71.
 18.
Havulinna AS. Bayesian age-period-cohort models with versatile interactions and long-term predictions: mortality and population in Finland 1878-2050. Stat Med. 2014;33(5):845–56.
 19.
Riebler A, Held L. Projecting the future burden of cancer: Bayesian age-period-cohort analysis with integrated nested Laplace approximations. Biom J. 2017;59(3):531–49.
 20.
Rue H, Martino S, Chopin N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc Series B Stat Methodol. 2009;71(2):319–92.
 21.
Rue H, Riebler A, Sorbye SH, Illian JB, Simpson DP, Lindgren FK. Bayesian computing with INLA: a review. Annu Rev Stat Appl. 2017;4:395–421.
 22.
Wood SN. Generalized additive models: an introduction with R. 2nd ed. Boca Raton: Chapman and Hall/CRC; 2017.
 23.
Boulesteix AL, Binder H, Abrahamowicz M, Sauerbrei W. Simulation panel of the SI: on the necessity and design of studies comparing statistical methods. Biom J. 2018;60(1):216–8.
 24.
Crüwell S, Stefan AM, Evans NJ. Robust standards in cognitive science. Computational Brain & Behavior. 2019;2(3):255–65.
 25.
Mangul S, Martin LS, Hill BL, Lam AK, Distler MG, Zelikovsky A, Eskin E, Flint J. Systematic benchmarking of omics computational tools. Nat Commun. 2019;10(1):1393.
 26.
Research Data (1973-2014), National Cancer Institute, DCCPS, Surveillance Research Program, based on the November 2016 submission. [https://seer.cancer.gov].
 27.
Engholm G, Ferlay J, Christensen N, Bray F, Gjerstorff M, Klint A, Kotlum J, Olafsdottir E, Pukkala E, Storm H. NORDCAN - a Nordic tool for cancer information, planning, quality control and research. Acta Oncol. 2010;49(5):725–36.
 28.
Krebsregister Saarland [http://www.krebsregister.saarland.de/].
 29.
Tabellen und Grafiken aus dem Bereich "Gebiet und Bevölkerung" [https://www.saarland.de/6772.htm].
 30.
Bevölkerung im Jahresdurchschnitt 1980–2012 (Grundlage Zensus BRD 1987, DDR 1990) [http://www.gbe-bund.de/gbe10/trecherche.prc_them_rech?tk=700&tk2=906&p_uid=gast&p_aid=66019368&p_sprache=D&cnt_ut=1&ut=906].
 31.
Wood SN. Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics. 2006;62(4):1025–36.
 32.
Ramsay JO. Monotone regression splines in action. Stat Sci. 1988;3(4):425–41.
 33.
Bauer C, Wakefield J, Rue H, Self S, Feng ZJ, Wang Y. Bayesian penalized spline models for the analysis of spatiotemporal count data. Stat Med. 2016;35(11):1848–65.
 34.
Baker A, Bray I. Bayesian projections: what are the effects of excluding data from younger age groups? Am J Epidemiol. 2005;162(8):798–805.
Acknowledgements
MK and JF are members of the MD/PhD program at Heidelberg University and are funded by Heidelberg Medical Faculty.
The authors would like to thank two anonymous reviewers and the handling editor for their helpful comments and suggestions on the initial submission. In the authors' view, this helped to significantly improve the quality and the clarity of the manuscript.
Funding
National Center for Tumor Diseases Heidelberg (NCT PRO2015.21), German Research Foundation (DFG UNITE SFB1389s), German Cancer Research Center (iMed). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
CS designed the study, MK designed and created the R package. MK and CD wrote the manuscript with input from JF, AA, JD and AK. All authors provided critical feedback and helped shape the research, analysis and manuscript. The author(s) read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Access to NORDCAN and SEER data was provided upon request at the respective offices; Saarland data were publicly available, and no additional approval was required.
Consent for publication
Not applicable.
Competing interests
CS is now a full-time employee of Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim, Germany. The company had no role in design, analysis or interpretation of the presented work.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Knoll, M., Furkel, J., Debus, J. et al. An R package for an integrated evaluation of statistical approaches to cancer incidence projection. BMC Med Res Methodol 20, 257 (2020). https://doi.org/10.1186/s12874-020-01133-5
Keywords
 Cancer epidemiology, age-period-cohort model
 Bayesian model
 Cancer incidence projection
 INLA