Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models
© Andersson et al; licensee BioMed Central Ltd. 2011
Received: 3 February 2011
Accepted: 22 June 2011
Published: 22 June 2011
When the mortality among a cancer patient group returns to the same level as in the general population, that is, the patients no longer experience excess mortality, the patients still alive are considered "statistically cured". Cure models can be used to estimate the cure proportion as well as the survival function of the "uncured". One limitation of parametric cure models is that the functional form of the survival of the "uncured" has to be specified. It can sometimes be hard to find a survival function flexible enough to fit the observed data, for example, when there is high excess hazard within a few months from diagnosis, which is common among older age groups. This has led to the exclusion of older age groups in population-based cancer studies using cure models.
Here we have extended the flexible parametric survival model to incorporate cure as a special case to estimate the cure proportion and the survival of the "uncured". Flexible parametric survival models use splines to model the underlying hazard function, and therefore no parametric distribution has to be specified.
We have compared the fit from standard cure models to our flexible cure model, using data on colon cancer patients in Finland. This new method gives similar results to a standard cure model, when it is reliable, and better fit when the standard cure model gives biased estimates.
Cure models within the framework of flexible parametric models enable cure modelling when standard models give biased estimates. These flexible cure models enable inclusion of older age groups and can give stage-specific estimates, which is not always possible from parametric cure models.
Patient survival, the time from diagnosis to death, is the most important single measure of cancer patient care (the diagnosis and treatment of cancer). Cancer patient survival is often measured using 5-year relative survival, the proportion of patients that would still be alive 5 years after diagnosis if the cancer (either directly or indirectly) were the only possible cause of death. As cancer patient survival has improved for many cancer types, and many patients are cured of their disease, another important question is what proportion of patients are cured of their cancer.
For most cancers the relative survival will reach a plateau some years after diagnosis, indicating that the mortality among the patients still alive is the same as expected in the general population. This point is called the cure point and the patients still alive are considered "statistically cured". De Angelis et al., Verdecchia et al., Yu et al. and Lambert et al. have proposed cure models for population-based cancer studies that can be used to estimate the proportion of cancer patients that are "statistically cured". The models also give an estimate of the survival of those "uncured". These measures are of interest to patients, clinicians and policy makers, and can give valuable insights into temporal trends in cancer patient survival. One limitation of parametric cure models is that the functional form of the survival of the "uncured" has to be specified. It can sometimes be difficult to fit survival functions flexible enough to capture high excess hazard within a few months from diagnosis, which is common among older age groups. This has led to the exclusion of older age groups in population-based cancer studies using cure models. In our experience the current models can also give biased estimates, or fail to converge, when the cure proportion is high (e.g. 80% and above). Yu et al. have proposed the generalized gamma distribution, which makes fewer distributional assumptions, but computational difficulties may arise. Lambert et al. have proposed a finite mixture of Weibull distributions to add flexibility, but this adds to the complexity of deciding which model parameters are allowed to vary by covariates, since there are 4 Weibull parameters to be modelled. Non-parametric or semi-parametric cure models have been suggested (e.g. [8–11]), but they do not use relative survival.
This paper shows how these problems could potentially be avoided by using flexible parametric survival models to estimate the cure proportion and the survival of the "uncured" in a population-based setting. Flexible parametric survival models were first introduced by Royston and Parmar [12, 13], and extended to relative survival by Nelson et al. and Lambert and Royston. The models are fitted on the log cumulative excess hazard scale using restricted cubic splines for the baseline. By the use of splines these models can more easily capture the shape of the underlying distribution. We illustrate the method using data on patients diagnosed with colon cancer in Finland during 1953-2003, which have previously been used to study temporal trends in the cure proportion. We use and further develop the flexible parametric survival model. Our results are compared to the previously published results by Lambert et al. Here we also include patients aged 80 and above at diagnosis, who were excluded in the paper by Lambert et al., and we analyse a subset of the cohort with localised cancer to evaluate how the method performs when the survival is high.
Both S*(t) and h*(t) are assumed known and are usually obtained from routine data sources (e.g. national or regional life tables).
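For orientation, the two known quantities above enter the conventional relative survival decomposition, sketched below. The notation beyond S*(t) and h*(t) is an assumption made here for illustration: S(t) and h(t) denote the all-cause survival and hazard of the patients, R(t) the relative survival and λ(t) the excess hazard.

```latex
S(t) = S^{*}(t)\,R(t), \qquad h(t) = h^{*}(t) + \lambda(t)
```

Under this decomposition only R(t), or equivalently λ(t), has to be modelled, because the expected survival and expected hazard are taken from life tables.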
Parametric cure models
The mixture cure model assumes that a proportion, π, of the patients will be cured (do not experience excess mortality), while the remainder, 1 - π, are "uncured". S_u(t) is the cancer-specific survival function for the "uncured", and is estimated by the model along with the cure proportion. A parametric distribution for S_u(t) has to be chosen, and a Weibull distribution is often used [2, 3, 5, 6].
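The structure described above can be illustrated with a minimal Python sketch. It assumes the usual mixture form, in which relative survival is π + (1 - π) S_u(t) and all-cause survival is the expected survival multiplied by that quantity, together with a Weibull survival function exp(-λ t^γ) for the "uncured". The function names and the parameter names `lam` and `gam` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def weibull_survival(t, lam, gam):
    """Weibull survival of the 'uncured': S_u(t) = exp(-lam * t**gam)."""
    t = np.asarray(t, dtype=float)
    return np.exp(-lam * t ** gam)

def mixture_relative_survival(t, pi, lam, gam):
    """Relative survival under a mixture cure model: pi + (1 - pi) * S_u(t)."""
    return pi + (1.0 - pi) * weibull_survival(t, lam, gam)

def mixture_all_cause_survival(t, pi, lam, gam, expected_survival):
    """All-cause survival: expected survival S*(t) times relative survival."""
    expected_survival = np.asarray(expected_survival, dtype=float)
    return expected_survival * mixture_relative_survival(t, pi, lam, gam)
```

As t grows, S_u(t) tends to zero and the relative survival flattens out at the cure proportion π, which is the plateau that cure models are designed to capture.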
which enables estimation of both the cure proportion and the survival of the "uncured". When modelling, both the cure proportion and the parameters in S_u(t) or F_Z(t) can be allowed to vary by covariates.
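A minimal Python sketch of the non-mixture formulation follows. It assumes the usual form in which relative survival is π raised to the power F_Z(t), with a Weibull distribution function F_Z(t) = 1 - exp(-λ t^γ), so that the survival of the "uncured" is (π^F_Z(t) - π) / (1 - π). The function and parameter names are hypothetical. For this Weibull special case the median survival of the "uncured" has a closed form, which is used here in place of an iterative root finder.

```python
import numpy as np

def weibull_cdf(t, lam, gam):
    """Weibull distribution function F_Z(t) = 1 - exp(-lam * t**gam)."""
    t = np.asarray(t, dtype=float)
    return 1.0 - np.exp(-lam * t ** gam)

def nonmixture_relative_survival(t, pi, lam, gam):
    """Relative survival under a non-mixture cure model: pi ** F_Z(t)."""
    return pi ** weibull_cdf(t, lam, gam)

def uncured_survival(t, pi, lam, gam):
    """Survival of the 'uncured': (pi ** F_Z(t) - pi) / (1 - pi)."""
    r = nonmixture_relative_survival(t, pi, lam, gam)
    return (r - pi) / (1.0 - pi)

def uncured_median(pi, lam, gam):
    """Time at which the survival of the 'uncured' equals 0.5 (Weibull case)."""
    # Solve pi ** F = pi + 0.5 * (1 - pi) for F, then invert the Weibull cdf.
    f_target = np.log(pi + 0.5 * (1.0 - pi)) / np.log(pi)
    return (-np.log(1.0 - f_target) / lam) ** (1.0 / gam)
```

Because F_Z(t) goes from 0 to 1, the relative survival goes from 1 down to π, so the model has the same asymptote as the mixture form while placing the covariates on a different scale.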
Flexible parametric survival model
where D is the number of time-dependent covariate effects and s(x; γ_i) is the spline function for the i-th time-dependent effect.
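The spline function s(x; γ) can be made concrete with a short Python sketch of a restricted cubic spline basis. It assumes the standard Royston-Parmar construction, with x the log of time, knots given on the log-time scale, and the function forced to be linear beyond the boundary knots. The function names are hypothetical, and no orthogonalisation is applied.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline variables v_1 = x, v_2, ..., v_{K-1} for K knots."""
    x = np.asarray(x, dtype=float)
    k = np.sort(np.asarray(knots, dtype=float))
    k_min, k_max = k[0], k[-1]
    cols = [x]  # first spline variable is the linear term
    for kj in k[1:-1]:  # one derived variable per interior knot
        lam = (k_max - kj) / (k_max - k_min)
        cols.append(np.maximum(x - kj, 0.0) ** 3
                    - lam * np.maximum(x - k_min, 0.0) ** 3
                    - (1.0 - lam) * np.maximum(x - k_max, 0.0) ** 3)
    return np.column_stack(cols)

def log_cumulative_hazard(t, knots, gamma0, gammas):
    """Baseline log cumulative (excess) hazard: gamma0 + sum_j gammas[j] * v_j(ln t)."""
    basis = rcs_basis(np.log(np.asarray(t, dtype=float)), knots)
    return gamma0 + basis @ np.asarray(gammas, dtype=float)
```

The weights `lam` are what cancel the cubic and quadratic terms beyond the last knot, leaving a function that is linear in log time in the tail.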
Flexible parametric cure models
we see that the constant parameters, γ_00 and β, are used to model the cure proportion and the time-dependent parameters are used to model the distribution function F_Z(t). The constraint of a zero effect for the linear spline term has to be incorporated for each spline function, s(x; γ_i), that we model. All spline variables take the value 0 from the point of the last knot, which means that in equation (15), the constant parameter, γ_00, is the log cumulative excess hazard at and beyond the last knot for the reference group, and can therefore be used to predict cure. It is usually preferred to orthogonalise the spline variables using Gram-Schmidt orthogonalisation. This results in them not being zero from the point of the last knot, and cure can then not be predicted by a direct transformation of the constant parameters. Therefore, we have chosen to center the orthogonalised spline variables around the value they take at the last knot, which enables direct predictions of cure from the constant parameters. All parameters are estimated using maximum likelihood estimation on individual level data. The survival of the "uncured" can be predicted in the same way as for the non-mixture cure model, and the median survival time of the "uncured" is predicted using a Newton-Raphson algorithm in a similar way as Lambert et al.
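The following Python sketch illustrates the idea for the reference group, under stated assumptions: the spline variables are computed backwards so that each is zero at and beyond the last knot, the linear term is dropped (its coefficient is constrained to zero), and no orthogonalisation or centering is applied. The exact basis formula is a reconstruction from those two properties rather than a quotation, and all function names are hypothetical.

```python
import numpy as np

def backward_rcs_basis(x, knots):
    """Spline variables that are 0 at/after the last knot and linear before the first.

    Requires at least 3 knots (on the log-time scale). The linear term is omitted.
    """
    x = np.asarray(x, dtype=float)
    k = np.sort(np.asarray(knots, dtype=float))
    k_min, k_max = k[0], k[-1]
    cols = []
    for kj in k[-2:0:-1]:  # interior knots, taken from the last to the first
        lam = (kj - k_min) / (k_max - k_min)
        cols.append(np.maximum(kj - x, 0.0) ** 3
                    - lam * np.maximum(k_max - x, 0.0) ** 3
                    - (1.0 - lam) * np.maximum(k_min - x, 0.0) ** 3)
    return np.column_stack(cols)

def relative_survival(t, knots, gamma00, gammas):
    """R(t) = exp(-exp(gamma00 + sum_j gammas[j] * v_j(ln t))) for the reference group."""
    basis = backward_rcs_basis(np.log(np.asarray(t, dtype=float)), knots)
    return np.exp(-np.exp(gamma00 + basis @ np.asarray(gammas, dtype=float)))

def cure_proportion(gamma00):
    """Cure proportion implied by a constant log cumulative excess hazard gamma00."""
    return np.exp(-np.exp(gamma00))
```

Because every spline variable vanishes from the last knot onwards, the predicted relative survival is flat there and equals exp(-exp(γ_00)), which is why the constant parameter can be transformed directly into a cure proportion.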
We have adapted the Stata package for flexible parametric survival models to incorporate backward calculation of the splines and the constraint to force a constant cumulative excess hazard after the last knot. There are also postestimation commands to predict the cure proportion and the survival of the "uncured".
Evaluating the method
To evaluate the model we used data from the Finnish Cancer Registry. The Finnish Cancer Registry started in 1953, and the completeness for solid tumors is over 99%. We studied all patients diagnosed with colon adenocarcinoma in Finland 1953-2003, with follow-up until 2004. Patients that emigrated were censored at the date of emigration, and everyone still alive was censored 10 years after diagnosis. Patients that were incidentally diagnosed at autopsy or were registered solely on death certificate information were excluded. The cohort consists of 34,664 patients. The same cohort, restricted to patients aged less than 80 years at diagnosis, is described elsewhere. In that study, temporal trends of the cure proportion and the median survival time of the "uncured" were estimated for different age groups, and we have repeated that analysis with the flexible parametric cure model.
We graphically compared the estimated relative survival from the flexible parametric cure model with empirical life table estimates of relative survival using the Ederer II method. For comparison with the life table estimates the data were divided into 5 age groups (less than 50 years, 50-59 years, 60-69 years, 70-79 years and 80 years and above) and 5 calendar periods (1953-1964, 1965-1974, 1975-1984, 1985-1994, 1995-2003).
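As a rough illustration of what a life table estimate of relative survival involves, the Python sketch below computes cumulative relative survival from interval counts. It assumes an actuarial adjustment for censoring and takes the expected interval-specific survival as an input. Under the Ederer II method that expected value is based on the patients at risk at the start of each interval, a step assumed to have been done beforehand. The function name and the input layout are hypothetical.

```python
import numpy as np

def life_table_relative_survival(n_at_risk, deaths, censored, expected_interval_survival):
    """Cumulative relative survival from interval-level life table counts."""
    n = np.asarray(n_at_risk, dtype=float)
    d = np.asarray(deaths, dtype=float)
    c = np.asarray(censored, dtype=float)
    p_star = np.asarray(expected_interval_survival, dtype=float)
    effective_n = n - c / 2.0        # actuarial adjustment for censored patients
    p_obs = 1.0 - d / effective_n    # observed interval-specific survival
    r_interval = p_obs / p_star      # interval-specific relative survival
    return np.cumprod(r_interval)    # cumulative relative survival
```

Plotting these empirical estimates against the model-based relative survival curve, for each age group and calendar period, gives the graphical comparison described above.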
Results from the flexible parametric cure model were also compared to results from a non-mixture cure model. Lambert et al. used a mixture cure model with a Weibull distribution to study temporal trends by age group in survival of colon cancer patients in Finland. In that study, calendar year was modeled continuously using splines and age was categorized in four categories (less than 50 years, 50-59 years, 60-69 years and 70-79 years). The two main effects of year and age as well as an interaction between age and the linear spline variable for year were included for all three model parameters (the cure proportion and the two Weibull parameters). In this paper we repeated the analysis using a non-mixture cure model because it is more comparable with the flexible parametric cure model, but the estimates from the mixture and non-mixture cure models are very similar. We also included the oldest age group (80 years and above) that was excluded by Lambert et al.
Evaluating the sensitivity to knot placement
The flexible parametric survival model has been shown to be robust to the number and location of the knots [14, 15]. To evaluate the sensitivity to the location of the knots for the flexible parametric cure model we compared predicted survival from the new model using different knot positions with life table estimates of relative survival. This was done separately for all combinations of age group and calendar period described previously. We used 6 knots and first distributed them according to the default settings recommended by Lambert and Royston, evenly distributed at centiles of the log of the observed death times (centiles 0, 20, 40, 60, 80 and 100). Since most of the deaths occur early in follow-up, the default positions put many knots at the beginning of follow-up, so we also assessed putting more knots towards the end of follow-up, first by distributing 5 knots evenly according to centiles with one extra at the 95th centile (knots at centiles 0, 25, 50, 75, 95 and 100), and then by placing more knots towards the end (at centiles 0, 35, 65, 80, 95 and 100). We also investigated the possibility of putting the last knot earlier than the last observed death time, at the 95th centile (knots at centiles 0, 35, 65, 75, 85 and 95). To be sure that the knots are placed more evenly according to actual follow-up time we also put the knots at follow-up years instead of centiles of log death times (at the first and last death time and follow-up years 3, 5, 7 and 8). Finally we put the last knot after the last observed follow-up time (the knots were located at centiles 0, 25, 50, 75 and 95 of log death times and the last knot at 12 years from diagnosis).
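The centile-based knot placement described above can be sketched in Python as follows. This assumes NumPy's default linear-interpolation percentile, which may differ slightly from the centile definition used by the Stata command. The function name and the `times` and `event` inputs are hypothetical.

```python
import numpy as np

def knots_at_centiles(times, event, centiles=(0, 20, 40, 60, 80, 100)):
    """Knot positions (log-time scale) at centiles of the log observed death times."""
    times = np.asarray(times, dtype=float)
    event = np.asarray(event, dtype=bool)
    log_death_times = np.log(times[event])  # censored times do not contribute
    return np.percentile(log_death_times, centiles)

def knots_with_fixed_last(times, event, centiles=(0, 25, 50, 75, 95), last_knot_years=12.0):
    """Knots at the given centiles plus a final knot at a fixed follow-up time."""
    inner = knots_at_centiles(times, event, centiles)
    return np.append(inner, np.log(last_knot_years))
```

The second function mirrors the last scenario in the text, where five knots sit at centiles of the log death times and the final knot is placed at 12 years, beyond the 10 years of follow-up.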
Comparison to life table and standard non-mixture model
All flexible parametric cure models described above were also compared to standard flexible parametric survival models, without the restriction of constant cumulative excess hazard after the last knot, using Akaike's information criterion (AIC) and the Bayesian information criterion (BIC). The difference between the models is then the restriction on the linear spline term to be zero. For most models the restricted model gives a better fit, indicating that the final term in the unrestricted model is probably close to zero. In Figure 2 the AIC and BIC for all age groups in the calendar period 1985-1994 are shown. There is no formal goodness of fit test for cure models, since they rely on a good fit at the end of follow-up, and most of the data is at the beginning of follow-up. Since the flexible parametric cure model is a flexible parametric survival model with a restriction on one of the parameters, it can, as described here, be compared to a standard flexible parametric survival model to test the assumption of cure. However, we believe that this should not be used as a formal test; the assumption of cure and the fit of the model should be assessed graphically. To repeat and compare to the results from Lambert et al. we also fitted a flexible parametric cure model and a non-mixture cure model. Calendar year was modeled continuously using splines and age was categorized in five categories as described previously. The two main effects of year and age as well as an interaction between age and the linear spline variable for year were included. All variables were included both as constant and time-varying effects. Knots for the baseline log cumulative excess hazard were placed at centiles 0, 25, 50, 75 and 95, with the last knot at 12 years from diagnosis, and for the time-varying effects knots were placed at centiles 0, 25, 50, 75 and 100.
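The AIC and BIC comparison between the restricted (cure) and unrestricted models can be sketched in Python using the standard definitions. The log-likelihood values in the usage note are invented purely for illustration, and the choice of sample size in the BIC (observations rather than events) is an assumption.

```python
import math

def aic(log_lik, n_params):
    """Akaike's information criterion: -2 ln L + 2k."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + k ln(n)."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

def prefers_restricted(log_lik_restricted, log_lik_unrestricted, n_params_unrestricted):
    """True if the cure model (one parameter fewer) has the lower AIC."""
    restricted = aic(log_lik_restricted, n_params_unrestricted - 1)
    unrestricted = aic(log_lik_unrestricted, n_params_unrestricted)
    return restricted < unrestricted
```

For example, with hypothetical log-likelihoods of -1000.2 for the cure model and -1000.0 for the unrestricted model with 6 parameters, `prefers_restricted(-1000.2, -1000.0, 6)` is true: the small loss in log-likelihood is outweighed by the saved parameter, which is the pattern expected when the linear spline term is close to zero.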
[Table: Estimates of (a) the cure proportion and (b) the median survival time of the "uncured", by age group including patients aged 80 and above, from the flexible parametric cure model and the non-mixture cure model.]
Proportional excess hazards model
[Table 3: Parameter estimates from flexible parametric survival models: (1) splines calculated backwards, (2) incorporating a cure proportion, (3) including time-dependent effects; with a likelihood ratio test (LRT) comparing each model to the previous model.]
The third model in Table 3 is a flexible parametric cure model that includes time-dependent effects for both age group and calendar periods. We only present the parameters for the constant effects in Table 3. The model parameters are harder to interpret, since they are no longer log excess hazard ratios. In both model 2 and 3 the parameters are transformations of cure. It has previously been shown for non-mixture cure models where a Weibull distribution is used that modelling of both Weibull parameters can be crucial. Similarly we believe that time-dependent effects should usually be included in the flexible parametric cure model for most cancers.
An example of a high cure proportion
The cure proportion is an important and interesting measure of cancer patient survival. Many of the cure models used for population-based cancer survival today rely on finding a parametric distribution flexible enough to capture the shape of the survival function, which in some scenarios is difficult to do. We here present a flexible parametric cure model, which is an extension of the flexible parametric survival model. This new method gives similar results to the Weibull non-mixture cure model, when it is reliable, and better fit when the Weibull non-mixture cure model gives biased estimates. This is illustrated here for the oldest age group where the Weibull non-mixture model gives biased estimates, but the flexible parametric survival model fits the data well.
Since the flexible parametric cure model uses splines to model the underlying survival, it is important that the model is not overly sensitive to the location of the knots. We have investigated the sensitivity and the model seems to be fairly robust to the number and location of the knots, but some care needs to be taken regarding the location of the last knot. The cure proportion is estimated from the cumulative excess hazard at the last knot, so it is important not to place the last knot too early, but preferably at the last observed death time or later. It is also good to make sure that the knots are distributed along the whole follow-up time, since the model needs to fit well at the end of the follow-up, even if most of the events are at the beginning.
The mixture and non-mixture cure models are sometimes used in situations when cure is not reached within the available follow-up time in the data. This can be done since the models estimate an asymptote for the relative survival function, but estimates of cure can be very sensitive to the parametric distribution chosen. We do not recommend extrapolation in this way when using the flexible parametric cure model since the point of cure has to be chosen. Even though the position of the last knot can be outside the data the cure point should be reached within the available follow-up time.
As with other cure models, the flexible parametric cure model will give an estimate of the cure proportion even when cure is not reasonable. It is therefore important to always compare results from cure models with standard methods for relative survival and to make sure that there seems to be a proportion of patients that are cured (see Figure 2). This is not a specific drawback of the flexible parametric cure model, but of cure models in general. In contrast to the mixture and non-mixture cure models, the flexible parametric cure model makes it possible to informally test the assumption of a cure proportion, since it is a restricted standard flexible parametric survival model. These tests should be interpreted with some caution, since the comparison is based on the fit over the whole time-scale and not just towards the end where the cure proportion is estimated.
We have presented the flexible parametric cure model within a relative survival setting, since that is the method of choice for population-based studies. However, the flexible parametric survival model and the flexible parametric cure model can also be used for non-relative survival data, for example when cause of death is known and reliable, or when the background mortality is very low, which is the case for childhood cancer.
To enable application of the method we have updated the Stata command for flexible parametric survival models, and added an option that will fit flexible parametric cure models.
Cure models within the framework of flexible parametric models enable cure modelling when standard models are not flexible enough. These flexible cure models enable inclusion of older age groups and can give stage-specific estimates, which is not always possible from standard methods.
This work was partially funded by the Swedish Cancer Society (Cancerfonden). We thank the referees for their comments, which have greatly improved the paper.
- Dickman PW, Adami HO: Interpreting trends in cancer patient survival. Journal of Internal Medicine. 2006, 260: 103-17. 10.1111/j.1365-2796.2006.01677.x.
- De Angelis R, Capocaccia R, Hakulinen T, Söderman B, Verdecchia A: Mixture Models for Cancer Survival Analysis: Application to Population-Based Data with Covariates. Statistics in Medicine. 1999, 18: 441-454. 10.1002/(SICI)1097-0258(19990228)18:4<441::AID-SIM23>3.0.CO;2-M.
- Verdecchia A, De Angelis R, Capocaccia R, Sant M, Micheli A, Gatta G, Berrino F: The cure for colon cancer: results from the EUROCARE study. International Journal of Cancer. 1998, 77: 322-329. 10.1002/(SICI)1097-0215(19980729)77:3<322::AID-IJC2>3.0.CO;2-Q.
- Yu B, Tiwari RC, Cronin KA, Feuer EJ: Cure fraction estimation from the mixture cure models for grouped survival data. Statistics in Medicine. 2004, 23 (11): 1733-1747. 10.1002/sim.1774.
- Lambert PC, Thompson JR, Weston CL, Dickman PW: Estimating and modeling the cure fraction in population-based cancer survival analysis. Biostatistics. 2007, 8 (3): 576-594.
- Lambert PC, Dickman PW, Osterlund P, Andersson T, Sankila R, Glimelius B: Temporal trends in the proportion cured for cancer of the colon and rectum: a population-based study using data from the Finnish Cancer Registry. International Journal of Cancer. 2007, 121 (9): 2052-2059. 10.1002/ijc.22948.
- Lambert P, Dickman P, Weston C, Thompson J: Estimating the cure fraction in population-based cancer studies using finite mixture models. Journal of the Royal Statistical Society (Series C). 2010, 59: 35-55. 10.1111/j.1467-9876.2009.00677.x.
- Tsodikov A: A Proportional Hazards Model Taking Account of Long-Term Survivors. Biometrics. 1998, 54: 1508-1516. 10.2307/2533675.
- Peng Y, Dear KBG: A Nonparametric Mixture Model for Cure Rate Estimation. Biometrics. 2000, 56: 237-243. 10.1111/j.0006-341X.2000.00237.x.
- Sy JP, Taylor JMG: Estimation in a Cox Proportional Hazards Cure Model. Biometrics. 2000, 56: 227-236. 10.1111/j.0006-341X.2000.00227.x.
- Corbiere F, Commenges D, Taylor JMG, Joly P: A penalized likelihood approach for mixture cure models. Stat Med. 2009, 28: 510-524. 10.1002/sim.3481.
- Royston P: Flexible parametric alternatives to the Cox model, and more. The Stata Journal. 2001, 1: 1-28.
- Royston P, Parmar MKB: Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine. 2002, 21 (15): 2175-2197. 10.1002/sim.1203.
- Nelson CP, Lambert PC, Squire IB, Jones DR: Flexible parametric models for relative survival, with application in coronary heart disease. Statistics in Medicine. 2007, 26 (30): 5486-5498. 10.1002/sim.3064.
- Lambert PC, Royston P: Further development of flexible parametric models for survival analysis. Stata Journal. 2009, 9 (2): 265-290. [http://ideas.repec.org/a/tsj/stataj/v9y2009i2p265-290.html]
- Begg CB, Schrag D: Attribution of deaths following cancer treatment. J Natl Cancer Inst. 2002, 94 (14): 1044-1045.
- Teppo L, Pukkala E, Lehtonen M: Data Quality and Quality Control of a Population-Based Cancer Registry. Experience in Finland. Acta Oncologica. 1994, 33: 365-369. 10.3109/02841869409098430.
- Ederer F, Heise H: Instructions to IBM 650 Programmers in Processing Survival Computations. Methodological note No. 10, End Results Evaluation Section, National Cancer Institute, Bethesda MD. 1959.
- The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/96/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.