
A new cure model that corrects for increased risk of non-cancer death: analysis of reliability and robustness, and application to real-life data

Abstract

Background

Non-cancer mortality in cancer patients may be higher than overall mortality in the general population due to a combination of factors, such as long-term adverse effects of treatments, and genetic, environmental or lifestyle-related factors. If so, conventional indicators may underestimate net survival and cure fraction. Our aim was to propose and evaluate a mixture cure survival model that takes into account the increased risk of non-cancer death for cancer patients.

Methods

We assessed the performance of a corrected mixture cure survival model derived from a conventional mixture cure model to estimate the cure fraction, the survival of uncured patients, and the increased risk of non-cancer death in two settings of net survival estimation, grouped life-table data and individual patients’ data. We measured the model’s performance in terms of bias, standard deviation of the estimates and coverage rate, using an extensive simulation study. This study included reliability assessments through violation of some of the model’s assumptions. We also applied the models to colon cancer data from the FRANCIM network.

Results

When the assumptions were satisfied, the corrected cure model provided unbiased estimates of parameters expressing the increased risk of non-cancer death, the cure fraction, and net survival in uncured patients. No major difference was found when the model was applied to individual or grouped data. The absolute bias was < 1% for all parameters, while coverage ranged from 89 to 97%. When some of the assumptions were violated, parameter estimates appeared more robust when obtained from grouped than from individual data. As expected, the uncorrected cure model performed poorly and underestimated net survival and cure fractions in the simulation study. When applied to colon cancer real-life data, cure fractions estimated using the proposed model were higher than those in the conventional model, e.g. 5% higher in males at age 60 (57% vs. 52%).

Conclusions

The present analysis supports the use of the corrected mixture cure model, with the inclusion of increased risk of non-cancer death for cancer patients to provide better estimates of indicators based on cancer survival. These are important to public health decision-making; they improve patients’ awareness and facilitate their return to normal life.

Background

There is growing awareness that cancer patients, as compared to age- and sex-matched individuals of the general population, may be at an increased risk of death from causes other than the diagnosed cancer, mainly from cardiovascular and respiratory diseases or other independent cancers [1,2,3,4]. This increased risk can be partly a consequence of cancer, such as long-term adverse effects of the treatments, and partly due to determinants of cancer, such as a genetic predisposition or environmental and lifestyle related factors. A higher long-term risk of death from other causes in cancer patients than in the general population has been estimated, from Surveillance, Epidemiology, and End Results Program (SEER) cancer registries data, for colorectal, breast and testicular cancer [5]. This increased risk potentially jeopardizes estimations of net survival after cancer for epidemiological and public health purposes.

Net survival is the hypothetical survival that would be measured if the disease under study was the only possible cause of death. It should be used to compare cancer survival in groups with different population mortality [6]. To estimate net survival, two main settings were defined: the cause-specific setting and the relative survival setting, the latter not needing cause of death information. In the relative survival setting, net survival can be estimated through the excess mortality approach, by removing from observed mortality the mortality from other causes, which corresponds to the death that would occur in the cohort in the absence of cancer. Net survival can also be estimated using the ratio approach, by dividing observed survival by the survival that would be observed in the absence of cancer.

Usually, in population-based studies, the mortality (or survival) expected in the absence of cancer is derived from overall mortality (or overall survival) in the general population with the same characteristics. This approximation relies on the commonly accepted assumption that the probability of death from other causes in population-based cohorts of cancer patients is similar to the probability of death from all causes in the general population [6,7,8,9]. This assumption might not hold if cancer patients, compared with the general population, have an increased risk of dying from other causes, whether related to the studied cancer (e.g. adverse effects of treatments) or not (e.g. an independent second cancer).

Cure models are used in cancer epidemiology to estimate relative survival under the assumption that a percentage of subjects will not die from cancer (“cured patients”) [10, 11]. This percentage appears as an asymptotic plateau reached by the relative survival curve. In this context, patients’ increased risk of dying from other causes also affects mortality rates in those cured, thus challenging the assumption that their mortality rates should be the same as those in the general population. Therefore, popular cancer burden indicators such as relative survival, cure fraction, time-to-cure and survival of uncured patients can be severely biased if cancer patients exhibit a substantially increased risk of non-cancer death that is not taken into account. Such risk may also affect survival comparisons if it differs among the compared populations. Acknowledging this increased risk would have obvious consequences for the indicators of mortality from cancer progression or relapse that are provided to patients, parents, clinicians and all health care stakeholders. For example, it would lead to better targeting of health care programs and enable long-term cancer survivors to obtain credit and insurance more easily. Because the health status of cancer survivors is probably known to the patients themselves, to their physicians and to their insurers, the presence of a comorbid condition that would increase the risk of death is already known and would in any case influence access to loans etc. [12]. The common methods to estimate excess mortality include part of the risk due to comorbidity as well as the excess risk attributed to cancer; they consequently overestimate an individual’s cancer mortality risk.

In the net survival setting, methods have been developed to account for differences between the risk of non-cancer death in cancer patients and the risk of death in the general population, or for insufficiently stratified life tables [13,14,15,16]. In the cure modelling setting, a model based on a generalization of mixture cure models [17] has been developed and applied to real-life data (colorectal, breast and lung cancer patients from United States cancer registries) [18]. The reliability of the model and the robustness of its estimates had to be studied in detail before undertaking any extensive applications on real-life data. Once tested, such models could provide practical indications for public health and for the modification of clinical follow-up for long-term survivors and cured cancer patients. This validation task cannot be done using real-life data because cure is an unobserved condition that is treated as a latent variable in cure models. No gold standard is therefore available from real-life data for comparison with model-based cure estimates. In contrast, simulated data, with the generation of large numbers of virtual cohorts with known proportions of cure and precisely defined survival functions, can be useful to test the model’s performance under controlled conditions.

This simulation-based study explored the reliability of a new “corrected” cure model, i.e. including a correction factor to take into account the increased risk of non-cancer death. First, the performance of the model and its statistical properties were explored when all its assumptions were valid. We then analysed the model’s robustness by investigating its performance when some of the underlying assumptions were violated. We also applied the corrected model to real population-based data for colon cancer from the French cancer registries.

Methods

Cure models

The proposed corrected mixture cure model can be seen as an extension of the conventional mixture cure model with different assumptions. The latter is used as a reference to assess the performance of the former.

Conventional mixture cure models

Mixture cure models [19] assume that the cohort of cancer patients is divided into two subgroups: those cured, who will never die from the diagnosed cancer, and the uncured, who will eventually die from the progression or relapse of the disease.

Relative survival (RS) can be estimated for a group of patients (supposed for now homogeneous with respect to age and other possible covariates) by the ratio approach:

$$RS(t)=\frac{S^o(t)}{S^e(t)},$$

where \(S^o(t)\) is the survival observed in the patients’ group, \(S^e(t)\) is the survival expected for the same subjects in the absence of cancer, and \(t\) is the time since diagnosis.

Relative survival can also be estimated using the excess hazard approach, assuming that the observed mortality hazard \(h_O(t)\) can be split into two forces of mortality, one attributable to cancer, \(h_c(t)\), and one to other causes, \(h_e(t)\). This can be written analytically:

$${h}_O(t)={h}_c(t)+{h}_e(t).$$

The relative survival context assumes the expected mortality hazards of patients to be equal to those observed in a general population group comparable for geographic area, calendar year, age and sex, and sometimes for other known characteristics. This implies at the individual level that:

$${h}_e(t)={h}^{\ast }(t)={h}^{\ast}\left( age+t, sex, year+t\right)$$

where age and year are the patient’s age at diagnosis and year of diagnosis.

The cumulative observed hazard can be written as:

$${H}_O(t)={\int}_0^t{h}_c(v) dv+{\int}_0^t{h}^{\ast }(v) dv,$$

and observed survival can be written as:

$${S}_O(t)=\exp \left[-{H}_O(t)\right]=\exp \left[-{\int}_0^t{h}_c(v) dv\right]\exp \left[-{\int}_0^t{h}^{\ast }(v) dv\right],$$

where \(\exp \left[-{\int}_0^t{h}_c(v) dv\right]\) corresponds to the relative survival function and \(\exp \left[-{\int}_0^t{h}^{\ast }(v) dv\right]\) corresponds to the survival function for the general population.

The conventional mixture survival model expresses relative survival as a mixture of two net survival functions attributed to uncured patients, \(S_u(\boldsymbol{X},t)\), and cured patients, \(S_{cured}(\boldsymbol{X},t)\), and can be expressed as:

$$RS(t)=\pi \left(\boldsymbol{Z}\right){S}_{cured}\left(\boldsymbol{X},t\right)+\left(1-\pi \left(\boldsymbol{Z}\right)\right){S}_u\left(\boldsymbol{X},t\right)$$

where \(S_{cured}(\boldsymbol{X},t)=1\) and \(S_u(\boldsymbol{X},t)\) can be specified by any parametric survival function. A wide range of distribution functions is available; in the mixture model we specified \(S_u(\boldsymbol{X},t)\) as a Weibull function, which is flexible enough to allow a monotonically increasing or decreasing mortality rate for the uncured group. The parametrization is \(S_u(\boldsymbol{X},t)=\exp\left(-\lambda t^{\gamma}\right)^{\exp\left(\boldsymbol{X}\boldsymbol{\delta}\right)}\), where λ > 0 and γ > 0 are respectively the scale and shape parameters, considered constant, and δ is the proportional effect of covariates X on the baseline survival of uncured patients. To ensure that π remains between 0 and 1, we also specified π(Z) with a logistic link function allowing a linear effect β of covariates Z on the cure fraction. Its analytical expression can be written as:

$$\pi \left(\boldsymbol{Z}\right)={\left[1+\exp \left(-{\beta}_0-\boldsymbol{Z}\boldsymbol{\beta } \right)\right]}^{-1}.$$

Other link functions can be used instead of the logistic; for example, we used the identity link (π = Zβ) [19] for an ancillary analysis addressed in the discussion and presented in the Supplementary material Table 1.

The final expression of relative survival and excess hazard hc in a conventional mixture cure model can be expressed as:

$$RS(t)={\left[1+\exp \left(-{\beta}_0-\boldsymbol{Z}\boldsymbol{\beta } \right)\right]}^{-1}+\left\{1-{\left[1+\exp \left(-{\beta}_0-\boldsymbol{Z}\boldsymbol{\beta } \right)\right]}^{-1}\right\}\kern0.5em \exp {\left(-{\lambda t}^{\gamma}\right)}^{\exp \left(\boldsymbol{X}\boldsymbol{\delta } \right)}$$

and

$${h}_c(t)=\frac{\left\{1-{\left[1+\exp \left(-{\beta}_0-\boldsymbol{Z}\boldsymbol{\beta } \right)\right]}^{-1}\right\}\gamma \lambda {t}^{\gamma -1}\exp \left(\boldsymbol{X}\boldsymbol{\delta } \right)\exp {\left(-{\lambda t}^{\gamma}\right)}^{\exp \left(\boldsymbol{X}\boldsymbol{\delta } \right)}}{{\left[1+\exp \left(-{\beta}_0-\boldsymbol{Z}\boldsymbol{\beta } \right)\right]}^{-1}+\left\{1-{\left[1+\exp \left(-{\beta}_0-\boldsymbol{Z}\boldsymbol{\beta } \right)\right]}^{-1}\right\}\exp {\left(-{\lambda t}^{\gamma}\right)}^{\exp \left(\boldsymbol{X}\boldsymbol{\delta } \right)}}$$

where X is the vector of covariates acting on the survival of the uncured.
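As an illustration of the two expressions above, the following minimal sketch evaluates the relative survival and the excess hazard of a conventional mixture cure model with a Weibull uncured survival and a logistic cure fraction. It assumes a single covariate for each of Z and X; the function names and the numerical values in the usage note are hypothetical and are not taken from the study.

```python
import math

def cure_fraction(z, beta0, beta):
    """Logistic cure fraction pi(Z) for a single covariate z."""
    return 1.0 / (1.0 + math.exp(-beta0 - beta * z))

def uncured_survival(t, x, lam, gam, delta):
    """Weibull net survival of the uncured: exp(-lam * t**gam) ** exp(delta * x)."""
    return math.exp(-lam * t ** gam * math.exp(delta * x))

def relative_survival(t, x, z, beta0, beta, lam, gam, delta):
    """RS(t) = pi + (1 - pi) * S_u(t): cured patients contribute a survival of 1."""
    pi = cure_fraction(z, beta0, beta)
    return pi + (1.0 - pi) * uncured_survival(t, x, lam, gam, delta)

def excess_hazard(t, x, z, beta0, beta, lam, gam, delta):
    """h_c(t) = (1 - pi) * f_u(t) / RS(t), with f_u = -dS_u/dt; requires t > 0."""
    pi = cure_fraction(z, beta0, beta)
    s_u = uncured_survival(t, x, lam, gam, delta)
    f_u = gam * lam * t ** (gam - 1.0) * math.exp(delta * x) * s_u
    return (1.0 - pi) * f_u / (pi + (1.0 - pi) * s_u)
```

For example, `relative_survival(5.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.8, 0.0)` evaluates the model 5 years after diagnosis for a cure fraction of 50%; as t grows, the returned value approaches the cure fraction, which is the plateau described in the text.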

Corrected mixture cure model (Model(1))

Relaxing the comparability assumption usually considered in the excess hazard approach, we set patients’ expected hazard equal to that in the general population multiplied by a constant parameter α: \(h_{e}(t)=\alpha h^{*}(t)=\alpha h^{*}(age+t,sex,year+t)\), with α > 0.

The cumulative observed hazard can be written as:

$${H}_O(t)={\int}_0^t{h}_c(v) dv+{\int}_0^t{\alpha h}^{\ast }(v) dv$$

The proposed excess hazard function is the same as that of the conventional model, but the estimated parameters are different due to the correction of the expected hazard, as in Philips et al. [17].

The observed survival can be written as:

$${S}_O(t)=\exp \left[-{H}_O(t)\right]=\exp \left[-{\int}_0^t{h}_c(v) dv\right]\exp {\left[-{\int}_0^t{h}^{\ast }(v) dv\right]}^{\alpha },$$

where \(\exp {\left[-{\int}_0^t{h}^{\ast }(v) dv\right]}^{\alpha }=\kern0.5em {S}^{\ast }{\left(\textrm{t}\right)}^{\alpha}\kern0.5em\) corresponds to survival in the general population corrected by the scale parameter.

The value of parameter α, which is defined on \(\mathbb{R}^{+}\), can be interpreted as a hazard ratio: α > 1 indicates that mortality due to other causes in the cohort under study is higher than that in the general population, α = 1 indicates a null effect, as is implicit in conventional cure models, and α < 1 indicates a lower mortality. We assume α to express a fixed effect and to be independent of age.

The final expression of observed survival in the corrected mixture cure model can be expressed as:

$${S}_O(t)={\left[1+\exp \left(-{\beta}_0-\boldsymbol{Z}\boldsymbol{\beta } \right)\right]}^{-1}\ast {S}^{\ast }{\left(\textrm{t}\right)}^{\alpha }+\left\{1-{\left[1+\exp \left(-{\beta}_0-\boldsymbol{Z}\boldsymbol{\beta } \right)\right]}^{-1}\right\}\ast \exp {\left(-{\lambda t}^{\gamma}\right)}^{\exp \left(\boldsymbol{X}\boldsymbol{\delta } \right)}\ast {S}^{\ast }{\left(\textrm{t}\right)}^{\alpha }$$

The corrected model expressed here (from here on called Model(1)), with the constraint α ≡ 1 gives the conventional cure model [19] with the same parameterization of age effects and uncured net survival function.
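The role of α can be illustrated numerically. The sketch below multiplies a relative survival value by the general population survival raised to the power α, as in the expression for observed survival given above; the function name and the input values (a relative survival of 0.60 and an expected survival of 0.85) are hypothetical.

```python
def observed_survival(rs, s_star, alpha):
    """S_O = RS * (S*) ** alpha; alpha = 1 gives the conventional model."""
    return rs * s_star ** alpha

conventional = observed_survival(0.60, 0.85, 1.0)   # no correction
corrected = observed_survival(0.60, 0.85, 1.5)      # 50% higher non-cancer hazard
```

With α > 1 the same relative survival corresponds to a lower observed survival, so a model that forces α = 1 has to absorb the extra non-cancer deaths into its cancer-related parameters.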

Model estimation

Model parameters were estimated using the maximum likelihood method from both individual and grouped data.

As in De Angelis et al. (1999), the total log-likelihood using the individual data approach in the conventional model can be expressed as:

$$l\left(\boldsymbol{\beta},\gamma,\lambda,\boldsymbol{\delta}\right)=\sum_{i=1}^N\left\{{d}_i\ln \left(\frac{\left(1-\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)\right){f}_u\left(\left.{\boldsymbol{X}}_i,{t}_i\right|\boldsymbol{\delta} \right)}{\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)+\left(1-\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)\right){S}_u\left(\left.{\boldsymbol{X}}_i,{t}_i\right|\boldsymbol{\delta} \right)}+{h}^{\ast}\left({t}_i\right)\right)+\ln \left(\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)+\left(1-\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)\right){S}_u\left(\left.{\boldsymbol{X}}_i,{t}_i\right|\boldsymbol{\delta} \right)\right)+\ln \left({S}^{\ast}\left({t}_i\right)\right)\right\}$$

where β, γ, λ and δ are the parameters to be estimated using the maximum likelihood method; \(t_i\), \(d_i\) and \(h^{\ast}(t_i)\) are, respectively, the time at death or censoring, the censoring index, and the population death hazard for the i-th individual observation among N individuals. X and Z are the covariates associated with survival of uncured patients and cure fraction, respectively. \(f_u(\boldsymbol{X}_i,t_i|\boldsymbol{\delta})\) and \(S_u(\boldsymbol{X}_i,t_i|\boldsymbol{\delta})\) are respectively the density and survival of uncured patients at time \(t_i\), depending on the effect δ of covariates acting on the baseline density or survival, and \(S^{\ast}(t_i)\) is the general population survival at time \(t_i\), which is a constant term and can be removed from the likelihood in the conventional model.

Using the same idea as in the conventional model, the total log-likelihood using the individual data in the proposed corrected model can be expressed as:

$$l\left(\boldsymbol{\beta},\gamma,\lambda,\boldsymbol{\delta},\alpha \right)=\sum_{i=1}^N\left\{{d}_i\ln \left(\frac{\left(1-\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)\right){f}_u\left(\left.{\boldsymbol{X}}_i,{t}_i\right|\boldsymbol{\delta} \right)}{\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)+\left(1-\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)\right){S}_u\left(\left.{\boldsymbol{X}}_i,{t}_i\right|\boldsymbol{\delta} \right)}+\alpha {h}^{\ast}\left({t}_i\right)\right)+\ln \left(\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)+\left(1-\pi \left(\left.{\boldsymbol{Z}}_i\right|\boldsymbol{\beta} \right)\right){S}_u\left(\left.{\boldsymbol{X}}_i,{t}_i\right|\boldsymbol{\delta} \right)\right)+\ln \left({S}^{\ast }{\left({t}_i\right)}^{\alpha}\right)\right\}$$

Note that, in the estimation step, the term involving S* cannot be removed here because α is a parameter to be estimated, whereas it can be removed in the conventional model.
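A minimal sketch of the individual-data log-likelihood of the corrected model is given below, written in the usual survival form in which each death contributes the log of its total hazard and every subject contributes the log of its observed survival. It assumes one covariate x acting on both the cure fraction and the uncured survival, and that the population hazard and survival at each follow-up time have already been read from a life table. The function name and the record layout are hypothetical; this is not the Stata implementation used in the study.

```python
import math

def loglik_individual(params, data):
    """Log-likelihood of the corrected mixture cure model.

    params: (beta0, beta, lam, gam, delta, alpha)
    data:   iterable of (t, d, x, h_star, s_star), where d is 1 for death and
            0 for censoring, and h_star, s_star are the population hazard and
            survival at time t for that subject (assumed inputs).
    """
    beta0, beta, lam, gam, delta, alpha = params
    total = 0.0
    for t, d, x, h_star, s_star in data:
        pi = 1.0 / (1.0 + math.exp(-beta0 - beta * x))
        s_u = math.exp(-lam * t ** gam * math.exp(delta * x))
        f_u = gam * lam * t ** (gam - 1.0) * math.exp(delta * x) * s_u
        rs = pi + (1.0 - pi) * s_u                        # relative survival
        hazard = (1.0 - pi) * f_u / rs + alpha * h_star   # excess + corrected expected hazard
        # alpha * ln(S*) depends on alpha, so it stays in the likelihood
        total += d * math.log(hazard) + math.log(rs) + alpha * math.log(s_star)
    return total
```

Passing the negative of this function to a general-purpose optimiser would be one way to obtain maximum likelihood estimates; fixing `alpha` at 1 recovers the conventional model up to an additive constant.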

The estimation from grouped data was carried out by building relative survival tables, stratified by relevant predictor variables, from each generated sample and from survival data for the general population. In this study we used the commonly used Ederer II relative survival, but other estimators could be considered, particularly in between-population comparative analyses. The binomial log-likelihood for the j-th interval of the life table of the k-th stratum was derived similarly to the formula provided by Dickman [20]:

$${d}_{jk}\ast \log \left[1-{S}_{Ojk}\right]+\left({l}_{jk}-{d}_{jk}\right)\ast \log \left[{S}_{Ojk}\right],$$

where \(l_{jk}=n_{jk}-0.5\,w_{jk}\) is the effective number of patients in each j and k combination; \(n_{jk}\), \(d_{jk}\) and \(w_{jk}\) are respectively the number of patients alive at the start of the interval, and the numbers dead and censored during the interval, for the k-th stratum; and \(S_{Ojk}\) is the j-th interval-specific observed survival for the k-th stratum.

The log-likelihood using the grouped data approach in the conventional model can be expressed as:

$${d}_{jk}\ast \log \left[1-\left({RS}_{jk}\ast {S}_{jk}^{\ast}\right)\right]+\left({l}_{jk}-{d}_{jk}\right)\ast \log \left[{RS}_{jk}\ast {S}_{jk}^{\ast}\right],$$

where \({S}_{jk}^{\ast }\) and \({RS}_{jk}\) are the expected survival and patients’ relative survival estimates in each j and k combination.

Using the same idea as in the conventional model, the log-likelihood using the grouped data in the proposed model can be expressed as:

$${d}_{jk}\ast \log \left[1-{RS}_{jk}\ast {\left({S}_{jk}^{\ast}\right)}^{\alpha}\right]+\left({l}_{jk}-{d}_{jk}\right)\ast \log \left[{RS}_{jk}\ast {\left({S}_{jk}^{\ast}\right)}^{\alpha}\right]$$

Note that, unlike in the standard cure models, survival in the population has to be taken into consideration in the maximization of the log-likelihood, due to the presence of α.
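The grouped-data likelihood can be sketched as follows, taking the interval-specific observed survival as the product of the model-based relative survival and the expected survival raised to the power α. In this simplified illustration the interval-specific relative survival is passed in directly rather than computed from the cure model parameters; the function name and the row layout are hypothetical.

```python
import math

def loglik_grouped(alpha, intervals):
    """Binomial log-likelihood summed over life-table intervals and strata.

    intervals: iterable of (n, d, w, rs, s_star), where n, d, w are the numbers
    alive at the start, dead and censored in the interval, and rs, s_star are the
    interval-specific relative and expected survival (assumed inputs).
    """
    total = 0.0
    for n, d, w, rs, s_star in intervals:
        l_eff = n - 0.5 * w                 # effective number at risk
        s_obs = rs * s_star ** alpha        # corrected observed survival
        total += d * math.log(1.0 - s_obs) + (l_eff - d) * math.log(s_obs)
    return total
```

For a single interval the likelihood is highest when `s_obs` equals the observed proportion surviving, `(l_eff - d) / l_eff`, which is why the population survival term cannot be dropped once α is free.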

In Stata, the command strs was used to obtain Ederer II estimates for grouped data, and the command ml with the lnf method, which relies on the numerical calculation of derivatives, was used to maximize the likelihood of each sample. The results obtained for each sample were stored and summarized.

Simulation

Virtual samples representing cohorts of patients were built by means of a pseudorandom number-generating algorithm. Each virtual case was independently represented by three variables: age at diagnosis (restricted in all analyses from 40 to 74 years), follow-up time (0–15 years), and censoring index (alive, dead). Age at diagnosis was randomly generated from a uniform distribution within age classes 40–57, 58–64, 65–69, and 70–74, with each class including 25% of all cases.

In the simulations, age at diagnosis was the only covariate associated with both cure fraction (vector Z) and survival of uncured patients (vector X) from now on reported as x. In model applications, a standardized age variable x = (age-60)/15 was used.

Follow-up time and the censoring index were generated as follows.

  • The probabilities of administrative censoring (PAC) and of loss to follow-up (PLF) were assigned, and the corresponding times to censoring TC1 and TC2 were sampled from uniform distribution U[0, 15] with probability PAC and PLF, or set at a maximum (15 years) with probability 1-PAC and 1-PLF, respectively. The final censoring time was TC = min(TC1, TC2).

  • A Weibull distribution with parameters λP (scale) and γP (shape) was fitted to a set of observed survival data derived from population lifetables. The patients’ expected survival time TED from causes of death other than cancer was randomly sampled by the inverse transformation method from the estimated Weibull distribution, raising the general population survival probabilities to the power α to simulate the relative risk of non-cancer death of cancer patients.

  • The time to cancer death TCD was randomly sampled from a mixture cure model assuming that the net survival of uncured patients followed a Weibull distribution with scale and shape parameters λC and γC considered constant and with δ, representing the age effect on uncured survival. Sampling u* from the uniform distribution U[0, 1] we set:

$${\textrm{u}}^{\ast }=\uppi \left(\textrm{x}\right)+\left[1-\uppi \left(\textrm{x}\right)\right]{\left[\exp \left(-{\uplambda}_{\textrm{C}}{{\textrm{T}}_{\textrm{CD}}}^{\upgamma_{\textrm{C}}}\right)\right]}^{\exp \left[-\updelta \textrm{x}\right]}$$

where π(x) = 1/[1 + exp(−β0 − βx)] and β0 = ln[π(60)/(1 − π(60))]. The reference age was 60 years, and π(60) and β were fixed according to scenario. If u* > π(x) then we obtained:

$${T}_{CD}={\left[-\frac{\exp \left(\delta x\right)}{\lambda_C}\ln \left(\frac{u^{\ast }-\pi (x)}{1-\pi (x)}\right)\right]}^{{}^{1}\!\left/ \!{}_{{\gamma}_C}\right.}$$

Otherwise, TCD was set to infinity.

  • The time to death from diagnosis was then defined as TD = min(TED, TCD), and the final time of follow-up by T = min(TC, TD).

  • Finally, the censoring index d was equal to 0 (alive) if TD > TC, or equal to 1 (dead) if TC ≥ TD.
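The steps above can be sketched for a single virtual case as follows. Several details are assumptions made only for this illustration: ages are drawn continuously within each class, the population Weibull is taken as a survival function of attained age, S*(a) = exp(−(a/λP)^γP), conditioned on being alive at diagnosis, and the time to cancer death is obtained by inverting the mixture equation for u*. The function name and argument names are hypothetical.

```python
import math
import random

def simulate_case(rng, alpha, pi60, beta, lam_c, gam_c, delta,
                  lam_p, gam_p, p_ac, p_lf, max_fu=15.0):
    """Return (age, follow-up time, death indicator) for one virtual patient."""
    # Age at diagnosis: four equally likely classes, uniform within each class.
    lo, hi = rng.choice([(40, 58), (58, 65), (65, 70), (70, 75)])
    age = rng.uniform(lo, hi)
    x = (age - 60.0) / 15.0                      # standardized age

    # Censoring: administrative censoring and loss to follow-up.
    tc1 = rng.uniform(0.0, max_fu) if rng.random() < p_ac else max_fu
    tc2 = rng.uniform(0.0, max_fu) if rng.random() < p_lf else max_fu
    tc = min(tc1, tc2)

    # Non-cancer death (ASSUMED form): solve [S*(age + t) / S*(age)] ** alpha = u.
    u = 1.0 - rng.random()                       # uniform on (0, 1]
    attained = lam_p * ((age / lam_p) ** gam_p - math.log(u) / alpha) ** (1.0 / gam_p)
    t_ed = max(0.0, attained - age)

    # Cancer death: infinite for cured cases, Weibull for uncured cases.
    beta0 = math.log(pi60 / (1.0 - pi60))
    pi = 1.0 / (1.0 + math.exp(-beta0 - beta * x))
    u_star = rng.random()
    if u_star > pi:
        v = (u_star - pi) / (1.0 - pi)
        t_cd = (-math.log(v) * math.exp(delta * x) / lam_c) ** (1.0 / gam_c)
    else:
        t_cd = math.inf

    t_d = min(t_ed, t_cd)                        # time to death from any cause
    t = min(tc, t_d)                             # observed follow-up time
    d = 1 if tc >= t_d else 0                    # 1 = dead, 0 = alive (censored)
    return age, t, d
```

A sample of N cases is then obtained by calling the function N times with the same random generator, for instance `[simulate_case(random.Random(1), ...) for ...]` with one shared `random.Random` instance; the parameter values supplied in such a call would be those of the chosen scenario.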

The whole set of true parameter values considered in the simulation analysis were chosen to mimic the survival pattern of common cancers and the demographic characteristics of real patients’ groups and general populations. For notational simplicity, the symbols π60, λ and γ will be used in the following instead of π(60), λc, and γc. The distribution of ages at diagnosis was derived from that observed in (both sexes) incident cases collected by the French network of cancer registries (FRANCIM) [21] during the period 1995–2009 and gathered in the FRANCIM common database. This database also provided the probability of administrative censoring (50%) and of loss to follow-up (3%). The survival distribution parameters in the population, fixed as λP = 88; γP = 11, were derived from the French general population life-tables, also for both sexes combined, for the year 2002 [19]. The underlying α in the simulated samples were attributed the values 0.8; 1.0; 1.2; 1.5; 2.0, the first indicating some (perhaps unrealistic) protective effect, the second no effect, and the others an increased risk of non-cancer death in patients as compared with the general population. The underlying true values of parameters determining the proportion of cured patients (π60 and β) and survival of the uncured (λ, γ, and δ) were derived from preliminary model applications to real data [18, 21,22,23], and are shown in Table 1. They were grouped under two scenarios, mirroring the behaviour of lung and breast cancers. In the following, the two scenarios will be named after the corresponding cancer within quotes (“Breast” or “Lung”).

Table 1 Performance indicators of Model(1) using Maximum likelihood (ML) estimation by different values of α

Finally, one thousand samples were generated for each scenario, as defined by a specific set of simulation parameters (α, π60, λ, γ, β, δ). Depending on the specific objective, the number of cases generated for each sample varied from a minimum of 500 to a maximum of 20,000.

Performance indicators

To allow comparison with the results obtained by the conventional cure model, some of the estimations were carried out both with the constraint α = 1 and with the unconstrained Model(1), which takes into account the increased risk of non-cancer death.

From the considered models, we obtained estimates of six parameters: α, π60, λ, γ, β, and δ. Intrinsically, the conventional cure model did not estimate α. We indicated as true values the values of parameters used to generate the samples, and considered them the gold standard to be compared with model-based estimates. The following performance indicators were calculated for each parameter from the set of 1000 samples generated under a specific scenario.

  • Absolute Bias (AB) = Mean(estimates - true value)

  • Standard Deviation (SD) = standard deviation over the set of 1000 estimates

  • Coverage (CVR) = the proportion of the time that the estimated 95% confidence intervals contained the true value
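The three indicators can be computed from the per-sample estimates as in the sketch below. It assumes Wald-type 95% confidence intervals (estimate ± 1.96 standard errors), which is an assumption of this example since the construction of the intervals is not detailed here; the function name is hypothetical.

```python
import statistics

def performance(estimates, std_errors, true_value, z=1.96):
    """Return (absolute bias, standard deviation, coverage) over simulated samples."""
    n = len(estimates)
    ab = sum(e - true_value for e in estimates) / n
    sd = statistics.stdev(estimates)
    covered = sum(
        1 for e, se in zip(estimates, std_errors)
        if e - z * se <= true_value <= e + z * se   # assumed Wald interval
    )
    return ab, sd, covered / n
```

In the simulation study the lists would hold the 1000 estimates of one parameter under one scenario, so that each parameter and scenario yields one triplet of indicators.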

Robustness analysis

We investigated the estimates provided by Model(1) when some of the underlying assumptions were not met by the data. In particular, we analysed the model’s performance in three different situations: the times to cancer death of uncured patients do not follow a Weibull distribution; the increased risk of non-cancer death is dependent on age at diagnosis; the increased risk of non-cancer death varies randomly among patients. All of the robustness analyses were carried out by conducting 1000 independent runs with simulated samples of 10,000 cases each.

The times to cancer death of uncured patients do not follow a Weibull distribution

Model(1) assumes that the relative survival of uncured patients diagnosed at the reference age 60 (x = 0) follows the Weibull distribution with parameters λ and γ.

$${\textrm{S}}_{\textrm{u}}\left(\textrm{t},\textrm{x}\right)=\exp {\left(-{\uplambda \textrm{t}}^{\upgamma}\right)}^{\exp \left[-\updelta \left(\textrm{x}\right)\right]}\tag{1}$$

In this analysis, data were generated according to the corrected exponential Weibull distribution [24]:

$${\textrm{S}}_{\textrm{u}}\left(\textrm{t},\textrm{x}\right)={\left\{1-{\left[1-\exp \left(-{\uplambda \textrm{t}}^{\upgamma}\right)\right]}^{\uptheta}\right\}}^{\exp \left[-\updelta \left(\textrm{x}\right)\right]}\tag{2}$$

where the second shape parameter θ modulates the distance from the Weibull. When θ < 1, the hazard is U-shaped, i.e. first decreasing and then increasing. The opposite bell-shaped pattern is obtained when θ > 1. Note that at ages different from 60, x ≠ 0, the survival function (2) no longer follows an exponential Weibull distribution.

Model(1), with uncured survival function specified by (1), was then fitted to data generated from survival function (2). We tested the model under the two scenarios “Breast” and “Lung”, with θ varying between 0.25 and 4. The shapes of the probability density functions generated under these values are plotted in Supplementary material Figure 1 together with the shape of the corresponding Weibull distributions. The range was considered sufficiently wide to include most of the real-world situations. Note that values of θ far from 1 led to drastic changes in time to death distribution. For instance, 5-year survival from the Weibull distribution (λ =0.4; γ =0.8; θ = 1) increases from 23 to 66% when θ = 4, and decreases to 8% when θ = 0.5. For this reason, actual estimates were compared with underlying values only for parameters α, π, β and δ.
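The effect of θ on the survival of uncured patients at the reference age (x = 0) can be checked with a few lines. The sketch below evaluates the exponentiated form of the Weibull survival used to generate the data; the function name is hypothetical.

```python
import math

def exp_weibull_survival(t, lam, gam, theta):
    """S_u(t) = 1 - [1 - exp(-lam * t**gam)] ** theta at the reference age (x = 0)."""
    return 1.0 - (1.0 - math.exp(-lam * t ** gam)) ** theta

s_weibull = exp_weibull_survival(5.0, 0.4, 0.8, 1.0)   # about 0.23
s_theta4 = exp_weibull_survival(5.0, 0.4, 0.8, 4.0)    # about 0.66
```

With θ = 1 the function reduces to the Weibull survival, so the first value corresponds to the 23% quoted above and the second to the 66% obtained with θ = 4.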

The increased risk of non-cancer death is dependent on age at diagnosis

Model(1) assumes that the increased risk of non-cancer death, expressed by parameter α, is independent of age at diagnosis. In order to assess the robustness of parameter estimations (especially α) with regard to variations of α according to age, we fitted Model(1) to data generated in breach of this assumption. We generated times to non-cancer death from expected survival probabilities given by

$${\textrm{S}}_{\textrm{E}}={\textrm{S}}^{\ast {\upalpha}_{\textrm{x}}},\textrm{with}\ {\upalpha}_{\textrm{x}}=\upalpha +{\textrm{b}}_{\upalpha}\left(\textrm{x}\hbox{-} \textrm{E}\left(\textrm{x}\right)\right)$$

where S* is all-cause survival from the population life table; αx is the increased risk of non-cancer death as a function of age x at diagnosis; E(x) is the expected value of the sampling age distribution (E(x) = 62.25 in all our samples); α was set at 1.2 or 2.0 (according to the scenario considered); and the slope coefficient bα was varied between −0.08 and 0.05 (i.e. 5% per year of age) to provide a reasonable range of values. Model(1) would of course estimate a single α parameter common to all ages. Given the linear relationship, E(αx) = α for each sample.
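The linear dependence of α on age can be written as a one-line function. In this minimal sketch age is expressed in years and centred on the mean age of the sample; the function name and the ages used in the example are hypothetical.

```python
def alpha_by_age(age, alpha, b_alpha, mean_age):
    """alpha_x = alpha + b_alpha * (age - mean_age); averages to alpha over the sample."""
    return alpha + b_alpha * (age - mean_age)

ages = [45.0, 58.0, 66.0, 72.0]                  # illustrative ages at diagnosis
mean_age = sum(ages) / len(ages)
alphas = [alpha_by_age(a, 1.2, 0.05, mean_age) for a in ages]
```

Because the deviations from the mean age sum to zero, the average of `alphas` equals the overall α, which is the value a model with a single α is expected to recover if it is robust to this violation.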

The increased risk of non-cancer death varies randomly among patients

Model(1) assumes that the increased risk of non-cancer death, expressed by parameter α, acts as a fixed effect. We investigated the behaviour of Model(1) when applied to data generated with an increased risk of non-cancer death randomly assigned to the simulated cases. Log(α) was considered uniformly distributed around the overall value, and with increasing ratios of maximum to minimum values from 2 to 4.
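One way to generate such individual values is sketched below. It assumes that log(α) is drawn uniformly on an interval centred on the log of the overall value, with a half-width chosen so that the ratio of the largest to the smallest possible α equals the stated ratio; the centring on the log scale and the function name are assumptions of this example.

```python
import math
import random

def draw_alpha(rng, alpha_overall, ratio):
    """Draw alpha with log(alpha) uniform around log(alpha_overall); max/min = ratio."""
    half_width = 0.5 * math.log(ratio)
    centre = math.log(alpha_overall)
    return math.exp(rng.uniform(centre - half_width, centre + half_width))

rng = random.Random(0)
alphas = [draw_alpha(rng, 1.5, 4.0) for _ in range(1000)]   # values between 0.75 and 3.0
```

Each simulated case would then use its own α when the time to non-cancer death is generated, while Model(1) still estimates a single common value.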

Results

Performances of models when all assumptions were valid

The performance indicators of Model(1) are reported in Table 1 for estimation methods for both grouped and individual data. Statistics on all six parameters considered are shown. Parameters β and δ represent changes in the probability of cure and of cancer survival in the uncured due to a 15-year difference in age at diagnosis. Estimates of α were always very close to the true underlying values, with an absolute bias (AB) ranging from − 0.006 to 0.006 and a relative bias always lower than 0.6%. The estimates of π60 were also very close to their underlying values. The AB was low for all of the other parameters, for both scenarios, and for both estimation methods. It ranged between − 0.011 (β estimate, “Breast” scenario) and + 0.004 (β estimate, “Lung” scenario). The standard deviation of α estimates was directly related to the true value of the parameter. Their coefficient of variation (standard deviation divided by the mean, not shown) ranged between 6 and 15%. The standard deviation of π60 estimates was considerably lower than those of α, with the coefficient of variation of estimates always within the range 4 to 6%, thereby resulting in higher precision of the estimates compared with α. The standard deviation of the estimates was also generally low for λ and γ , but it was higher for the two trend parameters β and δ. The standard deviation for estimates based on individual data was in general slightly lower than that for grouped data. The coverage estimated was almost always slightly lower but close to the nominal value of 95% for both scenarios, both methods, and all parameters. One partial exception was the coverage of λ estimates for “Lung”, which ranged from 89 to 95%.

Table 1 shows no strong advantage of estimates based on individual data over those based on grouped data, which consumed far less time and computing power. The following tables present only results from the latter method; the robustness analysis for individual data can be found in Supplementary Material Table 2.

The main performance indicators of Model(1), compared with the conventional cure model, are presented in Table 2. The conventional model gave unbiased estimates for all parameters when α = 1, with precision and coverage similar to those obtained from Model(1), but progressively more biased estimates of the parameters as the underlying value of α departed from 1.0. This is one more reason in favour of the systematic use of the full Model(1).

Table 2 Performance indicators of the conventional model and Model(1) using maximum likelihood estimation on grouped data, for different values of α

Table 3 illustrates the behaviour of Model(1) with varying sample size N and length of potential follow-up. For the “Breast” scenario, decreasing the sample size and length of follow-up led to a positive bias for α and a negative one for π. For the “Lung” scenario, π was generally estimated well, but both positive and negative AB of α estimates were obtained with 5 years of follow-up. This was also due to the very large standard errors. Estimates of α were more sensitive to decreasing sample size and length of follow-up than were those of π. Overall, these results indicate that small sample sizes and short follow-up led to unstable estimates. With at least 10 years of follow-up, N ≥ 5000 was needed to obtain good model estimates; with at least 15 years of follow-up, the model provided acceptable estimates with smaller sample sizes (N ≥ 1000). The latter is needed to provide the amount of information that ensures good estimates of α and of the other parameters involved.

Table 3 Performance indicators using grouped data according to sample size and length of follow-up in years

Obviously, the ability of Model(1) to produce acceptable results for a given sample size and length of follow-up also varied with the number of deaths, so these general conclusions can be relaxed. Indeed, a cohort of 500 cases will produce acceptable estimates when lethal cancer sites and long follow-up (15 years) are studied.

Within these conditions, the SD of α and π were roughly inversely proportional to the square root of sample size and increased with decreasing lengths of follow-up. The same conclusion was drawn when individual data were used (Supplementary Material Table 3).
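The inverse square-root relationship noted above gives a simple rule of thumb for anticipating precision at a different sample size. The sketch below only illustrates that arithmetic; the function name `scaled_sd` and the reference SD of 0.10 at N = 10,000 are hypothetical values, not results from Table 3, and the rule ignores the additional effect of follow-up length.

```python
import math


def scaled_sd(sd_ref, n_ref, n_new):
    """Approximate SD at sample size n_new, given sd_ref observed at n_ref.

    Assumes SD is proportional to 1 / sqrt(N), with follow-up held fixed.
    """
    return sd_ref * math.sqrt(n_ref / n_new)


# Hypothetical reference: SD of 0.10 for an estimate at N = 10,000
sd_at_10000 = 0.10
for n in (500, 1000, 5000, 10000):
    print(n, round(scaled_sd(sd_at_10000, 10000, n), 3))
```

Under this rule, dividing the sample size by four doubles the expected SD.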

Robustness analysis

All the previous results were obtained for samples generated according to the model assumptions. In the following sections, we report the performance of Model(1) when some of these assumptions were violated by the data generation algorithm.

The times to cancer death of uncured patients did not follow a Weibull distribution

The different shapes of the hazard and cumulative survival functions obtained by varying θ under the two considered scenarios are plotted in Supplementary Material Fig 1. The two dotted lines are those with minimum θ = 0.25, showing a high initial risk, followed by a minimum value and by an increasing risk, and those with the maximum θ = 4, showing an opposite pattern, increasing at the beginning, but with a decreasing trend in the long term. The reference function with θ = 1, corresponding to the Weibull distribution defined for each scenario, is represented by the black lines. Figure 1 in the supplementary material shows that values of θ not equal to 1 change both the pattern and the level of the hazard. As a consequence, the other survival parameters λ, γ, and δ, also have to change in order to fit the same dataset.
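The role of θ can be illustrated with a short sketch of the exponentiated-Weibull family. The parametrization below, F(t) = [1 − exp(−(λt)^γ)]^θ, is an assumption made for illustration and may differ from the one used to generate the simulated data; the numerical values of λ and γ are hypothetical. With θ = 1 the functions reduce to the ordinary Weibull survival and hazard.

```python
import math


def exp_weibull_survival(t, lam, gamma, theta):
    """Survival of the exponentiated Weibull: 1 - [1 - exp(-(lam*t)**gamma)]**theta."""
    return 1.0 - (1.0 - math.exp(-((lam * t) ** gamma))) ** theta


def exp_weibull_hazard(t, lam, gamma, theta):
    """Hazard = density / survival, valid for t > 0."""
    u = (lam * t) ** gamma
    base_cdf = 1.0 - math.exp(-u)  # ordinary Weibull distribution function
    density = (
        theta
        * gamma
        * lam
        * (lam * t) ** (gamma - 1.0)
        * math.exp(-u)
        * base_cdf ** (theta - 1.0)
    )
    return density / exp_weibull_survival(t, lam, gamma, theta)


# Hypothetical parameters; theta = 1 is the plain Weibull reference
lam, gamma = 0.5, 1.1
for theta in (0.25, 1.0, 4.0):
    hazards = [exp_weibull_hazard(t, lam, gamma, theta) for t in (0.01, 1.0, 5.0)]
    print(theta, [round(h, 3) for h in hazards])
```

With these hypothetical values, θ = 0.25 produces a much higher hazard shortly after diagnosis than the Weibull reference, in line with the pattern described for the minimum θ.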

Performance indicators of estimates are summarized in Table 4 for all parameters apart from λ and γ, for which we did not have reference true values. Estimates of almost all parameters increased with increasing values of θ, except for α under “Breast” and for δ. To balance the bias of β, the AB of δ decreased with increasing values of θ in both scenarios. The estimates of parameter α under the “Lung” scenario were the most sensitive to the value of θ, increasing from 1.89 to 2.81 (AB −0.106 to 0.815) around the true value of 2.0. The estimates of the other parameters, π60 and β, increased from 10 to 13% and from −0.75 to −0.59, respectively.

Table 4 Performance indicators of Model(1) when applied to data generated with exponential-Weibull distribution of survival for uncured patients

As for the “Breast” scenario, the estimates of all three parameters were more stable and closer to their true values. The AB for α (true value = 1.2) ranged from −0.024 to 0.01, that of π60 (true value = 70%) ranged from −0.058 to 0.082, and that of β (true value = −0.15) ranged from −0.069 to 0.018.

The increased risk of non-cancer death is dependent on age at diagnosis

In Table 5, we report the performance indicators of Model(1) when data were generated under a linear trend of α with age. The column headed bα shows the linear coefficient of the age trend, and the following column gives the range of α, i.e. its values at ages 40 and 75. The limit values of the coefficient were chosen so as to avoid implausibly protective levels of α in the extreme age classes.

Table 5 Performance indicators of Model(1) in presence of variation of α according to age at diagnosis

The AB in estimating α was higher for “Breast”, with higher survival, than for the “Lung” scenario, for similar bα. It remained between −0.1 and 0.1 when the rate of increase of α was between −0.005 and 0.005 per year of age for “Breast” and between 0.01 and 0.02 for “Lung”. In both scenarios, the AB was roughly proportional to the slope bα. Thus, α estimates tended to their true values at older ages, for which the parameter has the higher impact due to the higher mortality for other causes. The other parameters were similarly biased by a breach of the α-age independence assumption for the “Breast” scenario with a negative slope, and were hardly affected in all other cases.

The increased risk of non-cancer death varied randomly among patients

Table 6 shows what happened when α was generated as a random effect. Mean estimates of α, treated in Model(1) as a fixed effect, slightly decreased with increasing random variability of the underlying parameter. The AB was non-negligible for α in both scenarios when the max/min ratio became greater than 2 (−0.11 for “Breast” and −0.16 for “Lung” when the widest range of variability was considered). For all other parameters, the AB always remained below 0.05 in absolute value. Estimates of β and δ were sensitive to the variability of α only for the “Breast” scenario, where they showed higher AB but maintained a coverage close to 95%. In all the other cases, the estimates of π, λ, and γ were only marginally affected by the underlying variability of α, and remained close to their true values.

Table 6 Performance indicators of Model(1) in presence of data generated with α randomly assigned among patients

Application to real-life FRANCIM colon cancer data

As an example, this method was applied to survival data of colon cancer patients recorded by FRANCIM (the French network of cancer registries). FRANCIM data are checked for quality and completeness every 4 years by an independent audit committee (Comité d’Évaluation des Registres). Life tables were provided by the National Statistics Institute (INSEE). All colon cancers diagnosed in 1995–2009 in patients aged 40–74 were included (N = 15,717 in men and N = 10,942 in women). The relative survival and the cure fraction were estimated, separately for each sex, using Model(1) and using the conventional model without the α parameter (Table 7). The cure assumption was already checked for these data [21].

Table 7 Real data application: (A) parameter estimates (standard errors) and (B) net survival and proportion of cured

For both sexes, we estimated α ≈ 1.3, with confidence intervals not including 1, therefore supporting the hypothesis of increased non-cancer mortality in colon cancer patients. The differences between the cure fraction estimates from Model(1) and those from the conventional model were greater for males (57% vs. 52% at age 60) than for females (61% vs. 58%) due to the higher mortality rates for other causes in the male population. Model(1) and the conventional model also provided different estimates for survival by age. For example, 10-year relative survival in males decreased from 63% at age 40 to 57% at age 70 with Model(1), and from 64 to 52%, respectively, with the conventional model. Since non-cancer mortality increases steeply with age, not including α in the model led to a portion of the age trend in mortality being attributed to cancer rather than to other causes. Not taking into consideration the cancer patients’ increased risk of non-cancer death led to bias in the colon cancer net survival estimates produced so far.
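The mechanism behind this underestimation can be illustrated numerically. The sketch below assumes that observed survival under the corrected model has the form S_obs(t) = S_pop(t)^α × [π + (1 − π) exp(−(λt)^γ)], i.e. a multiplicative effect of α on the population hazard combined with a Weibull mixture cure model. It is not the fitting code used in the application: the Gompertz population mortality and every numerical value are hypothetical and are not FRANCIM or INSEE data.

```python
import math


def net_survival(t, pi, lam, gamma):
    """Mixture cure net survival: cured fraction pi plus Weibull survival of the uncured."""
    return pi + (1.0 - pi) * math.exp(-((lam * t) ** gamma))


def population_survival(t, age, a=5e-5, b=0.09):
    """Hypothetical general-population survival with Gompertz hazard a * exp(b * age)."""
    cumulative_hazard = (a / b) * (math.exp(b * (age + t)) - math.exp(b * age))
    return math.exp(-cumulative_hazard)


def observed_survival(t, age, alpha, pi, lam, gamma):
    """Observed survival when non-cancer mortality is alpha times the population hazard."""
    return population_survival(t, age) ** alpha * net_survival(t, pi, lam, gamma)


def conventional_relative_survival(t, age, alpha, pi, lam, gamma):
    """Observed over expected survival, as used when alpha is ignored."""
    return observed_survival(t, age, alpha, pi, lam, gamma) / population_survival(t, age)


# Hypothetical values: cure fraction 0.60, alpha 1.3, age 60 at diagnosis
pi, lam, gamma, alpha, age = 0.60, 0.4, 1.0, 1.3, 60
for t in (5, 10, 15):
    print(
        t,
        round(net_survival(t, pi, lam, gamma), 3),
        round(conventional_relative_survival(t, age, alpha, pi, lam, gamma), 3),
    )
```

When α = 1 the two quantities coincide; when α > 1 the conventional relative survival keeps declining below the true cure fraction at long follow-up, which is why a conventional cure model fitted to such data would tend to return a lower cure fraction.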

Discussion

This simulation study, which showed very good performance, validated the mixture cure model for relative survival estimation presented here. The model includes a coefficient that reflects the risk of dying from causes other than cancer in cancer patients as compared with the risk of all-cause death in the general population. In the rationale of this study, this coefficient expressed an increased risk of non-cancer death for cancer patients (α > 1), but we also validated the model for α = 0.8, as we could not exclude the possibility that, for some specific cancers, the risk of non-cancer death in patients is smaller than the risk of death in the general population.

The model was tested in the ideal situation, where it exactly mirrored the pattern of data to which it was applied. In these simulations, we considered two extreme scenarios in terms of cure fraction, survival time of uncured patients, and age trends of both quantities, reflecting the survival pattern of lung and breast cancer. We considered linear age effects in order to reduce the number of parameters to be presented and because other analyses with categorical age effects provided similar results. We applied the model to both individual data and grouped data, the latter in the form of life tables stratified by age at diagnosis. When the assumptions of Model(1) were met, the two methods showed comparable performance and provided unbiased estimates of the parameters, with coverage close to its nominal value of 95%. Standard deviations of estimates of α and π60 over the 1000 simulation runs using individual data were only slightly lower than those using grouped data. When there was an increased risk of non-cancer death, the estimates generated by the conventional model carried a risk of being severely biased. When there was no increased risk, Model(1) generated the same estimates as the conventional model. A sample size of 5000 cases with more than 10 years of follow-up was sufficient to obtain reliable estimates for all parameters in most scenarios.

Good model specification is a requirement of every modelling application, but seldom can it be completely achieved or checked. In the second part of this work, we studied the robustness of model estimates when the underlying assumptions were not in line with the data, more specifically: when relative survival in uncured patients did not follow a Weibull distribution; when the increased risk of non-cancer death was dependent on age at diagnosis; when the increased risk of non-cancer death was not a fixed effect but randomly varied within the cohort of patients. The individual data approach provided severely biased estimates in some extreme scenarios (Supplementary Material Table 2), in particular for the low-survival “Lung” scenario with exponential-Weibull parameter θ = 0.25. This was perhaps due to the extremely high probability density generated by this distribution in very short times (say, 1–2 months) after diagnosis, making the contribution of observations with longer follow-up time negligible. The analysis of grouped data was also faster and can help overcome the data access problems that research teams increasingly face because of personal data protection regulations. The grouped data approach was therefore chosen to obtain a more detailed presentation of results.

Each of the mis-specification situations considered could be accounted for by modifying the basic model, i.e. by considering other survival distributions, or including age dependence, or random effect covariates. This could be the subject of future developments on corrected cure models. Nevertheless, investigating the robustness of a parsimonious model can help in finding a compromise between a lack of fit and over-parametrization.

The strength of the corrected model studied here lies in the fact that an overall risk of death from other causes can be estimated without needing to know the causal factors and their distribution in the patient set and in the general population. Such information is seldom available in population-based studies. On the other hand, the model assumes the increased risk of death in patients to act as a multiplicative relative risk with respect to mortality in the general population. The multiplicative relative risk model has, however, shown its plausibility during its broad and long use in epidemiological research. Its appropriateness in the specific field of cancer survivorship can be ascertained through an extensive application to epidemiological data on various cancers and in different populations.

The model is based on the plausible assumption that patients’ cancer mortality does not increase indefinitely some years after diagnosis, unlike mortality from other causes, which increases with age. This makes it possible, in principle, to disentangle the two risks. The model can be fitted even if the excess mortality remains constant after a long time; in this context, the functional form that best describes the analysed data has to be found. Indeed, we found that the model also behaved well in scenarios assuming persisting long-term cancer mortality, as expressed by the γ parameter = 1.1 (“Breast” scenarios).

In order to test the model behaviour in a critical situation, it was also applied to data generated with a negligible or zero proportion of cure and, in addition, with persisting long-term cancer mortality. In this setting, the corrected cure model provided biased estimates for α and for the cure fraction (Supplementary Material Table 1). Such bias was, however, attributable to the logistic transformation: when a linear age effect on the cure fraction was applied, all estimates were unbiased (Supplementary Material Table 1). Finally, we compared the estimates provided by the full model with those from a model retaining the α parameter but with no cure assumption (Supplementary Material Table 1). The latter provided biased estimates only when the true cure fraction was higher than 5%. We therefore suggest that, when the full corrected model gives an estimated cure fraction of less than 10% in an application, the no-cure model including the α parameter should also be applied for comparison. Moreover, the model with a linear effect of age on the cure fraction can be considered in the case of a non-negative estimate of the cure fraction.

This work has several limitations. We only tested the model with the Weibull distribution assumption for survival of the uncured patients. The Weibull is a very popular survival distribution, generally providing a good fit to cancer data [25], but other choices are possible, such as the lognormal [26], the loglogistic [27], or the flexible regression spline-generated [28] distributions. We have no reason to believe that any of these distributions would give different results in the case of a well-specified model, when exactly the same distribution is used to generate and to fit the data. However, their behaviour may change when some of the assumptions are not met, so the robustness analysis could be affected by the specific function considered.

Reliable cause-of-death data from population-based sources can be used to test the distribution assumptions considered. Cause of death was not available in the data of our application. In the future this information will be standardized and included in the basic dataset of cancer registries.

Age at diagnosis was the only covariate included in the model. Other variables available from population-based data, such as sex, period of diagnosis or, sometimes, stage and treatment, can be included without substantially modifying the model structure.

A linear effect of age, as considered in the model, can be accepted for many cancers, but can be unrealistic for others. Breast and prostate cancers, for instance, have a higher death risk in young and old patients than in middle-aged patients. We chose the linear link to simplify the proof of concept, which mainly focused on the α parameter, without great loss of generality. Second or higher order polynomials can be used to account for different cancer mortality patterns, such as U-shaped ones.

Future methodological development can address the testing of other scenarios, designed to mirror a wider variety of cancers, exploring different survival distributions for the uncured and separately modelling the parameters in the distribution function, that is, the scale (λ) and shape (γ) parameters in the case of the Weibull distribution.

Previous applications of the same model to real data [17, 18] or of survival models not focused on cure [13,14,15] have shown that an increased patients’ mortality risk due to other causes exists for several cancers. The reasons why the risk of death from other causes differs between patients and the general population may vary. They range from causes not generated by the cancer, such as smoking, genetic and other lifestyle risk factors common to cancer and other diseases, as well as deprivation and other socio-economic factors, to factors indirectly caused by the diagnosed cancer, such as side effects of cancer treatment [29, 30]. We cannot separate these two components without detailed treatment data from clinical databases. However, our intention was to capture cure of the cancer itself, implying no further risk of progression and death due to the diagnosed cancer, following the definition proposed by Haupt in 2007: “cure from the original cancer regardless of any potential for, or presence of, remaining disabilities or side effects of treatment” [31]. Following this definition, we distinguish between patients who will die from progression or relapse of the diagnosed cancer and those who will die from other causes, whether related to the cancer (e.g. adverse effects of treatments) or not (e.g. other diseases) [32].

This increased risk of death is greater in patients aged 65 or more. By applying our model to colon cancer data, we showed that not acknowledging an increased risk of death led to substantial bias in relative survival estimations and cure fractions. Even a moderately increased risk of 1.3 can make a greater difference than, for example, the one induced by different choices of the relative survival estimator, an issue that has raised considerable debate within the biostatistics community [33,34,35].

Concerning applications, the model provides a new indicator in cancer patients: the relative risk of death due to other causes. Estimating and accounting for variations in this indicator over time, in different populations, or according to stage or treatment could potentially improve the validity of survival comparisons. Moreover, we believe that providing correct estimates of cure fractions and of survival in uncured patients is important in that it could be used to improve the planning of health services for cancer survivors, particularly when a large increase in long-term cancer prevalence occurs or is expected. For instance, the estimated relative risk of non-cancer death of 1.3 in patients cured of colon cancer indicates that long-term clinical follow-up should focus more on preventing or treating the long-term effects of surgery and chemotherapies and on addressing risk factors for cancer that are shared with other chronic diseases. To be more effective in preventing or reducing the increased risk, the same services that provide the cancer clinical follow-up have to take care of the conditions responsible for the increased risk estimated by this study, particularly in the elderly.

Correct estimates of time to cure, derived from the cure model parameters, also have practical implications in the legislation addressing the “Right to be Forgotten” of cancer patients (https://ecpc.org/policy/the-right-to-be-forgotten-a-new-research-project). Such legislation has been introduced in France, Belgium, Luxembourg and the Netherlands, with efforts being made across Europe to establish similar approaches. Finally, a more accurate balance between the risk of death due to cancer and that due to other, often more effectively controlled, causes can improve patients’ awareness and quality of life.

Conclusion

The present analysis supports the use of the corrected mixture cure model including the increased risk of non-cancer death, to provide better estimates of indicators based on cancer survival, which are important to public health decision-making and should improve patients’ awareness of their health status and facilitate their return to normal life.

Availability of data and materials

Data availability statement

The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.

The French colon cancer data that support the findings of this study are available from Francim but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with the permission of Francim.

Code availability

Details of the Stata code and the R package curesurv used to fit the proposed corrected mixture cure model are available in the GitHub and Zenodo repositories. The repository is called curesurvTools (https://github.com/LauraBotta/curesurvTools and https://zenodo.org/record/7567886).

Please cite this material as in reference [36].

The code including the unweighted least squares estimation of Model(1) parameters is available from the first author. The results for the latter method are close, but a jackknife technique was required to obtain good coverage.

Abbreviations

SEER: Surveillance, Epidemiology, and End Results Program

RS: Relative survival

FRANCIM: French network of cancer registries

INSEE: French National Statistics Institute

AB: Absolute bias

SD: Standard deviation

CVR: Coverage

PAC: Probabilities of administrative censoring

PLF: Probabilities of loss to follow-up

References

  1. Aziz NM. Cancer survivorship research: state of knowledge, challenges and opportunities. Acta Oncol. 2007;46:417–32.

  2. Rugbjerg K, Mellenkiaern L, Boice JD, et al. Cardiovascular diseases in survivors of adolescent and young adult cancer: a Danish cohort study, 1943–2009. J Natl Cancer Inst. 2014;106:dju110.

  3. Armenian SH, Xu L, Ky B, et al. Cardiovascular disease among survivors of adult-onset cancer: community based retrospective cohort study. J Clin Oncol. 2016;34:1122–30.

  4. Hinchliffe SR, Rutherford MJ, Crowther MJ, Nelson CP, Lambert PC. Should relative survival be used with lung cancer data? Br J Cancer. 2012;106:1854–9.

  5. Capocaccia R, Gatta G, Dal Maso L. Life expectancy of colon, breast, and testicular cancer patients: an analysis of US-SEER population-based data. Ann Oncol. 2015;26:1263–8.

  6. Pohar Perme M, Estève J, Rachet B. Analysing population-based cancer survival – settling the controversies. BMC Cancer. 2016;16:933.

  7. Ederer F, Axtell LM, Cutler SJ. The relative survival rate: a statistical methodology. Natl Cancer Inst Monogr. 1961;6:101–21.

  8. Hakulinen T. On long-term relative survival rates. J Chronic Dis. 1977;30:431–43.

  9. Esteve J, Benhamou E, Croasdal N, Raymond L. Relative survival and the estimation of net survival : elements for further discussion. Stat Med. 1990;9:529–38.

  10. Verdecchia A, De Angelis R, Capocaccia R, Sant M, Micheli A, Gatta G, et al. The cure for colon cancer: results from the Eurocare study. Int J Cancer. 1998;77:322–9.

  11. Yu XQ, De Angelis R, Andersson TML, Lambert PC, O'Connell DL, Dickman PW. Estimating the proportion cured of cancer: some practical advice for users. Cancer Epidemiol. 2013;37(6):836–42.

  12. Dumas A, Allodji R, Fresneau B, et al. The right to be forgotten: a change in access to insurance and loans after childhood cancer? J Cancer Surviv. 2017;11(4):431–7.

  13. Goungounga JA, Touraine C, Grafféo N, Giorgi R. CENSUR working survival group. Correcting for misclassification and selection effects in estimating net survival in clinical trials. BMC Med Res Methodol. 2019;19(1):104.

  14. Touraine C, Graffeo N, Giorgi R. More accurate cancer-related excess mortality through correcting background mortality for extra variables. Stat Methods Med Res. 2020;29(1):122–36.

  15. Rubio FJ, Rachet B, Giorgi R, Maringe C, Belot A. On models for the estimation of the excess mortality hazard in case of insufficiently stratified life tables. Biostatistics. 2021;22(1):51–67.

  16. Mba RD, Goungounga JA, Graffeo N, Giorgi R. CENSUR working survival group. Correcting inaccurate background mortality in excess hazard models through breakpoints. BMC Med Res Methodol. 2020;20(1):268.

  17. Phillips N, Coldman A, McBride M. Estimating cancer prevalence using mixture models for cancer survival. Stat Med. 2001;21:1257–70.

  18. Botta L, Gatta G, Trama A, Capocaccia R. Excess risk of dying of other causes of cured cancer patients. Tumori J. 2019;105(3):199–204.

  19. Lambert PC, Thompson JR, Weston CL, et al. Estimating and modelling the cure fraction in population-based cancer survival analysis. Biostatistics. 2007;8:576–94.

  20. Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Stat Med. 2004;23:51–64.

  21. Romain G, Boussari O, Bossard N, Remontet L, Bouvier AM, Mounier M, et al. French network of Cancer registries (FRANCIM). Time-to-cure and cure proportion in solid cancers in France. A population based study. Cancer Epidemiol. 2019;60:93–101.

  22. Institut National de la Statistique et des Études Économiques. https://www.ined.fr/fr/tout-savoir-population/chiffres/france/mortalite-cause-deces/table-mortalite/. Accessed 6 Mar 2023.

  23. Dal Maso L, Panato C, Tavilla A, et al. Cancer cure for 32 cancer types: results from the EUROCARE-5 study. Int J Epidemiol. 2020;49(5):1517–25.

  24. Mudholkar GS, Srivastava DK, Kollia GD. A generalization of the Weibull distribution with application to the analysis of survival data. J Am Stat Assoc. 1996;91:1575–83.

  25. Cvancarova M, Aagnes B, Fosså SD, Lambert PC, Møller B, Bray F. Proportion cured models applied to 23 cancer sites in Norway. Int J Cancer. 2013;132(7):1700–10. https://doi.org/10.1002/ijc.27802 Epub 2012 Dec 14. PMID: 22927104.

  26. Boag JW. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J Roy Stat Soc. 1949;11:15–44.

  27. Mariotto AB, Zou Z, Zhang F, Howlader N, Kurian AW, Etzioni R. Can we use survival data from cancer registries to learn about disease recurrence? The case of breast cancer. Cancer Epidemiol Biomarkers Prev. 2018;27(11):1332–41. https://doi.org/10.1158/1055-9965.EPI-17-1129 Epub 2018 Oct 18. PMID: 30337342; PMCID: PMC8343992.

  28. Andersson TM, Dickman PW, Eloranta S, Lambert PC. Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models. BMC Med Res Methodol. 2011;11:96. https://doi.org/10.1186/1471-2288-11-96 PMID: 21696598; PMCID: PMC3145604.

  29. Bright CJ, Brentnall AR, Wooldrage K, et al. Errors in determination of net survival: cause-specific and relative survival settings. Br J Cancer. 2020;122:1094–101.

  30. Botta L, Gatta G, Capocaccia R, Stiller C, Cañete A, Dal Maso L, et al. Long-term survival and cure fraction estimates for childhood cancer in Europe (EUROCARE-6): results from a population-based study. Lancet Oncol. 2022;23(12):1525–36. https://doi.org/10.1016/S1470-2045(22)00637-4 Epub 2022 Nov 16. PMID: 36400102.

  31. Haupt R, Spinetta JJ, Ban I, et al. Long term survivors of childhood cancer: cure and care. The Erice statement. Eur J Cancer. 2007;43(12):1778–80.

  32. Ellis L, Coleman MP, Rachet B. The impact of life tables adjusted for smoking on the socio-economic difference in net survival for laryngeal and lung cancer. Br J Cancer. 2014;111(1):195–202. https://doi.org/10.1038/bjc.2014.217 Epub 2014 May 22. PMID: 24853177; PMCID: PMC4090723.

  33. Roche L, Danieli C, Belot A, et al. Cancer net survival on registry data: use of the new unbiased Pohar-Perme estimator and magnitude of the bias with the classical methods. Int J Cancer. 2013;132:2359–69.

  34. Dickman PW, Lambert PC, Coviello E, Rutherford MJ. Estimating net survival in population-based cancer studies. Int J Cancer. 2013;133:519–22.

  35. Seppa K, Hakulinen T, Pokhrel A. Choosing the net survival method for cancer survival estimation. Eur J Cancer. 2015;51(9):1123–9. https://doi.org/10.1016/j.ejca.2013.09.019. Epub 2013 Oct 29.

  36. Botta L, Goungounga J, Capocaccia R, Romain G, Boussari O, Jooste V. LauraBotta/curesurvTools:v1.0(v1.0): Zenodo; 2023. https://doi.org/10.5281/zenodo.7567886.

Acknowledgments

We thank Luigino Dal Maso and the reviewers for providing helpful comments to the layout of the manuscript and Philip Bastable for the revision of the English.

Funding

The work was partially supported by the French Institut National du Cancer (INCa grant number 2018–178) and the European Union (Fonds Européen de Développement Regional: FEDER grant number BG0028239).

Author information

Contributions

LB, JG, VJ, OB and RC designed the study, contributed to methods and materials, analysis, interpretation of the findings. LB and RC drafted the article. JG, OB and VJ revised the manuscript. GG and MC contributed to the interpretation of the findings and revised the manuscript. GR, LB, RC, JG and OB carried out the programming for the simulation study. LB and RC carried out the programming for the ML estimation in Stata. JG programmed curesurv in R, used in the application for real-life data. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Laura Botta.

Ethics declarations

Ethics approval and consent to participate

The observational non-interventional survival study on raw FRANCIM data was approved by the Consultative Committee for the Processing of Health Research Data (CCTIRS) and by the French Data Protection Authority (CNIL, authorization n° 04-1036). In agreement with French legislation, there was no requirement for written informed consent. Administrative permission to access data was granted by the FRANCIM committee in charge of data sharing, and all registries approved the use of their data. All methods were carried out in accordance with relevant guidelines and regulations. FRANCIM data used in this study were anonymized before use.

Consent for publication

Not Applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Supplementary file Fig. 1. Probability density according to different θ parameters representing different distributions of the uncured function. Two figures representing the Lung and Breast scenarios.

Supplementary file Table 1. Performance indicators using grouped data of Model(1) in situations with a very small cure fraction or no cure at all and a persisting long-term cancer mortality (g=1). Model(1) was used with a logistic age effect on cured (as proposed throughout the manuscript), with a linear age effect on cured, and without cure. 1,000 estimates, each composed of 10,000 cases and 15 years of follow-up.

Supplementary file Table 2. Robustness analysis using maximum likelihood on individual data. a) The times to cancer death of uncured patients do not follow a Weibull distribution; b) The extra non-cancer death risk is dependent on age at diagnosis; c) The extra non-cancer death risk varies randomly among the patients.

Supplementary file Table 3. Performance indicators for a and p using individual data according to sample size and length of follow-up. 1,000 estimates for each sample size and follow-up length (in years).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Botta, L., Goungounga, J., Capocaccia, R. et al. A new cure model that corrects for increased risk of non-cancer death: analysis of reliability and robustness, and application to real-life data. BMC Med Res Methodol 23, 70 (2023). https://doi.org/10.1186/s12874-023-01876-x

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12874-023-01876-x

Keywords