 Research
 Open Access
A new cure model that corrects for increased risk of noncancer death: analysis of reliability and robustness, and application to real-life data
BMC Medical Research Methodology volume 23, Article number: 70 (2023)
Abstract
Background
Noncancer mortality in cancer patients may be higher than overall mortality in the general population due to a combination of factors, such as long-term adverse effects of treatments, and genetic, environmental or lifestyle-related factors. If so, conventional indicators may underestimate net survival and cure fraction. Our aim was to propose and evaluate a mixture cure survival model that takes into account the increased risk of noncancer death for cancer patients.
Methods
We assessed the performance of a corrected mixture cure survival model derived from a conventional mixture cure model to estimate the cure fraction, the survival of uncured patients, and the increased risk of noncancer death in two settings of net survival estimation: grouped life-table data and individual patient data. We measured the model’s performance in terms of bias, standard deviation of the estimates and coverage rate, using an extensive simulation study. This study included reliability assessments through violation of some of the model’s assumptions. We also applied the models to colon cancer data from the FRANCIM network.
Results
When the assumptions were satisfied, the corrected cure model provided unbiased estimates of parameters expressing the increased risk of noncancer death, the cure fraction, and net survival in uncured patients. No major difference was found when the model was applied to individual or grouped data. The absolute bias was < 1% for all parameters, while coverage ranged from 89 to 97%. When some of the assumptions were violated, parameter estimates appeared more robust when obtained from grouped than from individual data. As expected, the uncorrected cure model performed poorly and underestimated net survival and cure fractions in the simulation study. When applied to real-life colon cancer data, cure fractions estimated using the proposed model were higher than those in the conventional model, e.g. 5 percentage points higher in males at age 60 (57% vs. 52%).
Conclusions
The present analysis supports the use of the corrected mixture cure model, which includes the increased risk of noncancer death for cancer patients, to provide better estimates of indicators based on cancer survival. These are important to public health decision-making; they improve patients’ awareness and facilitate their return to normal life.
Background
There is growing awareness that cancer patients, as compared to age- and sex-matched individuals of the general population, may be at an increased risk of death from causes other than the diagnosed cancer, mainly from cardiovascular and respiratory diseases or other independent cancers [1,2,3,4]. This increased risk can be partly a consequence of cancer, such as long-term adverse effects of the treatments, and partly due to determinants of cancer, such as a genetic predisposition or environmental and lifestyle-related factors. A higher long-term risk of death from other causes in cancer patients than in the general population has been estimated, from Surveillance, Epidemiology, and End Results Program (SEER) cancer registries data, for colorectal, breast and testicular cancer [5]. This increased risk potentially jeopardizes estimations of net survival after cancer for epidemiological and public health purposes.
Net survival is the hypothetical survival that would be measured if the disease under study were the only possible cause of death. It should be used to compare cancer survival in groups with different population mortality [6]. To estimate net survival, two main settings have been defined: the cause-specific setting and the relative survival setting, the latter not requiring cause-of-death information. In the relative survival setting, net survival can be estimated through the excess mortality approach, by removing from observed mortality the mortality from other causes, which corresponds to the deaths that would occur in the cohort in the absence of cancer. Net survival can also be estimated using the ratio approach, by dividing observed survival by the survival that would be observed in the absence of cancer.
Usually, in population-based studies, the mortality (or survival) expected in the absence of cancer is derived from overall mortality (or overall survival) in the general population with the same characteristics. This approximation relies on the commonly accepted assumption that the probability of death by other causes in population-based cohorts of cancer patients is similar to the probability of death by all causes in the general population [6,7,8,9]. This assumption might not hold if cancer patients present an increased risk of dying from other causes, whether related (e.g. adverse effects of treatments) or unrelated (e.g. independent second cancer) to the studied cancer, compared to the general population.
Cure models are used in cancer epidemiology to estimate relative survival under the assumption that a percentage of subjects will not die from cancer (“cured patients”) [10, 11]. This percentage corresponds to the asymptotic plateau reached by the relative survival curve. In this context, patients’ increased risk of dying from other causes also affects mortality rates in those cured, thus challenging the assumption that their mortality rates should be the same as those in the general population. Therefore, popular cancer burden indicators such as relative survival, cure fraction, time-to-cure and survival of uncured patients can be severely biased if cancer patients exhibit a substantial increased risk of noncancer death that is not taken into account. Such risk may also affect survival comparisons if it differs among compared populations. Acknowledging this increased risk would give patients, parents, clinicians and all health care stakeholders better estimates of indicators expressing mortality from cancer progression or relapse. For example, it would lead to better targeting of health care programs and enable long-term cancer survivors to obtain credit and insurance more easily. Because the health status of cancer survivors is probably known to the patients themselves, to their physicians and to their insurers, the presence of a comorbid condition that would increase the risk of death is already known and would in any case influence access to loans etc. [12]. The common methods to estimate excess mortality include part of the risk due to comorbidity as well as the excess risk attributed to cancer; they consequently overestimate an individual’s cancer mortality risk.
In the net survival setting, methods have been developed to account for differences between the risk of noncancer death in cancer patients and the risk of death in the general population, or for insufficiently stratified life tables [13,14,15,16]. In the cure modelling setting, a model based on a generalization of mixture cure models [17] has been developed and applied to real-life data (colorectal, breast and lung cancer patients from United States cancer registries) [18]. The reliability of the model and the robustness of its estimates had to be studied in detail before undertaking any extensive applications on real-life data. Once tested, such models could provide practical indications for public health and for the modification of clinical follow-up for long-term survivors and cured cancer patients. This validation task cannot be done using real-life data because cure is an unobserved condition that is treated as a latent variable in cure models. No gold standard is therefore available from real-life data for comparison with model-based cure estimates. In contrast, simulated data, with the generation of large numbers of virtual cohorts with known proportions of cure and precisely defined survival functions, can be useful to test the model’s performance under controlled conditions.
This simulation-based study explored the reliability of a new “corrected” cure model, i.e. one including a correction factor to take into account the increased risk of noncancer death. First, the performance of the model and its statistical properties were explored when all its assumptions were valid. We then analysed the model’s robustness by investigating its performance when some of the underlying assumptions were violated. We also applied the corrected model to real population-based data for colon cancer from the French cancer registries.
Methods
Cure models
The proposed corrected mixture cure model can be seen as an extension of the conventional mixture cure model with different assumptions. The latter is used as a reference to assess the performance of the former.
Conventional mixture cure models
Mixture cure models [19] assume that the cohort of cancer patients is divided into two subgroups: those cured, who will never die from the diagnosed cancer, and the uncured, who will eventually die from the progression or relapse of the disease.
Relative survival (RS) can be estimated for a group of patients (assumed for now to be homogeneous with respect to age and other possible covariates) by the ratio approach:
\(RS(t)=\frac{S^{o}(t)}{S^{e}(t)}\)
where S^{o}(t) is the survival observed in the patients’ group and S^{e}(t) the survival expected for the same subjects in the absence of cancer and t the time since diagnosis.
Relative survival can also be estimated using the excess hazard approach, assuming that the observed mortality hazard h_{O}(t) can be split into two forces of mortality, one attributable to cancer, h_{c}(t), and one to other causes, h_{e}(t). This can be written analytically:
\(h_{O}(t)=h_{c}(t)+h_{e}(t)\)
The relative survival context assumes the expected mortality hazards of patients to be equal to those observed in a general population group comparable for geographic area, calendar year, age and sex, and sometimes for other known characteristics. This implies at the individual level that:
\(h_{e}(t)=h^{\ast}(t)=h^{\ast}(age+t, sex, year+t)\)
where age and year are the patient’s age at diagnosis and year of diagnosis.
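The cumulative observed hazard can be written as:
\(H_{O}(t)={\int}_0^t{h}_c(v) dv+{\int}_0^t{h}^{\ast }(v) dv\)
And observed survival can be written as:
\(S_{O}(t)=\exp \left[-{\int}_0^t{h}_c(v) dv\right]\times \exp \left[-{\int}_0^t{h}^{\ast }(v) dv\right]\)
where \(\exp \left[-{\int}_0^t{h}_c(v) dv\right]\) corresponds to the relative survival function and \(\exp \left[-{\int}_0^t{h}^{\ast }(v) dv\right]\) corresponds to the survival function for the general population.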
The conventional mixture survival model expresses relative survival as a mixture of two net survival functions attributed to uncured (S_{u}(X, t)) and cured patients (S_{cured}(X, t)), weighted by the cure fraction π(Z), and can be expressed as:
\(RS(X,Z,t)=\pi (Z)\,{S}_{cured}(X,t)+\left[1-\pi (Z)\right]{S}_u(X,t)\)
where S_{cured}(X, t) = 1 and S_{u}(X, t) can be specified by any parametric survival function. There is a wide range of distribution functions to choose from, and in the mixture model we specified S_{u}(X, t) as a Weibull function. The Weibull distribution is flexible enough to allow a monotonically increasing or decreasing mortality rate for the uncured group. The parametrization is S_{u}(X, t) = exp(−λt^{γ})^{exp(δX)}, where λ > 0 and γ > 0 are respectively the scale and shape parameters, considered as constant, and δ is the proportional effect of covariates X on the baseline survival of uncured patients. To ensure that π remains between 0 and 1, we also specified π(Z) with a logistic link function allowing a linear effect β of covariates Z on the cure fraction. Its analytical expression can be written as:
\(\pi (Z)=\frac{1}{1+\exp \left(- Z\beta \right)}\)
Other link functions can be used instead of the logistic. For example, we used the identity link (π = Zβ) [19] for an ancillary analysis addressed in the discussion and presented in the Supplementary material Table 1.
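The final expression of relative survival and excess hazard h_{c} in a conventional mixture cure model can be expressed as:
\(RS(X,Z,t)=\pi (Z)+\left[1-\pi (Z)\right]{S}_u(X,t)\)
and
\({h}_c(X,Z,t)=\frac{\left[1-\pi (Z)\right]{f}_u(X,t)}{\pi (Z)+\left[1-\pi (Z)\right]{S}_u(X,t)}\)
where X is the vector of covariates acting on the survival of the uncured and f_{u}(X, t) = −dS_{u}(X, t)/dt is the corresponding density.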
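The conventional mixture cure model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the intercept `beta0` is written separately from the slope `beta`, and the numerical values (π(60) = 0.70, β = −0.15, λ = 0.4, γ = 0.8, δ = 0.1) are chosen for illustration rather than taken from the paper's tables.

```python
import math

def cure_fraction(z, beta0, beta):
    """Logistic link: pi(z) = 1 / (1 + exp(-(beta0 + beta * z)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta * z)))

def surv_uncured(t, x, lam, gam, delta):
    """Weibull net survival of the uncured: exp(-lam * t**gam) ** exp(delta * x)."""
    return math.exp(-lam * t**gam * math.exp(delta * x))

def dens_uncured(t, x, lam, gam, delta):
    """Density of the uncured, f_u = -dS_u/dt (requires t > 0)."""
    rate = lam * math.exp(delta * x)
    return rate * gam * t ** (gam - 1.0) * math.exp(-rate * t**gam)

def relative_survival(t, x, pi, lam, gam, delta):
    """RS = pi + (1 - pi) * S_u, since cured patients have net survival 1."""
    return pi + (1.0 - pi) * surv_uncured(t, x, lam, gam, delta)

def excess_hazard(t, x, pi, lam, gam, delta):
    """h_c = (1 - pi) * f_u / [pi + (1 - pi) * S_u]."""
    num = (1.0 - pi) * dens_uncured(t, x, lam, gam, delta)
    return num / relative_survival(t, x, pi, lam, gam, delta)

# Illustrative (assumed) values; x = 0 corresponds to the reference age.
beta0 = math.log(0.70 / 0.30)
pi_60 = cure_fraction(0.0, beta0, -0.15)
rs_5 = relative_survival(5.0, 0.0, pi_60, 0.4, 0.8, 0.1)
rs_50 = relative_survival(50.0, 0.0, pi_60, 0.4, 0.8, 0.1)
print(f"pi(60) = {pi_60:.2f}, RS(5) = {rs_5:.3f}, RS(50) = {rs_50:.3f}")
```

As follow-up time grows, the relative survival curve flattens towards the cure fraction and the excess hazard tends to zero, which is the plateau behaviour the model relies on.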
Corrected mixture cure model (Model(1))
Relaxing the comparability assumption usually made in the excess hazard approach, we set patients’ expected hazard equal to that in the general population multiplied by a constant parameter α: \(h_{e}(t)=\alpha h^{\ast}(t)=\alpha h^{\ast}(age+t, sex, year+t)\), with α > 0.
The cumulative observed hazard can be written as:
\(H_{O}(t)={\int}_0^t{h}_c(v) dv+\alpha {\int}_0^t{h}^{\ast }(v) dv\)
The proposed excess hazard function is the same as that of the conventional model, but the estimated parameters are different due to the correction of the expected hazard, as in Philips et al. [17].
The observed survival can be written as:
\(S_{O}(t)=\exp \left[-{\int}_0^t{h}_c(v) dv\right]\times \exp {\left[-{\int}_0^t{h}^{\ast }(v) dv\right]}^{\alpha }\)
where \(\exp {\left[-{\int}_0^t{h}^{\ast }(v) dv\right]}^{\alpha }={S}^{\ast }{(t)}^{\alpha}\) corresponds to survival in the general population corrected by the scale parameter.
The value of parameter α, which is defined on ℝ^{+}, can be interpreted as a hazard ratio. α > 1 indicates that mortality due to other causes in the cohort under study is higher than that in the general population, α = 1 a null effect, as implicit in the conventional cure models and α < 1 a lower mortality. We assume α to express a fixed effect and to be independent of age.
The final expression of observed survival in the corrected mixture cure model can be expressed as:
\(S_{O}(X,Z,t)={S}^{\ast }{(t)}^{\alpha}\left\{\pi (Z)+\left[1-\pi (Z)\right]{S}_u(X,t)\right\}\)
The corrected model expressed here (from here on called Model(1)), with the constraint α ≡ 1 gives the conventional cure model [19] with the same parameterization of age effects and uncured net survival function.
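To see why ignoring α biases relative survival, the following minimal sketch combines a population survival value with a relative survival value under the corrected model and then applies the usual ratio approach to the result. The function names and the numbers (S* = 0.90, RS = 0.60, α = 1.5) are hypothetical and serve only as an illustration.

```python
def observed_survival(pop_surv, rel_surv, alpha=1.0):
    """Corrected model: S_O = S*(t)**alpha * RS(t); alpha = 1 is the conventional case."""
    return pop_surv**alpha * rel_surv

def apparent_relative_survival(obs_surv, pop_surv):
    """Ratio approach with uncorrected population survival: S_O / S*."""
    return obs_surv / pop_surv

# Assumed illustrative values at some fixed follow-up time.
s_star, rs_true, alpha = 0.90, 0.60, 1.5
s_obs = observed_survival(s_star, rs_true, alpha)
rs_apparent = apparent_relative_survival(s_obs, s_star)
print(f"true RS = {rs_true:.3f}, apparent RS = {rs_apparent:.3f}")
```

With α > 1 the apparent relative survival equals RS × S*^(α − 1), which is below the true value, in line with the underestimation described for the uncorrected approach.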
Model estimation
Model parameters were estimated using the maximum likelihood method from both individual and grouped data.
As in De Angelis et al. (1999), the total log-likelihood using the individual data approach in the conventional model can be expressed as:
\(\ell \left(\beta, \gamma, \lambda, \delta \right)=\sum_{i=1}^N\left\{{d}_i\log \left[{h}_i^{\ast }+\frac{\left[1-\pi \left({Z}_i\right)\right]{f}_U\left({X}_i,{t}_i;\delta \right)}{\pi \left({Z}_i\right)+\left[1-\pi \left({Z}_i\right)\right]{S}_U\left({X}_i,{t}_i;\delta \right)}\right]+\log \left[\pi \left({Z}_i\right)+\left[1-\pi \left({Z}_i\right)\right]{S}_U\left({X}_i,{t}_i;\delta \right)\right]+\log {S}^{\ast}\left({t}_i\right)\right\}\)
where β, γ, λ and δ are the parameters to be estimated using the maximum likelihood method; t_{i}, d_{i} and h_{i}* are, respectively, the time at death or censoring, the censoring index, and the population death hazard for the i-th observation among N individuals. X and Z are the covariates associated with survival of uncured patients and cure fraction, respectively. f_{U}(X_{i}, t_{i}; δ) and S_{U}(X_{i}, t_{i}; δ) are respectively the density and survival of uncured patients at time t_{i}, depending on the effect δ of covariates acting on the baseline density or survival, and S*(t_{i}) is the general population survival at time t_{i}, which is a constant term and can be removed from the likelihood in the conventional model.
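Using the same idea as in the conventional model, the total log-likelihood using the individual data in the proposed corrected model can be expressed as:
\(\ell \left(\alpha, \beta, \gamma, \lambda, \delta \right)=\sum_{i=1}^N\left\{{d}_i\log \left[\alpha {h}_i^{\ast }+\frac{\left[1-\pi \left({Z}_i\right)\right]{f}_U\left({X}_i,{t}_i;\delta \right)}{\pi \left({Z}_i\right)+\left[1-\pi \left({Z}_i\right)\right]{S}_U\left({X}_i,{t}_i;\delta \right)}\right]+\log \left[\pi \left({Z}_i\right)+\left[1-\pi \left({Z}_i\right)\right]{S}_U\left({X}_i,{t}_i;\delta \right)\right]+\alpha \log {S}^{\ast}\left({t}_i\right)\right\}\)
Note that in the estimation step S* cannot be removed, as α is a parameter to be estimated, whereas it can be in the conventional model.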
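A minimal Python sketch of the individual-data log-likelihood of the corrected model follows. It assumes the standard survival-data form, in which each observation contributes d·log(hazard) + log(survival), with hazard α·h* + h_c and survival S*^α·RS. The record layout `(t, d, x, h_star, s_star)`, the function names and the numerical values are hypothetical; a single covariate x acts on both the cure fraction and the uncured survival, as in the simulations.

```python
import math

def mixture_terms(t, x, beta0, beta, lam, gam, delta):
    """Return (RS, h_c) for a Weibull mixture cure model with logistic cure fraction."""
    pi = 1.0 / (1.0 + math.exp(-(beta0 + beta * x)))
    rate = lam * math.exp(delta * x)
    s_u = math.exp(-rate * t**gam)               # survival of the uncured
    f_u = rate * gam * t ** (gam - 1.0) * s_u    # density of the uncured (t > 0)
    rs = pi + (1.0 - pi) * s_u
    h_c = (1.0 - pi) * f_u / rs
    return rs, h_c

def corrected_loglik(records, alpha, beta0, beta, lam, gam, delta):
    """Sum over individuals of d*log(alpha*h* + h_c) + log(RS) + alpha*log(S*)."""
    total = 0.0
    for t, d, x, h_star, s_star in records:
        rs, h_c = mixture_terms(t, x, beta0, beta, lam, gam, delta)
        total += d * math.log(alpha * h_star + h_c)
        total += math.log(rs) + alpha * math.log(s_star)
    return total

# Two made-up records: (time, death indicator, standardized age, h*, S*).
records = [(5.0, 1, 0.0, 0.02, 0.90), (15.0, 0, 0.5, 0.03, 0.70)]
params = dict(beta0=math.log(0.7 / 0.3), beta=-0.15, lam=0.4, gam=0.8, delta=0.1)
print(corrected_loglik(records, alpha=1.2, **params))
```

In practice this function would be passed to a numerical optimizer; the α·log(S*) term depends on a free parameter and therefore has to stay in the objective.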
The estimation from grouped data was carried out by building relative survival tables, stratified by relevant predictor variables, from each generated sample and from survival data for the general population. In this study we used the commonly used Ederer II relative survival, but other estimators could be considered, particularly in between-population comparative analyses. The binomial log-likelihood for the j-th interval of the life table of the k-th stratum was derived similarly to the formula provided by Dickman [20]:
\({\ell}_{jk}={d}_{jk}\log \left(1-{S}_{Ojk}\right)+\left({l}_{jk}-{d}_{jk}\right)\log \left({S}_{Ojk}\right)\)
with l_{jk} = (n_{jk} − 0.5w_{jk}), the effective number of patients in each j and k combination,
where n_{jk}, d_{jk}, and w_{jk} were respectively the number of those alive at the start, dead and censored during the interval for the k-th stratum, and S_{Ojk} is the j-th interval-specific observed survival for the k-th stratum.
The log-likelihood using the grouped data approach in the conventional model can be expressed as:
\({\ell}_{jk}={d}_{jk}\log \left[1-{RS}_{jk}\,{S}_{jk}^{\ast}\right]+\left({l}_{jk}-{d}_{jk}\right)\log \left[{RS}_{jk}\,{S}_{jk}^{\ast}\right]\)
where \({S}_{jk}^{\ast }\) and RS_{jk} are the expected survival and patients’ relative survival estimates in each j and k combination.
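Using the same idea as in the conventional model, the log-likelihood using the grouped data in the proposed model can be expressed as:
\({\ell}_{jk}={d}_{jk}\log \left[1-{RS}_{jk}{\left({S}_{jk}^{\ast}\right)}^{\alpha}\right]+\left({l}_{jk}-{d}_{jk}\right)\log \left[{RS}_{jk}{\left({S}_{jk}^{\ast}\right)}^{\alpha}\right]\)
Note that, unlike in standard cure models, survival in the population has to be taken into consideration in the maximization of the log-likelihood, due to the presence of α.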
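The grouped-data likelihood can be sketched in the same spirit. The snippet below assumes the binomial form with effective number at risk l = n − 0.5w and interval-specific observed survival RS × (S*)^α; the tuple layout and the life-table numbers are hypothetical.

```python
import math

def grouped_loglik(intervals, alpha=1.0):
    """Binomial log-likelihood summed over life-table cells.

    Each cell is (n, d, w, rs, s_star): alive at start, deaths, censored,
    model-based interval relative survival, expected interval survival.
    """
    total = 0.0
    for n, d, w, rs, s_star in intervals:
        l_eff = n - 0.5 * w                      # effective number at risk
        s_obs = rs * s_star**alpha               # interval-specific observed survival
        total += d * math.log(1.0 - s_obs) + (l_eff - d) * math.log(s_obs)
    return total

# Two made-up life-table cells for a single stratum.
cells = [(100, 5, 10, 0.97, 0.98), (85, 4, 8, 0.98, 0.975)]
print(grouped_loglik(cells, alpha=1.0), grouped_loglik(cells, alpha=1.5))
```

Setting `alpha=1.0` recovers the conventional grouped-data likelihood, so the same routine can serve both models.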
In Stata, the strs command was used to obtain Ederer II estimates for grouped data, and the ml command with the lnf method, which relies on numerical calculation of derivatives, was used to maximize the likelihood of each sample. The results obtained for each sample were stored and then summarized.
Simulation
Virtual samples representing cohorts of patients were built by means of a pseudorandom number-generating algorithm. Each virtual case was independently represented by three variables: age at diagnosis (restricted in all analyses from 40 to 74 years), follow-up time (0–15 years), and censoring index (alive, dead). Age at diagnosis was randomly generated from a uniform distribution within age classes 40–57, 58–64, 65–69, and 70–74, with each class including 25% of all cases.
In the simulations, age at diagnosis was the only covariate associated with both cure fraction (vector Z) and survival of uncured patients (vector X), and is from now on denoted x. In model applications, a standardized age variable x = (age − 60)/15 was used.
Follow-up time and the censoring index were generated as follows.

The probabilities of administrative censoring (P_{AC}) and of loss to follow-up (P_{LF}) were assigned, and the corresponding times to censoring T_{C1} and T_{C2} were sampled from the uniform distribution U[0, 15] with probability P_{AC} and P_{LF}, or set at a maximum (15 years) with probability 1 − P_{AC} and 1 − P_{LF}, respectively. The final censoring time was T_{C} = min(T_{C1}, T_{C2}).

A Weibull distribution with parameters λ_{P} (scale) and γ_{P} (shape) was fitted to a set of observed survival data derived from population life tables. The patients’ expected survival time T_{ED} to causes of death other than cancer was randomly sampled by the inverse transformation method from the estimated Weibull distribution, raising the general population survival probabilities to the power α to simulate the relative risk of noncancer death of cancer patients.

The time to cancer death T_{CD} was randomly sampled from a mixture cure model assuming that the net survival of uncured patients followed a Weibull distribution with scale and shape parameters λ_{C} and γ_{C}, considered constant, and with δ representing the age effect on uncured survival. Sampling u* from the uniform distribution U[0, 1] we set:
\({u}^{\ast }=\pi (x)+\left[1-\pi (x)\right]{S}_u\left(x,{T}_{CD}\right)\)
where π(x) = 1/[1 + exp(−β_{0} − βx)] and β_{0} = ln[π(60)/(1 − π(60))]. The reference age was 60 years, and π(60) and β were fixed according to scenario. If u* > π(x) then we obtained:
\({T}_{CD}={\left\{-\frac{\ln \left[\left({u}^{\ast }-\pi (x)\right)/\left(1-\pi (x)\right)\right]}{\lambda_C\exp \left(\delta x\right)}\right\}}^{1/{\gamma}_C}\)
Otherwise, T_{CD} was set to infinity.

The time to death from diagnosis was then defined as T_{D} = min(T_{ED}, T_{CD}), and the final follow-up time as T = min(T_{C}, T_{D}).

Finally, the censoring index d was equal to 0 (alive) if T_{D} > T_{C}, or equal to 1 (dead) if T_{C} ≥ T_{D}.
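The data-generating steps above can be sketched as follows. Several points are assumptions made for illustration and are not stated in the text: the population Weibull is taken to act on attained age, S*(a) = exp[−(a/λ_P)^γ_P], and is conditioned on survival to the age at diagnosis; age is drawn as a continuous value within each class; δ = 0.1 and the other cancer parameters are illustrative; all function names are hypothetical.

```python
import math
import random

AGE_CLASSES = [(40, 57), (58, 64), (65, 69), (70, 74)]  # 25% of cases each
MAX_FU = 15.0

def censoring_time(rng, p_ac=0.50, p_lf=0.03):
    """T_C = min(T_C1, T_C2): administrative censoring and loss to follow-up."""
    t_c1 = rng.uniform(0.0, MAX_FU) if rng.random() < p_ac else MAX_FU
    t_c2 = rng.uniform(0.0, MAX_FU) if rng.random() < p_lf else MAX_FU
    return min(t_c1, t_c2)

def noncancer_death_time(rng, age, alpha, lam_p=88.0, gam_p=11.0):
    """Inverse transform of [S*(age + t) / S*(age)]**alpha (assumed form)."""
    u = 1.0 - rng.random()                       # uniform on (0, 1]
    base = (age / lam_p) ** gam_p - math.log(u) / alpha
    return max(0.0, lam_p * base ** (1.0 / gam_p) - age)

def cancer_death_time(rng, x, pi_60, beta, lam_c, gam_c, delta):
    """Inverse transform of the mixture cure model; infinite for cured cases."""
    beta0 = math.log(pi_60 / (1.0 - pi_60))
    pi_x = 1.0 / (1.0 + math.exp(-beta0 - beta * x))
    u = rng.random()
    if u <= pi_x:
        return math.inf
    s_u = (u - pi_x) / (1.0 - pi_x)              # value of S_u at T_CD
    return (-math.log(s_u) / (lam_c * math.exp(delta * x))) ** (1.0 / gam_c)

def simulate_case(rng, alpha, pi_60, beta, lam_c, gam_c, delta):
    lo, hi = rng.choice(AGE_CLASSES)
    age = rng.uniform(lo, hi)
    x = (age - 60.0) / 15.0                      # standardized age
    t_c = censoring_time(rng)
    t_d = min(noncancer_death_time(rng, age, alpha),
              cancer_death_time(rng, x, pi_60, beta, lam_c, gam_c, delta))
    d = 1 if t_c >= t_d else 0                   # 1 = dead, 0 = alive
    return age, min(t_c, t_d), d

rng = random.Random(2023)
cohort = [simulate_case(rng, alpha=1.2, pi_60=0.70, beta=-0.15,
                        lam_c=0.4, gam_c=0.8, delta=0.1) for _ in range(2000)]
print(sum(d for _, _, d in cohort), "deaths among", len(cohort), "cases")
```

Repeating the last step with different seeds gives the independent samples on which the models are then fitted.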
The whole set of true parameter values considered in the simulation analysis was chosen to mimic the survival pattern of common cancers and the demographic characteristics of real patients’ groups and general populations. For notational simplicity, the symbols π_{60}, λ and γ will be used in the following instead of π(60), λ_{C}, and γ_{C}. The distribution of ages at diagnosis was derived from that observed in incident cases (both sexes) collected by the French network of cancer registries (FRANCIM) [21] during the period 1995–2009 and gathered in the FRANCIM common database. This database also provided the probability of administrative censoring (50%) and of loss to follow-up (3%). The survival distribution parameters in the population, fixed as λ_{P} = 88 and γ_{P} = 11, were derived from the French general population life tables, also for both sexes combined, for the year 2002 [19]. The underlying α in the simulated samples was given the values 0.8, 1.0, 1.2, 1.5 and 2.0, the first indicating some (perhaps unrealistic) protective effect, the second no effect, and the others an increased risk of noncancer death in patients as compared with the general population. The underlying true values of parameters determining the proportion of cured patients (π_{60} and β) and survival of the uncured (λ, γ, and δ) were derived from preliminary model applications to real data [18, 21,22,23], and are shown in Table 1. They were grouped under two scenarios, mirroring the behaviour of lung and breast cancers. In the following, the two scenarios will be named after the corresponding cancer within quotes (“Breast” or “Lung”).
Finally, one thousand samples were generated for each scenario, as defined by a specific set of simulation parameters (α, π_{60}, λ, γ, β, δ). Depending on the specific objective, the number of cases generated for each sample varied from a minimum of 500 to a maximum of 20,000.
Performance indicators
To allow comparison with the results obtained by the conventional cure model, estimations were carried out both with the constraint α = 1 and with the unconstrained Model(1), which takes into account the increased risk of noncancer death.
From the considered models, we obtained estimates of six parameters: α, π_{60}, λ, γ, β, and δ. Intrinsically, the conventional cure model did not estimate α. We indicated as true values the values of parameters used to generate the samples, and considered them the gold standard to be compared with modelbased estimates. The following performance indicators were calculated for each parameter from the set of 1000 samples generated under a specific scenario.

Absolute Bias (AB) = Mean(estimates − true value)

Standard Deviation (SD) = standard deviation over the set of 1000 estimates

Coverage (CVR) = the proportion of the time that the estimated 95% confidence intervals contained the true value
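The three indicators can be computed as in the sketch below. The text does not say how the 95% confidence intervals were built, so the snippet assumes Wald intervals (estimate ± 1.96 × standard error); the function name and the toy numbers are hypothetical.

```python
import statistics

def performance(estimates, std_errors, true_value, z=1.96):
    """Return (AB, SD, CVR) for one parameter over a set of simulated samples."""
    ab = statistics.mean(e - true_value for e in estimates)
    sd = statistics.stdev(estimates)
    covered = sum(
        1 for e, se in zip(estimates, std_errors)
        if e - z * se <= true_value <= e + z * se   # assumed Wald interval
    )
    return ab, sd, covered / len(estimates)

# Toy example: four estimates of alpha with true value 1.2.
ab, sd, cvr = performance([1.18, 1.22, 1.25, 1.15], [0.02] * 4, 1.2)
print(f"AB = {ab:.3f}, SD = {sd:.3f}, CVR = {cvr:.2f}")
```

In the study itself each indicator is computed over the 1000 samples of a scenario rather than over four values.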
Robustness analysis
We investigated the estimates provided by Model(1) when some of the underlying assumptions were violated by the data. In particular, we analysed the model’s performance in three different situations: the times to cancer death of uncured patients do not follow a Weibull distribution; the increased risk of noncancer death is dependent on age at diagnosis; the increased risk of noncancer death varies randomly among patients. All of the robustness analyses were carried out by conducting 1000 independent runs with simulated samples of 10,000 cases each.
The times to cancer death of uncured patients do not follow a Weibull distribution
Model(1) assumes that the relative survival of uncured patients diagnosed at the reference age 60 (x = 0) follows the Weibull distribution with parameters λ and γ.
In this analysis, data were generated according to the corrected exponential Weibull distribution [24]:
\({S}_u\left(x,t\right)={\left\{1-{\left[1-\exp \left(-\lambda {t}^{\gamma}\right)\right]}^{\theta}\right\}}^{\exp \left(\delta x\right)}\) (2)
where the second shape parameter θ modulates the distance from the Weibull. When θ < 1, the hazard is U-shaped, i.e. first decreasing and then increasing. The opposite bell-shaped pattern is obtained when θ > 1. Note that at ages different from 60, x ≠ 0, the survival function (2) no longer follows an exponential Weibull distribution.
Model(1), with uncured survival function specified by (1), was then fitted to data generated from survival function (2). We tested the model under the two scenarios “Breast” and “Lung”, with θ varying between 0.25 and 4. The shapes of the probability density functions generated under these values are plotted in Supplementary material Figure 1 together with the shape of the corresponding Weibull distributions. The range was considered sufficiently wide to include most real-world situations. Note that values of θ far from 1 led to drastic changes in the time-to-death distribution. For instance, 5-year survival from the Weibull distribution (λ = 0.4; γ = 0.8; θ = 1) increases from 23 to 66% when θ = 4, and decreases to 8% when θ = 0.5. For this reason, actual estimates were compared with underlying values only for parameters α, π, β and δ.
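The effect of θ on 5-year survival can be checked numerically. The sketch below assumes the exponentiated Weibull form S(t) = 1 − [1 − exp(−λt^γ)]^θ at the reference age (x = 0); this form is an assumption used for illustration, and the function name is hypothetical.

```python
import math

def exp_weibull_survival(t, lam, gam, theta):
    """Assumed form at x = 0: S(t) = 1 - (1 - exp(-lam * t**gam)) ** theta."""
    return 1.0 - (1.0 - math.exp(-lam * t**gam)) ** theta

s_weibull = exp_weibull_survival(5.0, 0.4, 0.8, 1.0)   # plain Weibull (theta = 1)
s_theta4 = exp_weibull_survival(5.0, 0.4, 0.8, 4.0)
print(f"theta = 1: {100 * s_weibull:.0f}%, theta = 4: {100 * s_theta4:.0f}%")
# prints "theta = 1: 23%, theta = 4: 66%"
```

Under this assumed form, the values for θ = 1 and θ = 4 match the 23% and 66% quoted in the text.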
The increased risk of noncancer death is dependent on age at diagnosis
Model(1) assumes that the increased risk of noncancer death, expressed by parameter α, is independent of age at diagnosis. In order to assess the robustness of parameter estimations (especially α) with regard to variations of α according to age, we fitted Model(1) to data generated in breach of this assumption. We generated times to noncancer death from expected survival probabilities given by
\({S}^e(t)={S}^{\ast }{(t)}^{\alpha_x},\kern0.5em \mathrm{with}\kern0.5em {\alpha}_x=\alpha +{b}_{\alpha}\left[x-E(x)\right]\)
where S* was all-cause survival from the population life table, α_{x} was the increased risk of noncancer death as a function of age x at diagnosis; E(x) is the expected value of the sampling age distribution (actually E(x) = 62.25 in all our samples), α was set at 1.2 and 2.0 (according to the considered scenario) and the slope coefficient b_{α} was set to provide a reasonable range of values and varied between − 0.08 and 0.05 (i.e. 5% per year of age). Model(1) would of course estimate a single α parameter common to all ages. Given the linear relationship, we have for each sample E(α_{x}) = α.
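As a small illustration of the age trend, the sketch below assumes the linear form α_x = α + b_α[x − E(x)], with x the age at diagnosis in years, which is consistent with E(α_x) = α. The function name and the chosen slope (0.005 per year) are illustrative assumptions.

```python
def alpha_at_age(age, alpha_mean, slope, mean_age=62.25):
    """Assumed linear trend: alpha_x = alpha + b_alpha * (age - E(age))."""
    return alpha_mean + slope * (age - mean_age)

# Illustrative values: overall alpha = 1.2 and slope 0.005 per year of age.
low = alpha_at_age(40.0, 1.2, 0.005)
high = alpha_at_age(75.0, 1.2, 0.005)
print(f"alpha at 40: {low:.3f}, alpha at 75: {high:.3f}")
```

At the mean age the function returns the overall α, so the value a single-α model targets is unchanged while the extremes of the age range move apart.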
The increased risk of noncancer death varies randomly among patients
Model(1) assumes that the increased risk of noncancer death, expressed by parameter α, acts as a fixed effect. We investigated the behaviour of Model(1) when applied to data generated with an increased risk of noncancer death randomly assigned to the simulated cases. Log(α) was considered uniformly distributed around the overall value, and with increasing ratios of maximum to minimum values from 2 to 4.
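One way to generate such individual values is sketched below. It assumes that log(α) is uniform on an interval centred on the log of the overall value, with half-width log(r)/2 so that the maximum-to-minimum ratio equals r; the centring on the log scale and the function name are assumptions, not details given in the text.

```python
import math
import random

def sample_alpha(rng, alpha_overall, ratio):
    """Draw alpha with log(alpha) ~ U[log(a) - log(r)/2, log(a) + log(r)/2]."""
    half_width = 0.5 * math.log(ratio)
    centre = math.log(alpha_overall)
    return math.exp(rng.uniform(centre - half_width, centre + half_width))

rng = random.Random(1)
alphas = [sample_alpha(rng, 2.0, 4.0) for _ in range(10000)]
print(f"min = {min(alphas):.2f}, max = {max(alphas):.2f}")
```

With an overall value of 2.0 and a ratio of 4, individual values fall between 1.0 and 4.0.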
Results
Performances of models when all assumptions were valid
The performance indicators of Model(1) are reported in Table 1 for estimation methods for both grouped and individual data. Statistics on all six parameters considered are shown. Parameters β and δ represent changes in the probability of cure and of cancer survival in the uncured due to a 15-year difference in age at diagnosis. Estimates of α were always very close to the true underlying values, with an absolute bias (AB) ranging from − 0.006 to 0.006 and a relative bias always lower than 0.6%. The estimates of π_{60} were also very close to their underlying values. The AB was low for all of the other parameters, for both scenarios, and for both estimation methods. It ranged between − 0.011 (β estimate, “Breast” scenario) and + 0.004 (β estimate, “Lung” scenario). The standard deviation of α estimates was directly related to the true value of the parameter. Their coefficient of variation (standard deviation divided by the mean, not shown) ranged between 6 and 15%. The standard deviation of π_{60} estimates was considerably lower than that of α, with the coefficient of variation of estimates always within the range 4 to 6%, thereby resulting in higher precision of the estimates compared with α. The standard deviation of the estimates was also generally low for λ and γ, but it was higher for the two trend parameters β and δ. The standard deviation for estimates based on individual data was in general slightly lower than that for grouped data. The estimated coverage was almost always slightly lower than, but close to, the nominal value of 95% for both scenarios, both methods, and all parameters. One partial exception was the coverage of λ estimates for “Lung”, which ranged from 89 to 95%.
Table 1 shows no strong advantage of estimates based on individual data with respect to those based on grouped data, which consumed far less time and computer power. The following Tables present only results from the latter method, but the robustness analysis for individual data can be found in Supplementary Material Table 2.
The main performance indicators of Model(1), compared with the conventional cure model, are presented in Table 2. The conventional model gave unbiased estimates for all parameters when α = 1, with precision and coverage similar to those obtained from Model(1), but progressively more biased estimates of the parameters as the underlying value of α departed from 1.0. This is one more reason in favour of the systematic use of the full Model(1).
Table 3 illustrates the behaviour of Model(1) with varying sample size N and length of potential follow-up. For the “Breast” scenario, decreasing the sample size and length of follow-up led to a positive bias for α and a negative one for π. For the “Lung” scenario, π was generally estimated well, but both positive and negative AB of α estimates were obtained for 5 years of follow-up. This was also due to the very large standard errors. Estimates of α were more sensitive to decreasing sample size and length of follow-up than were those of π. Generally speaking, these results indicate that small sample sizes and short follow-up led to unstable estimates. With at least 10 years of follow-up, N ≥ 5000 was needed to obtain good model estimates, and with at least 15 years of follow-up the model provided acceptable estimates with a smaller sample size (N ≥ 1000). The latter is needed to provide the amount of information ensuring a good estimate of α as well as of the other parameters involved.
The ability of Model(1) to produce acceptable results for a given sample size and length of follow-up also varied according to the number of deaths, so these general conclusions can be relaxed. Indeed, a cohort of 500 cases will produce acceptable estimates when lethal cancer sites and long follow-up (15 years) are studied.
Within these conditions, the SD of α and π were roughly inversely proportional to the square root of sample size and increased with decreasing lengths of follow-up. The same conclusion was drawn when individual data were used (Supplementary Material Table 3).
Robustness analysis
All the previous results were obtained for samples generated according to the model assumptions. In the following paragraph, we report the performances of Model(1) when some of these assumptions were violated by the data generation algorithm.
The times to cancer death of uncured patients did not follow a Weibull distribution
The different shapes of the hazard and cumulative survival functions obtained by varying θ under the two considered scenarios are plotted in Supplementary Material Fig 1. The two dotted lines are those with minimum θ = 0.25, showing a high initial risk, followed by a minimum value and by an increasing risk, and those with the maximum θ = 4, showing an opposite pattern, increasing at the beginning, but with a decreasing trend in the long term. The reference function with θ = 1, corresponding to the Weibull distribution defined for each scenario, is represented by the black lines. Figure 1 in the supplementary material shows that values of θ not equal to 1 change both the pattern and the level of the hazard. As a consequence, the other survival parameters λ, γ, and δ, also have to change in order to fit the same dataset.
Performance indicators of estimates are summarized in Table 4 for all parameters apart from λ and γ, for which we did not have reference true values. Estimates of almost all parameters presented an increasing pattern for increasing values of θ, except for α for “Breast” and for δ. In order to balance the bias of β, the AB of δ presented a decreasing pattern for increasing values of θ in both scenarios. The estimates of parameter α under the “Lung” scenario were the most sensitive to the value of θ, increasing from 1.89 to 2.81 (AB − 0.106 to 0.815) around the true value of 2.0. The estimates of the other parameters, π_{60} and β, increased respectively from 10 to 13% and from − 0.75 to − 0.59.
As for the “Breast” scenario, the estimates of all three parameters were more stable and closer to their true values. The AB for α (true value = 1.2) ranged from − 0.024 to 0.01, those of π_{60} (true value = 70%) ranged from − 0.058 to 0.082, while those of β (true value = − 0.15) ranged from − 0.069 to 0.018.
The increased risk of noncancer death is dependent on age at diagnosis
In Table 5, we report the performance indicators of Model(1) when data were generated under a linear trend of α with age. The column headed b_{α} shows the linear coefficient of the age trend and in the following column is presented the range: the minimum and maximum values of α at ages 40 and 75, respectively. The limit coefficient values were taken so as not to have implausible protective levels of α for the extreme age classes.
The AB in estimating α was higher for “Breast”, with higher survival, than for the “Lung” scenario, for similar b_{α}. It remained smaller than 0.1 and greater than − 0.1 when the rate of increase of α was between − 0.005 and 0.005 per year of age for “Breast” and between 0.01 and 0.02 for “Lung”. In both scenarios, the AB was roughly proportional to the slope b_{α}. Thus, α estimates tended to their true values at older ages, for which the parameter has the greatest impact because of the higher mortality from other causes. The other parameters were similarly biased by a breach of the α-age independence assumption for the “Breast” scenario with a negative slope, and were hardly affected in all other cases.
The increased risk of noncancer death varied randomly among patients
Table 6 shows what happened when α was generated as a random effect. Mean estimates of α, assumed in Model(1) to be a fixed effect covariate, decreased slightly with increasing random variability of the underlying parameter. The AB was nonnegligible for α in both scenarios when the max/min ratio became greater than 2 (− 0.11 for “Breast” and − 0.16 for “Lung” when the widest range of variability was considered). For all other parameters, the AB always remained close to zero, at less than 0.05 in absolute value. Estimates of β and δ were sensitive to the variability of α only for the “Breast” scenario, where they showed a higher AB but maintained a coverage close to 95%. In all the other cases, the estimates of the π, λ, and γ parameters were only marginally affected by underlying α variability, and remained close to their true values.
Application to real-life FRANCIM colon cancer data
As an example, this method was applied to survival data of colon cancer patients recorded by FRANCIM (the French network of cancer registries). FRANCIM data are checked for quality and completeness every 4 years by an independent audit committee (Comité d’Évaluation des Registres). Life tables were provided by the National Statistics Institute (INSEE). All colon cancers diagnosed in 1995–2009 in patients aged 40–74 were included (N = 15,717 in men and N = 10,942 in women). The relative survival and the cure fraction were estimated, separately for each sex, using Model(1) and using the conventional model without the α parameter (Table 7). The cure assumption was already checked for these data [21].
For both sexes, we estimated α ≅ 1.3, with confidence intervals not including 1, therefore supporting the hypothesis of increased noncancer mortality in colon cancer patients. The differences between the cure fraction estimates from Model(1) and those from the conventional model were greater for males (57% vs. 52% at age 60) than for females (61% vs. 58%) due to the higher mortality rates for other causes in the male population. Model(1) and the conventional model also provided different estimates for survival by age. For example, 10-year relative survival in males decreased from 63% at age 40 to 57% at age 70 with Model(1), and from 64 to 52%, respectively, with the conventional model. Since noncancer mortality increases steeply with age, not including α in the model led to a portion of the age trend in mortality being attributed to cancer rather than to other causes. Not taking into consideration the cancer patients’ increased risk of noncancer death led to bias in the colon cancer net survival estimators produced so far.
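The direction of this bias can be illustrated numerically. The sketch below assumes that all-cause survival is the product of the expected survival raised to the power α and a mixture cure net survival with Weibull survival for the uncured. The Gompertz expected hazard, its coefficients, and all parameter values are hypothetical stand-ins, not the INSEE life tables or the FRANCIM estimates.

```python
import math

def expected_cum_hazard(t, age, a=5e-5, b=0.09):
    """Cumulative general-population hazard over t years starting at `age`.

    Gompertz form h(x) = a * exp(b * x); a and b are hypothetical values.
    """
    return (a / b) * math.exp(b * age) * (math.exp(b * t) - 1.0)

def net_survival(t, pi, lam, gamma):
    """Mixture cure net survival with Weibull survival for the uncured."""
    return pi + (1.0 - pi) * math.exp(-((lam * t) ** gamma))

def observed_survival(t, age, alpha, pi, lam, gamma):
    """All-cause survival when noncancer mortality is alpha times expected."""
    other_causes = math.exp(-alpha * expected_cum_hazard(t, age))
    return other_causes * net_survival(t, pi, lam, gamma)

def conventional_relative_survival(t, age, alpha, pi, lam, gamma):
    """Observed / expected survival, i.e. the ratio used when alpha is ignored."""
    expected = math.exp(-expected_cum_hazard(t, age))
    return observed_survival(t, age, alpha, pi, lam, gamma) / expected

# Hypothetical patient group: age 60, alpha = 1.3, cure fraction 0.57.
for t in (5, 10, 15):
    net = net_survival(t, 0.57, 0.5, 1.0)
    conv = conventional_relative_survival(t, 60, 1.3, 0.57, 0.5, 1.0)
    print(t, round(net, 3), round(conv, 3))
```

Under these assumptions the conventional ratio equals the net survival multiplied by the expected survival raised to the power α − 1, so for α > 1 it keeps declining after the net survival has reached its plateau, and a cure fraction read from it is too low.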
Discussion
This simulation study, which showed very good performances, validated the mixture cure model for relative survival estimation presented here. The model includes a coefficient that reflects the risk of dying from causes other than cancer in cancer patients as compared with the risk of all-cause death in the general population. In the rationale of this study this coefficient expressed an increased risk of noncancer death for cancer patients (α > 1), but we also validated the model for α = 0.8, as we could not exclude the possibility that, for some specific cancers, the risk of noncancer death in the patients can be smaller than the risk of death in the general population.
The model was tested in the ideal situation, where it exactly mirrored the pattern of data to which it was applied. In these simulations, we considered two extreme scenarios in terms of cure fraction, survival time of uncured patients, and age trends of both quantities, reflecting the survival pattern of lung and breast cancer. We considered linear age effects in order to reduce the number of parameters to be presented and because other analyses with categorical age effects provided similar results. We applied the model to both individual data and grouped data, the latter in the form of life tables stratified by age at diagnosis. When Model(1) assumptions were verified, the two methods showed comparable performances and provided unbiased estimates of the parameters, and coverage close to its nominal value of 95%. Standard deviations of estimates of α and π_{60} over the 1000 simulation runs using individual data were only slightly lower than those using grouped data. When there was an increased risk of noncancer death, the estimates generated by the conventional model carried a risk of being severely biased. When there was no increased risk, Model(1) generated the same estimates as the conventional model. A sample size of 5000 cases with more than 10 years of follow-up was sufficient to obtain reliable estimates for all parameters in most scenarios.
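The three performance indicators used throughout (AB, SD and coverage) can be computed from a set of simulation runs as in the sketch below. It assumes that AB is the mean estimate minus the true value, that SD is the empirical standard deviation of the estimates, and that coverage is the share of Wald-type 95% intervals containing the true value; the function name and the toy inputs are hypothetical.

```python
import statistics

def performance(estimates, std_errors, true_value, z=1.96):
    """Return AB, SD and coverage (CVR) over a set of simulation runs."""
    mean_estimate = statistics.fmean(estimates)
    bias = mean_estimate - true_value          # signed bias, as reported for AB
    spread = statistics.stdev(estimates)       # empirical SD across runs
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return {"AB": bias, "SD": spread, "CVR": covered / len(estimates)}

# Toy input: four hypothetical estimates of alpha with a true value of 2.0.
result = performance([1.9, 2.0, 2.1, 2.4], [0.1, 0.1, 0.1, 0.1], 2.0)
print(result)
```

In a study with 1000 runs per scenario, `estimates` and `std_errors` would each hold 1000 values per parameter.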
Good model specification is a requirement of every modelling application, but seldom can it be completely achieved or checked. In the second part of this work, we studied the robustness of model estimates when the underlying assumptions were not in line with the data, more specifically: when relative survival in uncured patients did not follow a Weibull distribution; when the increased risk of noncancer death was dependent on age at diagnosis; when the increased risk of noncancer death was not a fixed effect but randomly varied within the cohort of patients. The individual data approach provided severely biased estimates in some extreme scenarios (Supplementary material Table 2), in particular for the low survival “Lung” scenario with exponential Weibull parameter θ = 0.25. This was perhaps due to the extremely high probability density generated by this distribution in very short times (say, 1–2 months) after diagnosis, making the contribution of observations with longer follow-up time negligible. The analysis of grouped data was also faster and can help overcome the data access problems arising from personal data protection regulations, which research teams are increasingly facing. The grouped data approach was therefore chosen to obtain a more detailed presentation of results.
Each of the misspecification situations considered could be accounted for by modifying the basic model, i.e. by considering other survival distributions, or including age dependence, or random effect covariates. This could be the subject of future developments on corrected cure models. Nevertheless, investigating the robustness of a parsimonious model can help in finding a compromise between a lack of fit and overparametrization.
The strength of the corrected model studied here lies in the fact that an overall risk of death from other causes can be estimated without needing to know the causal factors and their distribution in the patient set and in the general population. Such information is seldom available in population-based studies. On the other hand, the model assumes the increased risk of death in patients to act as a multiplicative relative risk with respect to mortality in the general population. The multiplicative relative risk model has, however, shown its plausibility during its broad and long use in epidemiological research. Its appropriateness in the specific field of cancer survivorship can be ascertained through an extensive application to epidemiological data on various cancers and in different populations.
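The multiplicative assumption can be written compactly. The expression below is a sketch of the structure described in this section, not a reproduction of the article's Model(1): the symbols S_P and h_P (expected survival and hazard in the general population), h_E (excess hazard due to cancer) and a (age at diagnosis) are notation introduced here for illustration, and the Weibull term assumes the parametrization exp(−(λt)^γ).

```latex
S_{\mathrm{obs}}(t \mid a)
  = \left[ S_{P}(t \mid a) \right]^{\alpha}
    \left[ \pi(a) + \bigl(1 - \pi(a)\bigr) \exp\!\left( -(\lambda t)^{\gamma} \right) \right]
\qquad \Longleftrightarrow \qquad
h_{\mathrm{obs}}(t \mid a) = \alpha \, h_{P}(t \mid a) + h_{E}(t \mid a)
```

Setting α = 1 gives back the conventional mixture cure model in the relative survival setting, in which the first factor is simply the expected survival.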
The model is based on the plausible assumption that patients’ cancer mortality does not increase indefinitely some years after diagnosis, unlike mortality from other causes, which increases with age. This makes it possible, in principle, to disentangle the two risks. The model can be fitted even if the excess death remains constant after a long time, and in this context the functional form that best describes the analysed data has to be found. Indeed, we found that the model also behaved well in scenarios assuming persisting long-term cancer mortality, as expressed by the γ parameter = 1.1 (Breast scenarios).
In order to test the model behaviour in a critical situation, it was also applied to data generated with negligible or zero proportion of cure and, in addition, with persisting long-term cancer mortality. We found that the corrected cure model provided biased estimates for α and for the cure fraction (Supplementary material Table 1). Such bias was however attributable to the logistic transformation. Indeed, when applying a linear age effect on the cure fraction, all estimates were unbiased (Supplementary material Table 1). Finally, we compared the estimates provided by the full model and those from a model obtained by retaining the α parameter but with no cure assumption (Supplementary material Table 1). The latter provided biased estimates only when the true cure fraction was higher than 5%. It can be suggested that, when in the applications the full corrected model gives an estimated cure fraction of less than 10%, the no-cure model, with the inclusion of the α parameter, should also be applied for comparison. Moreover, the model with a linear effect of age on the cure fraction can be considered when the estimated cure fraction is nonnegative.
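One plausible reading of the role of the logistic transformation is that a logistic link keeps the cure fraction strictly between 0 and 1, so a true value of exactly zero lies on the boundary of what the model can represent, whereas a linear link can reach zero (and can also go below it). The sketch below only illustrates that property; the centring at age 60, the function names and the coefficient values are assumptions, not the article's specification.

```python
import math

def cure_fraction_logistic(age, intercept, slope, ref_age=60):
    """Cure fraction under a logistic link: always strictly inside (0, 1)."""
    eta = intercept + slope * (age - ref_age)
    return 1.0 / (1.0 + math.exp(-eta))

def cure_fraction_linear(age, pi_ref, slope, ref_age=60):
    """Cure fraction under a linear link: can equal 0 or fall below it."""
    return pi_ref + slope * (age - ref_age)

# Hypothetical coefficients for a cancer with almost no cure.
for age in (40, 60, 75):
    print(
        age,
        cure_fraction_logistic(age, -8.0, -0.2),
        cure_fraction_linear(age, 0.02, -0.002),
    )
```

The possibility of negative values under the linear link is the reason a check on the sign of the estimated cure fraction is needed before that variant is used.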
This work has several limitations. We only tested the model with the Weibull distribution assumption for survival of the uncured patients. The Weibull is a very popular survival distribution generally providing a good fit of cancer data [25], but other choices are possible, such as the lognormal [26], the log-logistic [27], or the flexible regression spline-generated [28] distributions. We have no reason to believe that any of these distributions would give different results in the case of a well-specified model, when exactly the same distribution is used to generate and to fit the data. However, their behaviour may change when some of the assumptions are not verified, so the robustness analysis could be affected by the specific function considered.
Reliable cause-of-death data from population-based sources can be used to test the distribution assumptions considered. Cause of death was not available in the data of our application. In the future this information will be standardized and included in the basic dataset of cancer registries.
Age at diagnosis was the only covariate included in the model. Other variables available from population-based data, such as sex, period of diagnosis or, sometimes, stage and treatment, can be included without substantially modifying the model structure.
A linear effect of age, as considered in the model, can be accepted for many cancers, but can be unrealistic for others. Breast and prostatic cancers, for instance, carry a higher death risk for young and old patients than for middle-aged patients. We chose the linear link to simplify the proof of concept, mainly focused on the α parameter, without great loss of generality. Second or higher order polynomials can be used to account for different cancer mortality patterns, such as U-shaped ones.
Future methodological development can address the testing of other scenarios, designed to mirror a wider variety of cancers, exploring different survival distributions for the uncured and separately modelling the parameters in the distribution function, that is, the scale (λ) and shape (γ) parameters in the case of the Weibull distribution.
Previous applications of the same model to real data [17, 18] or of survival models not focused on cure [13,14,15] have shown that an increased mortality risk due to other causes exists in patients with several cancers. The reasons why the risk of death from other causes differs between patients and the general population may vary. They range from causes not generated by the cancer, such as smoking, genetic and other lifestyle risk factors common to cancer and other diseases, as well as deprivation and other socioeconomic factors, to factors indirectly caused by the diagnosed cancer, such as side effects of cancer treatment [29, 30]. We cannot separate these two components without detailed treatment data from clinical databases. However, our intention was to capture cure of the cancer itself, implying no further risk of progression and death due to the diagnosed cancer and following the definition proposed by Haupt in 2007: “cure from the original cancer regardless of any potential for, or presence of, remaining disabilities or side effects of treatment” [31]. Following this definition, we distinguish between patients who will die from progression or relapse of the diagnosed cancer and those who will die from other causes, whether related to the cancer (e.g. adverse effects of treatments) or not (e.g. other diseases) [32].
This increased risk of death is greater in patients aged 65 or more. By applying our model to colon cancer data, we showed that not acknowledging an increased risk of death led to substantial bias in relative survival estimations and cure fractions. Even a moderately increased risk of 1.3 can make a greater difference than, for example, the one induced by different choices of the relative survival estimator, an issue that has raised considerable debate within the biostatistics community [33,34,35].
Concerning applications, the model provides a new indicator in cancer patients: the relative risk of death due to other causes. Estimating and accounting for variations in this indicator over time, in different populations, or according to stage or treatment could potentially improve the validity of survival comparisons. Moreover, we believe that providing correct estimates of cure fractions and of survival in uncured patients is important in that it could be used to improve the planning of health services for cancer survivors, particularly when a large increase in long-term cancer prevalence occurs or is expected. For instance, the estimated relative risk of noncancer death of 1.3 in patients cured of colon cancer indicates that long-term clinical follow-up should focus more on preventing or treating the long-term effects of surgery and chemotherapies and addressing risk factors for cancer that are shared with other chronic diseases. To be more effective in preventing or reducing the increased risk, the same services that provide the cancer clinical follow-up have to take care of the conditions responsible for the increased risk estimated by this study, particularly in the elderly.
Correct estimates of time to cure, derived from the cure model parameters, also have practical implications in the legislation addressing the “Right to be Forgotten” of cancer patients (https://ecpc.org/policy/therighttobeforgottenanewresearchproject). Such legislation has been introduced in France, Belgium, Luxembourg and the Netherlands, with efforts being made across Europe to establish similar approaches. Finally, a more accurate balance between the risk of death due to cancer and that due to other, often more effectively controlled, causes can improve patients’ awareness and quality of life.
Conclusion
The present analysis supports the use of the corrected mixture cure model including the increased risk of noncancer death, to provide better estimates of indicators based on cancer survival, which are important to public health decision-making and should improve patients’ awareness of their health status and facilitate their return to normal life.
Availability of data and materials
Data availability statement
The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.
The French colon cancer data that support the findings of this study are available from Francim but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with the permission of Francim.
Code availability
Details of the Stata code and the R package curesurv used to fit the proposed corrected mixture cure model are available in the GitHub and Zenodo repositories. The repository is called curesurvTools (https://github.com/LauraBotta/curesurvTools and https://zenodo.org/record/7567886).
Please cite this material as in reference [36].
The code for the unweighted least squares estimation of Model(1) parameters is available from the first author. The results for the latter method are close, but a jackknife technique was required to obtain good coverage.
Abbreviations
SEER: Surveillance, Epidemiology, and End Results Program
RS: Relative survival
FRANCIM: French network of cancer registries
INSEE: French National Statistics Institute
AB: Absolute Bias
SD: Standard Deviation
CVR: Coverage
P_{AC}: Probabilities of administrative censoring
P_{LF}: Probabilities of loss to follow-up
References
Aziz NM. Cancer survivorship research: state of knowledge, challenges and opportunities. Acta Oncol. 2007;46:417–32.
Rugbjerg K, Mellemkjaer L, Boice JD, et al. Cardiovascular diseases in survivors of adolescent and young adult cancer: a Danish cohort study, 1943–2009. J Natl Cancer Inst. 2014;106:dju110.
Armenian SH, Xu L, Ky B, et al. Cardiovascular disease among survivors of adultonset cancer: community based retrospective cohort study. J Clin Oncol. 2016;34:1122–30.
Hinchliffe SR, Rutherford MJ, Crowther MJ, Nelson CP, Lambert PC. Should relative survival be used with lung cancer data? Br J Cancer. 2012;106(11):1854–9.
Capocaccia R, Gatta G, Dal Maso L. Life expectancy of colon, breast, and testicular cancer patients: an analysis of USSEER populationbased data. Ann Oncol. 2015;26:1263–8.
Pohar Perme M, Estève J, Rachet B. Analysing populationbased cancer survival – settling the controversies. BMC Cancer. 2016;16:933.
Ederer F, Axtell LM, Cutler SJ. The relative survival rate: a statistical methodology. Natl Cancer Inst Monogr. 1961;6:101–21.
Hakulinen T. On longterm relative survival rates. J Chronic Dis. 1977;30:431–43.
Esteve J, Benhamou E, Croasdal N, Raymond L. Relative survival and the estimation of net survival: elements for further discussion. Stat Med. 1990;9:529–38.
Verdecchia A, De Angelis R, Capocaccia R, Sant M, Micheli A, Gatta G, et al. The cure for colon cancer: results from the Eurocare study. Int J Cancer. 1998;77:322–9.
Yu XQ, De Angelis R, Andersson TML, Lambert PC, O'Connell DL, Dickman PW. Estimating the proportion cured of cancer: some practical advice for users. Cancer Epidemiol. 2013;37(6):836–42.
Dumas A, Allodji R, Fresneau B, et al. The right to be forgotten: a change in access to insurance and loans after childhood cancer? J Cancer Surviv. 2017;11(4):431–7.
Goungounga JA, Touraine C, Grafféo N, Giorgi R. CENSUR working survival group. Correcting for misclassification and selection effects in estimating net survival in clinical trials. BMC Med Res Methodol. 2019;19(1):104.
Touraine C, Graffeo N, Giorgi R. More accurate cancerrelated excess mortality through correcting background mortality for extra variables. Stat Methods Med Res. 2020;29(1):122–36.
Rubio FJ, Rachet B, Giorgi R, Maringe C, Belot A. On models for the estimation of the excess mortality hazard in case of insufficiently stratified life tables. Biostatistics. 2021;22(1):51–67.
Mba RD, Goungounga JA, Graffeo N, Giorgi R. CENSUR working survival group. Correcting inaccurate background mortality in excess hazard models through breakpoints. BMC Med Res Methodol. 2020;20(1):268.
Phillips N, Coldman A, McBride M. Estimating cancer prevalence using mixture models for cancer survival. Stat Med. 2001;21:1257–70.
Botta L, Gatta G, Trama A, Capocaccia R. Excess risk of dying of other causes of cured cancer patients. Tumori J. 2019;105(3):199–204.
Lambert PC, Thompson JR, Weston CL, et al. Estimating and modelling the cure fraction in populationbased cancer survival analysis. Biostatistics. 2007;8:576–94.
Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Stat Med. 2004;23:51–64.
Romain G, Boussari O, Bossard N, Remontet L, Bouvier AM, Mounier M, et al. French network of Cancer registries (FRANCIM). Timetocure and cure proportion in solid cancers in France. A population based study. Cancer Epidemiol. 2019;60:93–101.
Institut National de la Statistique et des Études Économiques. https://www.ined.fr/fr/toutsavoirpopulation/chiffres/france/mortalitecausedeces/tablemortalite/. Accessed 6 Mar 2023.
Dal Maso L, Panato C, Tavilla A, et al. Cancer cure for 32 cancer types: results from the EUROCARE5 study. Int J Epidemiol. 2020;49(5):1517–25.
Mudholkar GS, Srivastava DK, Kollia GD. A generalization of the Weibull distribution with application to the analysis of survival data. J Am Stat Assoc. 1996;91:1575–83.
Cvancarova M, Aagnes B, Fosså SD, Lambert PC, Møller B, Bray F. Proportion cured models applied to 23 cancer sites in Norway. Int J Cancer. 2013;132(7):1700–10. https://doi.org/10.1002/ijc.27802 Epub 2012 Dec 14. PMID: 22927104.
Boag JW. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J Roy Stat Soc. 1949;11:15–44.
Mariotto AB, Zou Z, Zhang F, Howlader N, Kurian AW, Etzioni R. Can we use survival data from cancer registries to learn about disease recurrence? The case of breast cancer. Cancer Epidemiol Biomarkers Prev. 2018;27(11):1332–41. https://doi.org/10.1158/10559965.EPI171129 Epub 2018 Oct 18. PMID: 30337342; PMCID: PMC8343992.
Andersson TM, Dickman PW, Eloranta S, Lambert PC. Estimating and modelling cure in populationbased cancer studies within the framework of flexible parametric survival models. BMC Med Res Methodol. 2011;11:96. https://doi.org/10.1186/147122881196 PMID: 21696598; PMCID: PMC3145604.
Bright CJ, Brentnall AR, Wooldrage K, et al. Errors in determination of net survival: causespecific and relative survival settings. Br J Cancer. 2020;122:1094–101.
Botta L, Gatta G, Capocaccia R, Stiller C, Cañete A, Dal Maso L, et al. Longterm survival and cure fraction estimates for childhood cancer in Europe (EUROCARE6): results from a populationbased study. Lancet Oncol. 2022;23(12):1525–36. https://doi.org/10.1016/S14702045(22)006374 Epub 2022 Nov 16. PMID: 36400102.
Haupt R, Spinetta JJ, Ban I, et al. Long term survivors of childhood cancer: cure and care. The Erice statement. Eur J Cancer. 2007;43(12):1778–80.
Ellis L, Coleman MP, Rachet B. The impact of life tables adjusted for smoking on the socioeconomic difference in net survival for laryngeal and lung cancer. Br J Cancer. 2014;111(1):195–202. https://doi.org/10.1038/bjc.2014.217 Epub 2014 May 22. PMID: 24853177; PMCID: PMC4090723.
Roche L, Danieli C, Belot A, et al. Cancer net survival on registry data: use of the new unbiased PoharPerme estimator and magnitude of the bias with the classical methods. Int J Cancer. 2013;132:2359–69.
Dickman PW, Lambert PC, Coviello E, Rutherford MJ. Estimating net survival in populationbased cancer studies. Int J Cancer. 2013;133:519–22.
Seppa K, Hakulinen T, Pokhrel A. Choosing the net survival method for cancer survival estimation. Eur J Cancer. 2015;51(9):1123–9. https://doi.org/10.1016/j.ejca.2013.09.019. Epub 2013 Oct 29.
Botta L, Goungounga J, Capocaccia R, Romain G, Boussari O, Jooste V. LauraBotta/curesurvTools:v1.0(v1.0): Zenodo; 2023. https://doi.org/10.5281/zenodo.7567886.
Acknowledgments
We thank Luigino Dal Maso and the reviewers for providing helpful comments to the layout of the manuscript and Philip Bastable for the revision of the English.
Funding
The work was partially supported by the French Institut National du Cancer (INCa grant number 2018–178) and the European Union (Fonds Européen de Développement Regional: FEDER grant number BG0028239).
Author information
Contributions
LB, JG, VJ, OB and RC designed the study, contributed to methods and materials, analysis, interpretation of the findings. LB and RC drafted the article. JG, OB and VJ revised the manuscript. GG and MC contributed to the interpretation of the findings and revised the manuscript. GR, LB, RC, JG and OB carried out the programming for the simulation study. LB and RC carried out the programming for the ML estimation in Stata. JG programmed curesurv in R, used in the application for reallife data. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
The observational noninterventional survival study on raw FRANCIM data was approved by the Consultative Committee for the Processing of Health Research Data (CCTIRS) and by the French Data Protection Authority (CNIL, authorization n° 041036); in agreement with French legislation, there was no requirement for written informed consent. Administrative permission to access data was granted by the FRANCIM committee in charge of data sharing and all registries approved the use of their data. All methods were carried out in accordance with relevant guidelines and regulations. FRANCIM data used in this study were anonymized before use.
Consent for publication
Not Applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary Information
Additional file 1:
Supplementary file Fig. 1. Probability density according to different θ parameters representing different distributions of the survival function of the uncured. Two figures representing the Lung and Breast scenarios.
Supplementary file Table 1. Performance indicators using grouped data of Model(1) in situations when there was very small or no cure at all and a persisting long-term cancer mortality (γ = 1). Model(1) was used with a logistic age effect on the cure fraction (as proposed throughout the manuscript), with a linear age effect on the cure fraction, and without cure. 1,000 estimates, each based on 10,000 cases and 15 years of follow-up.
Supplementary file Table 2. Robustness analysis using maximum likelihood on individual data. a) The times to cancer death of uncured patients do not follow a Weibull distribution; b) The extra noncancer death risk is dependent on age at diagnosis; c) The extra noncancer death risk randomly varies among the patients.
Supplementary file Table 3. Performance indicators for α and π using individual data according to sample size and length of follow-up. 1,000 estimates, each based on a different sample size and follow-up length in years.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Botta, L., Goungounga, J., Capocaccia, R. et al. A new cure model that corrects for increased risk of noncancer death: analysis of reliability and robustness, and application to reallife data. BMC Med Res Methodol 23, 70 (2023). https://doi.org/10.1186/s1287402301876x
DOI: https://doi.org/10.1186/s1287402301876x
Keywords
 Cure model
 Increased noncancer mortality
 Populationbased data
 Robustness
 Reliability
 Life tables
 Net survival
 Cancer