Assessing the impact of including variation in general population mortality on standard errors of relative survival and loss in life expectancy
BMC Medical Research Methodology volume 22, Article number: 130 (2022)
Abstract
Background
A relative survival approach is often used in population-based cancer studies, where other-cause (or expected) mortality is assumed to be the same as the mortality in the general population, given a specific covariate pattern. The population mortality is assumed to be known (fixed), i.e. measured without uncertainty. This could have implications for the estimated standard errors (SE) of any measures obtained within a relative survival framework, such as relative survival (RS) ratios and the loss in life expectancy (LLE). We evaluated the existing approach to estimating the SE of RS and the LLE by comparing it with an approach that takes uncertainty in the population mortality into account.
Methods
The uncertainty from the population mortality was incorporated using a parametric bootstrap approach. The analysis was performed with different levels of stratification and sizes of the general population used for creating expected mortality rates. Using these expected mortality rates, SEs of 5-year RS and the LLE for colon cancer patients in Sweden were estimated.
Results
Ignoring uncertainty in the general population mortality rates had a negligible (less than 1%) impact on the SEs of 5-year RS and LLE when the expected mortality rates were based on the whole general population, i.e. all people living in a country or region. However, the smaller the population used for creating the expected mortality rates, the larger the impact. For a general population reduced to 0.05% of the original size and stratified by age, sex, year and region, the relative precision for 5-year RS was 41% for males diagnosed at age 85. For the LLE the impact was more substantial, with a relative precision of 1286%. The relative precision for marginal estimates of 5-year RS was 3% and 30%, and for the LLE 22% and 313%, when the general population was reduced to 0.5% and 0.05% of the original size, respectively.
Conclusions
When the general population mortality rates are based on the whole population, the uncertainty in the estimates of the expected measures can be ignored. However, when based on a smaller population, this uncertainty should be taken into account, otherwise SEs may be too small, particularly for marginal values, and, therefore, confidence intervals too narrow.
Background
Various measures can be used to summarise cancer survival data. Within population-based studies, most of these measures are estimated in a relative survival framework, where the sometimes inaccurate or unreliable information on cause of death is not required [1]. Here, the observed mortality rate of cancer patients theoretically consists of two components: the expected mortality rate and the excess mortality rate. The excess mortality rate represents the mortality rate due to the cancer of interest, and the expected mortality rate is the mortality rate due to other causes. Relative survival (RS) ratios, which are the survival analogue of excess hazards, are commonly reported at a specific time after diagnosis, usually as 1-year, 5-year or 10-year relative survival. Under some assumptions RS can be interpreted as net survival [2, 3], i.e. the probability of surviving if the cancer of interest was the only possible cause of death, and is useful for comparisons between groups where mortality rates due to other causes can vary. However, alternative measures that are interpreted in the presence of other causes of death are also useful. One such measure is the loss in life expectancy (LLE). The LLE is the difference between the life expectancy the cancer patients would have had if they did not have cancer and the life expectancy of the cancer population. The former life expectancy is usually assumed to be the same as the life expectancy in the general population (matched on factors like age, sex and calendar year). In comparison to RS, the LLE is defined in the "real world" since it takes into account the presence of other causes of death [4]. To estimate the LLE, the observed survival function often has to be extrapolated beyond the available follow-up. It has been shown that the extrapolation performs better when the expected and relative survival functions are extrapolated separately and the interrelationship between observed, expected and relative survival is used [4]. The LLE is therefore often estimated within a relative survival framework.
In practice, the expected mortality rates are usually obtained from population life tables stratified by some sociodemographic factors (such as age, sex, calendar year) and are considered known or fixed, i.e. measured without uncertainty. The argument behind this is that since the rates are based on the whole population, any uncertainty in the estimates is assumed negligible, especially in relation to the uncertainty from a considerably smaller cancer cohort. However, the mortality rates in the general population can be seen as one possible realization of the mortality rates. Even though one study showed fixed expected mortality rates to be a valid assumption for the estimation of RS [5], it might not be the case if the life tables are stratified on many variables or are based on small regions. Also, there might be situations where life tables are not available, but can be constructed from a random sample from the general population. When estimating the LLE, incorporation of uncertainty of expected mortality rates might be more important, since the expected mortality rates are included in several parts of the estimation, namely, the estimation of life expectancy in the general population and life expectancy of the cancer patients, which in turn, is estimated using the expected mortality rates and excess mortality rates.
The aim of the study was to evaluate the existing approach to estimating standard errors (SE) of RS and the LLE by comparing it with an approach in which uncertainty in the expected mortality is taken into account. This is illustrated using data on colon cancer in Sweden via estimation of both marginal and conditional measures of the 5-year RS and the LLE. We use a parametric bootstrap approach to incorporate the uncertainty from the expected mortality. To investigate possible drivers of differences, we perform the analysis with different levels of stratification and sizes of the general population used for creating expected mortality rates.
Material & methods
Background
Relative survival
The mortality rate among cancer patients can be separated into two parts: the mortality rate due to the cancer of interest and the mortality rate due to other causes. In a relative survival framework, where information about the cause of death is not required, mortality due to the cancer of interest is estimated as the excess mortality among the cancer patients compared to the expected mortality in the absence of cancer. The expected mortality is based on the mortality in the general population, and it is assumed that the other-cause mortality among the cancer patients is the same as the general population mortality, matched on age, sex, calendar year and possibly other covariates. Thus, the excess mortality among cancer patients λ(t|Z_{1}) can be written as:
\[\lambda(t|Z_{1}) = h(t|Z) - h^{*}(t|Z_{2}) \qquad (1)\]
where t represents time since diagnosis, h(t|Z) is the all-cause mortality rate among the cancer patients and h^{∗}(t|Z_{2}) is the expected mortality. Z denotes the set of all covariates, while Z_{1} and Z_{2} represent the covariates for excess and expected mortality, respectively. The expected mortality rates are usually assumed to be known and are obtained from available life tables.
After transforming mortality rates to the survival scale, relative survival (RS(t)) is defined as the ratio of all-cause survival (S(t)) and expected survival (S^{∗}(t)):
\[RS(t) = \frac{S(t)}{S^{*}(t)} \qquad (2)\]
RS is a common summary measure of cancer patients’ survival presented by national cancer registries, and is often interpreted as net survival. For RS to be interpreted as net survival, i.e. survival from cancer if there were no other possible causes of death, the assumption of exchangeability between the general population and the cancer cohort must hold, i.e. the mortality in the general population must be the same as the mortality the cancer patients would have had if they did not have cancer. The other assumption is conditional independence, i.e. all the factors affecting both the cancer-specific and other-cause mortality must be controlled for [2, 3]. RS can be estimated using several approaches, both non-parametric [1, 6] and parametric [7, 8]. In this work we chose a flexible parametric survival model (FPM) within a relative survival framework [9] to model the log cumulative excess hazard. The log cumulative excess hazard within a FPM is expressed as:
\[\ln\Lambda(t|Z_{1}) = s(\ln(t)|\gamma,k_{0}) + Z_{1}\beta \qquad (3)\]
where Λ(t|Z_{1}) is the cumulative excess hazard, s(ln(t)|γ,k_{0}) is a restricted cubic spline function of ln(t) used to estimate the baseline log cumulative excess hazard [10], and β is the vector of coefficients for the covariates Z_{1}. The model (3) is a proportional excess hazards model, but it can easily be extended to non-proportional hazards by incorporating time-dependent effects. This can be done by forming interactions between the covariates of interest and the spline terms for time [9].
Based on model (3) and the general relationship between the cumulative hazard function and the survival function, RS(t) can be obtained by
\[RS(t|Z_{1}) = \exp\{-\Lambda(t|Z_{1})\} \qquad (4)\]
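The relationships between the mortality rates and relative survival described in this section can be checked numerically. The following is a minimal sketch only, assuming constant hazards; the two rates are hypothetical values chosen for illustration and are not taken from the study.

```python
import math

def relative_survival(t, all_cause_hazard, expected_hazard):
    """RS(t) = S(t) / S*(t), assuming constant hazards (illustrative only)."""
    s_all = math.exp(-all_cause_hazard * t)   # all-cause survival S(t)
    s_exp = math.exp(-expected_hazard * t)    # expected survival S*(t)
    return s_all / s_exp

# Hypothetical mortality rates per person-year (NOT study data)
h_all, h_exp = 0.12, 0.04
excess = h_all - h_exp                        # excess mortality rate

rs_5 = relative_survival(5.0, h_all, h_exp)
# The same quantity via the cumulative excess hazard: RS(t) = exp(-Lambda(t))
rs_5_from_excess = math.exp(-excess * 5.0)
```

Both routes give the same 5-year RS, which is why a model for the (log) cumulative excess hazard is sufficient to obtain RS once the expected mortality is supplied.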
The loss in life expectancy
The loss in life expectancy (LLE) is the difference between the life expectancy in the general population, free from the cancer of interest, LE_{P}, and the life expectancy in the cancer population, LE_{C}:
\[LLE = LE_{P} - LE_{C}\]
LE_{C} can be calculated as the area under the all-cause survival curve S(t):
\[LE_{C} = \int_{0}^{\infty} S(t)\,dt\]
Similarly, LE_{P} equals the area under the general population survival, or expected survival, S^{∗}(t):
\[LE_{P} = \int_{0}^{\infty} S^{*}(t)\,dt\]
Thus, LLE can be written as:
\[LLE = \int_{0}^{\infty} S^{*}(t)\,dt - \int_{0}^{\infty} S(t)\,dt\]
Assuming that the cancer patients would have had the same life expectancy as the general population had they not been diagnosed with cancer, LLE estimates the number of years by which life expectancy is reduced due to cancer. Theoretically, LLE is easy to estimate (Eq. (5)); in practice, however, the estimation often requires extrapolation of the survival functions due to limited follow-up. It has been shown that extrapolation of the all-cause survival curve S(t) is preferably performed by breaking it into two components, relative survival RS(t) and expected survival S^{∗}(t) [4], and extrapolating the functions separately. As a result, LLE is estimated by:
\[LLE = \int_{0}^{\infty} S^{*}(t)\,dt - \int_{0}^{\infty} S^{*}(t)\,RS(t)\,dt\]
where expected survival S^{∗}(t) is obtained using population life tables and RS(t) is obtained from a FPM (Eq. (4)).
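To make the decomposition of the all-cause survival into expected and relative survival concrete, the sketch below computes the LLE as the difference between two areas under survival curves using a simple trapezoidal rule. It assumes constant expected and excess hazards (hypothetical values, not study data) so that the result can be checked against a closed form; the function names, the horizon and the grid size are illustrative choices, and in practice S^{∗}(t) would come from life tables and RS(t) from the fitted model.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def loss_in_life_expectancy(expected_surv, rel_surv, horizon=400.0, n=40000):
    """LLE = area under S*(t) minus area under S*(t) * RS(t)."""
    le_p = trapezoid(expected_surv, 0.0, horizon, n)
    le_c = trapezoid(lambda t: expected_surv(t) * rel_surv(t), 0.0, horizon, n)
    return le_p - le_c, le_p, le_c

# Hypothetical constant hazards per year (NOT study data)
h_expected, h_excess = 0.05, 0.03
lle, le_p, le_c = loss_in_life_expectancy(
    lambda t: math.exp(-h_expected * t),   # expected survival S*(t)
    lambda t: math.exp(-h_excess * t),     # relative survival RS(t)
)
# Closed form under these assumptions: 1/0.05 - 1/(0.05 + 0.03) = 7.5 years
```

Note that the expected survival enters both integrals, which is why uncertainty in the expected mortality can affect the LLE in more than one place.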
Marginal measures
For population-based cancer survival, interest often lies in obtaining an average estimate (a single number) for RS or LLE, across the covariate distribution. In other words, we are interested in marginal estimands, which can be estimated using regression standardization [11].
Marginal relative survival (RS_{m}(t)) is defined as the expectation over the distribution of covariates Z_{1} and can be estimated by predicting relative survival for all individuals in the cancer population at time t after diagnosis and averaging the predictions [12]:
\[\widehat{RS}_{m}(t) = \frac{1}{N}\sum_{i=1}^{N}\widehat{RS}_{i}(t|Z_{1i})\]
where \(\widehat {RS}_{i}(t|Z_{1i})\) is the predicted RS for individual i at time t, Z_{1i} is the covariate pattern for individual i associated with the excess mortality, and N is the number of individuals in the cancer population.
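Analogously to marginal RS, the marginal loss in life expectancy (LLE_{m}) is estimated by taking the average over the predicted LLE of all individuals in the data set:
\[\widehat{LLE}_{m} = \frac{1}{N}\sum_{i=1}^{N}\widehat{LLE}_{i}\]
where \(\widehat {LLE}_{i}\) is the predicted LLE for individual i.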
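Regression standardization as described above amounts to predicting for every individual and averaging. The sketch below illustrates this for marginal RS; the model form (a proportional excess hazards model with a linear term in ln(t) rather than splines), the coefficient values in `COEF` and the four-person cohort are all hypothetical and serve only to show the mechanics.

```python
import math

# Hypothetical coefficients on the log cumulative excess hazard scale;
# these are NOT estimates from the study.
COEF = {"const": -1.2, "log_t": 0.6, "age_per_10y": 0.25, "female": -0.10}

def predicted_rs(t, age, female, coef=COEF):
    """Predicted RS(t | Z_1) = exp(-exp(linear predictor)) for one patient."""
    eta = (coef["const"] + coef["log_t"] * math.log(t)
           + coef["age_per_10y"] * (age - 70) / 10
           + coef["female"] * female)
    return math.exp(-math.exp(eta))

def marginal_rs(t, cohort, coef=COEF):
    """Average the individual predictions over the cohort's covariate patterns."""
    predictions = [predicted_rs(t, age, female, coef) for age, female in cohort]
    return sum(predictions) / len(predictions)

# Hypothetical cohort as (age at diagnosis, female indicator) pairs
cohort = [(55, 1), (65, 0), (75, 1), (85, 0)]
rs5_individual = [predicted_rs(5.0, age, female) for age, female in cohort]
rs5_marginal = marginal_rs(5.0, cohort)
```

The marginal LLE is obtained in the same way, with the individual LLE predictions taking the place of the individual RS predictions.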
Variance estimation
The model parameters in Eq. (3) are obtained using maximum likelihood, assuming that the expected mortality is fixed (i.e. measured without uncertainty). Therefore, the variance in the estimation of the log cumulative excess hazard is based solely on the cancer cohort data. RS is estimated from the model as shown in Eq. (4), and the SE of RS is obtained using the delta method. Confidence intervals (CI) of RS are first obtained on the log cumulative excess hazard scale (the scale we are modelling on), and then transformed to the survival scale (using Eq. (4)). The variance of LLE is also based on the delta method, where the uncertainty comes solely from the estimation of excess mortality. Therefore, the assumption that the expected mortality is measured without uncertainty is used three times for LLE: first in the estimation of RS(t) using a FPM, then when multiplying RS(t) by the expected survival S^{∗}(t) to obtain the life expectancy among the cancer patients LE_{C}, and lastly when taking the life expectancy in the general population LE_{P} as a constant. The delta method is used to obtain the variance of marginal measures as well.
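As a rough illustration of the delta method step for RS, the sketch below takes a point estimate and SE on the log cumulative excess hazard scale and returns RS, its approximate SE, and a CI computed on the modelling scale and then transformed. The function name and the input values are hypothetical; only the transformation RS = exp(-exp(η)), with η the log cumulative excess hazard, is taken from the text.

```python
import math

def rs_with_delta_se(eta, se_eta, z=1.96):
    """Delta-method SE and transformed CI for RS = exp(-exp(eta)).

    eta    : estimated log cumulative excess hazard
    se_eta : its standard error (cancer cohort uncertainty only)
    """
    cum_excess = math.exp(eta)
    rs = math.exp(-cum_excess)
    # d RS / d eta = -exp(eta) * exp(-exp(eta)); the SE uses its absolute value
    se_rs = cum_excess * rs * se_eta
    # CI built on the eta scale, then mapped to the survival scale;
    # the transformation is decreasing, so the limits swap
    lower = math.exp(-math.exp(eta + z * se_eta))
    upper = math.exp(-math.exp(eta - z * se_eta))
    return rs, se_rs, (lower, upper)

# Hypothetical inputs (NOT study estimates)
rs, se_rs, (ci_low, ci_high) = rs_with_delta_se(math.log(0.5), 0.1)
```

Because the expected mortality does not appear as a random quantity anywhere in this calculation, any uncertainty in it is left out of the resulting SE and CI.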
Population mortality rates
The general population mortality used for the expected mortality rate h^{∗}(t) and the expected survival S^{∗}(t) in the estimation of RS and LLE is often obtained from statistics bureaus and presented on a national level. In other words, the estimates of the general population mortality are based on the whole population, i.e. all people living in the country or region that is the catchment area for the population-based cancer registry. In this study the population mortality rates are based on all people living in Sweden. The fact that h^{∗}(t) and S^{∗}(t) are based on the whole population is the reason why they are assumed to be fixed and measured without uncertainty. However, this assumption might be questioned because h^{∗}(t) and S^{∗}(t) can be seen as one potential realization from a random process. Also, there might be scenarios in which uncertainty from the expected mortality rates should be taken into account, for instance when one wants to use population mortality rates stratified by many covariates. Often, the population mortality rates are stratified by age, sex and calendar year. However, the expected mortality can also differ across regions or by socioeconomic status, in which case the population mortality rates will also be stratified by region, socioeconomic status or other covariates. Consequently, when many stratification variables are employed, the number of people in each stratum can be very small and ignoring uncertainty in the expected mortality rates may not be justified. There are, in addition, scenarios in which one would like to stratify the expected mortality rates by factors that are not available on a national level. If the data for the whole population are not available, expected rates can be constructed from a random sample of individuals from the whole population for whom information on the missing variables is available [13].
Material
Cohort data
Sweden has a population size of approximately 10,000,000, and all cancer cases are reported to the Cancer Register. In this study, we used data from the Swedish Cancer Registry to identify patients diagnosed with colon cancer in Sweden in 2006. Only cases aged 50 and older at diagnosis were included, and cases diagnosed at autopsy were excluded. In total, 3400 patients were included in this study. The patients were followed from diagnosis to death due to any cause or the end of 2017, whichever came first. A 10% random sample from the cancer cohort (318 patients) was also created to be able to investigate how the estimates could be affected in a smaller population, for example a smaller country or region.
This study was approved by the Swedish Ethical Review Authority. Informed consent from study subjects was not required for the current study. This study was carried out in accordance with the Declaration of Helsinki, and all methods were carried out in accordance with relevant guidelines and regulations in Sweden.
General population data
We used two data sets that contained the number of deaths and person-years in the Swedish population, obtained from Statistics Sweden [14]. The first data set was stratified by sex, single year of age from 18 to 99 and calendar year from 1975 to 2017. We denote it popmort. The second data set was stratified by an additional factor, region, and we denote it popmort_region. Sweden is divided into 21 regions, the largest being Stockholm with a population of approximately 2,300,000 and the smallest being Gotland with a population of approximately 60,000. Both popmort and popmort_region are based on the whole general population, i.e. all people living in Sweden. We refer to them as population mortality files of original size. As mentioned above, there are scenarios in which the expected rates might not be based on the whole population. To address this, we created population mortality files based on the data sets obtained from Statistics Sweden, but reduced in size. To do this, popmort and popmort_region were reduced in size by factors of 10, 200 and 2000 (both the number of deaths and the person-years were divided by 10, 200 and 2000). Thus, we obtained population mortality files with populations of 1 million, 50,000 and 5,000 people, corresponding to 10%, 0.5% and 0.05% of the original size. This gave us in total 8 different versions of general population mortality data.
Underlying and varying population mortality rates
To compare the standard errors of RS and LLE estimated with the existing approach with those from an approach that takes the uncertainty in the general population mortality into account, we need both fixed, or underlying, population mortality rates and varying population mortality rates. To obtain these mortality rates we fitted Poisson regression models to the two original data sets of population death counts described above. The models included the covariates sex, age and year, with the log person-years as an offset. Age and year were treated as continuous variables and modelled using restricted cubic splines. In addition, the pairwise interaction terms of age, sex and year were included in the model to allow for differential effects across groups. For the general population mortality including region, separate models were fitted to each region. The predictions from the Poisson models were used as underlying expected mortality rates. The two resulting files, with and without stratification by region, were used as the data sets with underlying expected mortality rates in the analysis. The underlying mortality rates were constructed from modelling, instead of directly using the data for each covariate pattern, to make them comparable to the varying mortality rates. However, the modelled rates were close to the original mortality rates obtained from Statistics Sweden. To obtain varying expected mortality rates, bootstrapping from the models described was used, as outlined below.
To incorporate the uncertainty of the expected mortality in the estimation of the standard errors of the 5-year RS and LLE, we created a set of 1000 realizations of the expected mortality rates and expected survival probabilities using a parametric bootstrap. For each of the 8 different versions of the general population mortality data mentioned above (with a different size of the source population and with or without stratification by region) we fitted the Poisson model described above. Since the smaller general populations were created by dividing the number of deaths and person-years by a corresponding factor, we obtained the same \(\widehat {\beta }\) coefficients but different variance-covariance matrices \(\widehat {\Sigma }\) from the Poisson model for all underlying population sizes. New β parameters were drawn 1000 times from a multivariate normal distribution \(N(\widehat {\beta }, \widehat {\Sigma })\), using the vector of \(\widehat {\beta }\) coefficients and the variance-covariance matrix \(\widehat {\Sigma }\) from the Poisson model [15]. For each draw, the expected mortality rates were obtained from the generated β parameters. These 1000 imputed data sets were used as the replicates of the underlying expected mortality rates, resulting in varying expected mortality rates popmort_{1}, ..., popmort_{1000} for each of the 8 population mortality files (with or without region and 4 sizes of the population).
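The parametric bootstrap described above can be sketched in a few lines. This is a deliberately simplified illustration, not the model used in the study: it assumes a Poisson model with only an intercept and a linear age term (no splines, sex, year or interactions), fitted by Newton-Raphson, and the death counts are generated from hypothetical rates. It shows the two properties the text relies on: dividing deaths and person-years by a common factor leaves \(\widehat{\beta}\) unchanged while inflating \(\widehat{\Sigma}\), and new rates are obtained from draws of β.

```python
import math
import random

def score_and_cov(beta, deaths, pyears, x):
    """Score vector and inverse information for log(rate) = b0 + b1 * x."""
    b0, b1 = beta
    u0 = u1 = i00 = i01 = i11 = 0.0
    for d, py, xj in zip(deaths, pyears, x):
        mu = py * math.exp(b0 + b1 * xj)      # expected deaths (offset: log py)
        u0 += d - mu
        u1 += (d - mu) * xj
        i00 += mu
        i01 += mu * xj
        i11 += mu * xj * xj
    det = i00 * i11 - i01 * i01
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return (u0, u1), cov

def fit_poisson(deaths, pyears, x, iters=50):
    """Maximum likelihood by Newton-Raphson; returns beta-hat and Sigma-hat."""
    beta = (math.log(sum(deaths) / sum(pyears)), 0.0)
    for _ in range(iters):
        (u0, u1), cov = score_and_cov(beta, deaths, pyears, x)
        beta = (beta[0] + cov[0][0] * u0 + cov[0][1] * u1,
                beta[1] + cov[1][0] * u0 + cov[1][1] * u1)
    _, cov = score_and_cov(beta, deaths, pyears, x)
    return beta, cov

def draw_beta(beta, cov, rng):
    """One draw from N(beta-hat, Sigma-hat) via a 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[0][1] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (beta[0] + l11 * z0, beta[1] + l21 * z0 + l22 * z1)

# Hypothetical population: ages 60-90, 100,000 person-years per age group,
# deaths generated from assumed rates exp(-3.5 + 0.9 * x) (NOT Swedish data)
ages = [60, 65, 70, 75, 80, 85, 90]
x = [(a - 75) / 10 for a in ages]
pyears = [100000.0] * len(ages)
deaths = [py * math.exp(-3.5 + 0.9 * xj) for py, xj in zip(pyears, x)]

beta_full, cov_full = fit_poisson(deaths, pyears, x)

# Reduce the population by a factor of 2000 (0.05% of the original size)
k = 2000
beta_small, cov_small = fit_poisson(
    [d / k for d in deaths], [py / k for py in pyears], x)

# 1000 bootstrap realizations of the expected mortality rate at age 85 (x = 1)
rng = random.Random(2022)
draws = [draw_beta(beta_small, cov_small, rng) for _ in range(1000)]
rates_age85 = [math.exp(b0 + b1 * 1.0) for b0, b1 in draws]
```

In this sketch the fitted coefficients are identical for the full and reduced populations, while every element of the variance-covariance matrix is 2000 times larger for the reduced population, so the bootstrap rates scatter more widely around the same underlying rate.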
Methods
Conventional estimates
To obtain conventional estimates, namely, the estimates with the approach assuming no uncertainty in the expected mortality rates, FPMs within a relative survival framework (as shown in Eq. (3)) were fitted. We investigated 4 different settings. For settings 1 and 2, the full cancer cohort and mortality rates based on the general population of original size were used. For setting 1 population mortality rates were stratified by age, year and sex, while for setting 2 population mortality rates were stratified by age, year, sex and region. Within settings 1S and 2S the cancer cohort that was reduced to 10% of the original size was used to investigate what would be observed in a smaller population (here S stands for small). In these settings, the population mortality rates based on the population reduced to 10% of the original size were used. Similarly to settings 1 and 2, for setting 1S the population mortality rates were stratified by age, year and sex, and for setting 2S also by region.
To obtain the conventional estimates for each of these settings, 4 FPMs were fitted. Age at diagnosis and sex were included in the models, with time since diagnosis as the timescale. Region was not included in the model, i.e. it was assumed that the excess mortality is the same in all regions. The expected mortality, however, depends on region if the underlying expected mortality rates are stratified by region, and does not otherwise. The log baseline cumulative excess hazard was estimated using restricted cubic splines with 5 degrees of freedom (df). Age was included as a continuous variable using restricted cubic splines with 4 df. Furthermore, restricted cubic splines with 3 df were used to capture the time-varying effects of age and sex. Within all fitted models, the expected mortality rates were assumed to be fixed.
Based on these 4 FPMs, the 5-year RS, LLE and their SEs were estimated for each of the 4 settings of interest. In the estimation of LLE, we used the same population mortality file as used in the modelling step, again assuming known expected mortality rates, i.e. by using the underlying expected mortality rates as described above. 5-year RS by age and sex, as well as marginal 5-year RS, were estimated. Since the LLE depends not only on the excess mortality but also on the life expectancy in the general population, it will also vary by the factors that the expected mortality rates are stratified by. Thus, with region-specific expected mortality rates the LLE was obtained by age, sex and region. Marginal LLE was also estimated.
Variance estimation including uncertainty in the expected mortality
To estimate the SEs of the 5-year RS and LLE when the variation in the expected mortality is taken into account, the FPMs described above were also fitted using the 1000 varying expected population mortality rates obtained with the parametric bootstrap. For each of settings 1 and 2 described above we investigated 4 scenarios using different sizes of the general population, as described previously. For settings 1S and 2S only one scenario each was employed. This gave a total of 10 different scenarios, which are summarized in Table 1. For each scenario in Table 1 we therefore fitted 1000 FPMs (one for each of the 1000 varying expected mortality rates) and obtained the 5-year RS, the LLE and their SE estimates from every model in the same way as for the conventional estimates.
As described above, to include the uncertainty of the expected mortality rates in the estimates of RS and LLE, for each of the 10 scenarios in Table 1 we fitted the FPMs 1000 times, using each of the 1000 replicates of the underlying expected mortality rates for that specific scenario. Each time, the conditional 5-year RS and LLE, the marginal RS and LLE, and their SEs were obtained. Finally, the estimates were combined using Rubin’s rules [16] to derive the pooled estimates and standard errors.
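For estimates of LLE, the pooled mean was estimated as:
\[\overline{LLE} = \frac{1}{M}\sum_{i=1}^{M}\widehat{LLE}_{i}\]
and the pooled variance as
\[\widehat{Var}(\overline{LLE}) = V_{W} + \left(1 + \frac{1}{M}\right)V_{B}\]
where V_{W} is the within-imputation variance, V_{B} is the between-imputation variance and M is the number of imputed data sets:
\[V_{W} = \frac{1}{M}\sum_{i=1}^{M}\widehat{SE}_{i}^{2}, \qquad V_{B} = \frac{1}{M-1}\sum_{i=1}^{M}\left(\widehat{LLE}_{i} - \overline{LLE}\right)^{2}\]
where \(\widehat {SE}_{i}\) is the standard error of \(\widehat {LLE}_{i}, i= 1,..., M\).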
The above equations show the marginal estimates; however, the same approach was used for the conditional estimates. The estimates for 5-year RS were obtained in the same way. We refer to these as estimates obtained with a bootstrap-based method.
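The pooling step can be sketched as follows. This is a minimal illustration of Rubin's rules in their standard form (pooled mean, within-imputation variance as the mean of the squared SEs, between-imputation variance as the sample variance of the estimates); the function name and the three replicate values are hypothetical, whereas the study pooled 1000 replicates per scenario.

```python
import statistics

def pool_rubin(estimates, standard_errors):
    """Combine replicate estimates and their SEs using Rubin's rules."""
    m = len(estimates)
    pooled_mean = statistics.fmean(estimates)
    # Within-imputation variance: mean of the squared standard errors
    v_within = statistics.fmean([se ** 2 for se in standard_errors])
    # Between-imputation variance: sample variance (divisor m - 1)
    v_between = statistics.variance(estimates)
    total_var = v_within + (1 + 1 / m) * v_between
    return pooled_mean, total_var ** 0.5

# Hypothetical LLE estimates (years) and SEs from three bootstrap replicates
lle_estimates = [2.0, 2.2, 1.8]
lle_ses = [0.5, 0.5, 0.5]
pooled_lle, pooled_se = pool_rubin(lle_estimates, lle_ses)
```

If the replicate estimates barely differ, the between-imputation variance is close to zero and the pooled SE is essentially the conventional SE; this is the mechanism by which a well-estimated population mortality leaves the SE unchanged.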
Performance measure
To compare the standard errors from the bootstrap-based method with those from the conventional method, the relative % precision (RP) [17] was calculated by:
\[RP = 100\left(\left(\frac{\widehat{SE}_{boot}}{\widehat{SE}_{conv}}\right)^{2} - 1\right)\]
where \(\widehat {SE}_{boot}\) and \(\widehat {SE}_{conv}\) are the estimated SEs of 5-year RS or LLE, obtained with the bootstrap-based and conventional methods, respectively.
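A small helper makes the comparison concrete. The sketch assumes that RP takes the squared-ratio form 100((SE_boot/SE_conv)^2 - 1), i.e. a relative % change on the variance scale; this form is an assumption of the example. The two inputs are the SEs of LLE for males aged 55 in the Stockholm region reported in the Results for the conventional method (0.87) and scenario I (1.47).

```python
def relative_precision(se_boot, se_conv):
    """Relative % precision, ASSUMING the form 100 * ((SE_boot / SE_conv)^2 - 1)."""
    return 100.0 * ((se_boot / se_conv) ** 2 - 1.0)

# SEs of LLE quoted in the Results (males aged 55, Stockholm region)
rp_scenario_i = relative_precision(1.47, 0.87)
# Identical SEs give an RP of zero, i.e. no loss of precision
rp_no_change = relative_precision(0.87, 0.87)
```

Under this assumed form, a positive RP means the bootstrap-based SE exceeds the conventional SE, so the conventional SE understates the uncertainty.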
The analysis was performed in Stata 15.1 using the publicly available packages stpm2 and standsurv [8, 18, 19].
Results
Conventional setting 1
The point estimates (PE) of 5-year relative survival and loss in life expectancy by selected ages at diagnosis (55, 65, 75, 85) and sex are presented in Table 2 for scenarios A-D (using population mortality rates stratified by age, sex and calendar year), as well as for the corresponding conventional setting 1. The SEs, CIs and RP for each of the estimates are also shown. It can be seen that SEs obtained with the bootstrap-based method are larger than conventional SEs for scenario D (when the size of the general population is reduced to 0.05% of the original size). For 5-year RS, this increase is noticeable for patients older than 75 years, while for LLE changes are seen for all ages. In addition, the increase is larger for LLE than for RS. For example, the relative precision of 5-year RS for scenario D for males aged 75 is approximately 15%, while the RP of LLE in the same scenario is approximately 221%.
Conventional setting 2
Table 3 presents estimates for selected ages at diagnosis (55, 65, 75, 85) and by sex from scenarios F-I (using population mortality rates stratified by age, sex, calendar year and region), as well as from the corresponding conventional setting 2. Since the estimates of LLE also differ by region in these scenarios, the results are shown for the Stockholm region. Patterns similar to those in scenarios A-D can be seen here. The bootstrap-based SEs are larger than the conventional SEs for scenario I (when the expected mortality is based on a population reduced to 0.05% of the original size). An increase in the SEs of the LLE estimates can also be observed for scenario H (the general population is reduced to 0.5% of the original size). For example, the SE of LLE for males aged 55 is 0.87 for the conventional method and for scenarios F and G (the general population of original size and reduced to 10% of the original size, respectively), while it is 0.94 for scenario H (the general population is 0.5% of the original size) and 1.47 for scenario I (the general population is 0.05% of the original size). Similar to the estimates from scenarios A-D, the increase is larger for LLE than for RS. In addition, the RP of LLE in setting 2 is larger than the RP of LLE in setting 1 (using population mortality rates stratified by age, sex and calendar year). For example, for men aged 75 years in setting 1, scenario D, the RP of LLE is about 221%, while in setting 2, scenario I, the RP is 811%. The same pattern is seen for the smallest region in Sweden, the Gotland region, although the RP is much higher for the Gotland region than for the Stockholm region. The results for the Gotland region can be found in Additional file 1 for scenarios F-H, and Additional file 2 for scenario J.
Conventional settings 1S and 2S
For the scenarios where the cancer cohort is reduced in size (scenarios E and J), no inflation in SEs was observed for either 5-year RS or LLE, regardless of whether the population mortality rates were stratified by region or not (Table 4).
Confidence intervals
CIs of 5-year RS and LLE for each of the 10 scenarios A-J, for males and by selected ages at diagnosis (55, 65, 75, 85), are illustrated in Figs. 1 and 2. Visually, differences in the length of the CIs of 5-year RS can be observed only for scenarios D and I. For the length of the CIs of LLE, differences are seen for scenarios D, I and H.
Graphical comparisons of the bootstrap-based and conventional estimates of 5-year RS, LLE and their CIs from each of the 10 scenarios A-J, for males aged 50+, are found in Additional file 3.
Marginal estimates
Marginal measures are presented in Table 5 for each of the 10 scenarios from Table 1. Here we observe patterns similar to the conditional results presented in Tables 2, 3 and 4. For instance, for scenarios E and J (using the reduced cancer cohort), there is no inflation in the SEs of marginal 5-year RS or LLE. Also, changes in the SEs of marginal 5-year RS can be seen in scenarios D and I (the general population is 0.05% of the original size); for LLE, these changes are also observed for scenarios C and H (the general population is 0.5% of the original size).
Discussion
In this study we found that when the whole general population, i.e. all people living in the country or region that is the catchment area for the population-based cancer registry, is used to obtain predicted mortality rates for estimating 5-year RS or the LLE, the assumption of known (fixed) general population mortality rates has a negligible effect on the estimated SEs. The relative precision for both 5-year RS and LLE was less than 1%. This is an important message for population-based cancer research.
The impact of including the uncertainty in expected mortality was larger when the population mortality was stratified on more variables, here region. However, the impact was still small when the mortality rates were based on the whole population. The largest relative precision for 5-year RS was 0.03% and for the LLE it was 0.46%. Interestingly, it did not make a large difference when we assumed that the cancer cohort was only 10% of the original size and the corresponding reduced general population was used, as would be the case in a smaller country or a region. The relative precision in this case for 5-year RS was in the range of 0.01% to 0.03%, and less than 1% for LLE. This suggests that as long as the whole general population is used, regardless of the size of a country or region, a possible variation in the expected mortality rates can be ignored.
If the whole population of the country or region is not available, then the validity of the assumption of known expected mortality rates should be discussed. In this study we illustrated that for 5-year RS, when the general population was reduced to 0.05% of the original size and stratified by age, sex and calendar year, the relative precision was 15 (7)% and 40 (34)% for males (females) 75 and 85 years old, respectively. The LLE was affected to a larger extent than RS. For all ages an increase in the SE of the LLE was observed when the general population was reduced to 0.5% and 0.05% of the original size. The relative precision for the LLE when the general population was reduced to 0.05% and stratified by age, sex and calendar year was 221 (129)% and 245 (176)% for males (females) 75 and 85 years old, respectively. For estimates at older ages the impact was larger, possibly because the expected mortality rates of elderly patients in the general population are more influential than those of younger patients and, therefore, the uncertainty introduced in the general population mortality could have a larger impact on the SEs of both 5-year RS and LLE. For marginal 5-year RS the relative precision was 3% when the general population was reduced to 0.5% of the original size and 30% when reduced to 0.05% of the original size. Similar to the conditional estimates described above, the marginal estimates of the LLE showed larger relative precision than the marginal 5-year RS, with 22% and 313% for the general population reduced to 0.5% and 0.05% of the original size, respectively.
Previous work in this area has focused on nonparametric estimates of RS, and the results were similar to our results [5]. Another study did not address the uncertainty in the general population mortality rates, but investigated the impact on SE of nonparametric estimates of RS when allowing the expected survival for the cancer cohort to vary [20]. Nonparametric bootstrap was used to sample from the cancer cohort, resulting in a different age and sex distribution in each sample. Hence, the expected survival calculated for the nonparametric estimate of RS varied in each sample. However, it was still assumed that the general population mortality used to obtain this expected survival was fixed.
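The distinction drawn here can be illustrated with a small sketch: a nonparametric bootstrap of the cancer cohort changes the age mix of each sample, so the cohort-level expected survival varies even though the life table itself stays fixed. The age groups, survival values and cohort composition below are hypothetical and are not taken from [20] or from this study.

```python
import random
import statistics

# Hypothetical FIXED life table: expected 5-year survival by age group.
EXPECTED_SURVIVAL = {"65-74": 0.90, "75-84": 0.70, "85+": 0.40}

def cohort_expected_survival(cohort):
    """Average expected survival over the patients in a cohort."""
    return statistics.mean(EXPECTED_SURVIVAL[age_group] for age_group in cohort)

def bootstrap_expected_survival(cohort, n_boot, seed=0):
    """Resample patients with replacement; the life table is never resampled."""
    rng = random.Random(seed)
    return [
        cohort_expected_survival(rng.choices(cohort, k=len(cohort)))
        for _ in range(n_boot)
    ]

# Hypothetical cohort of 100 patients, one age group per patient.
cohort = ["65-74"] * 50 + ["75-84"] * 30 + ["85+"] * 20
original = cohort_expected_survival(cohort)
samples = bootstrap_expected_survival(cohort, n_boot=1000)

print(f"expected survival in the original cohort: {original:.3f}")
print(f"SD across bootstrap samples: {statistics.stdev(samples):.4f}")
```

All variation in this sketch comes from the changing cohort composition. Uncertainty in the life table values themselves, which is the focus of the present study, is not represented.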
Even though the results of our study suggest that the assumption of known expected mortality rates is reasonable when based on the whole population, we did not investigate all possible situations. There might be situations when the general population mortality rates are stratified on even more covariates, leading to very small groups. Another aspect we did not include is the situation when more covariates are included in both the excess and expected mortality. We assumed that region did not have an impact on excess mortality, even when region was included for the expected mortality. Also, it would be of interest to elaborate on the findings using data on other cancer types. In addition, we used a modelling approach to obtain smooth estimates of the general population mortality rates, instead of using the raw numbers of deaths and person-years in each stratum. An alternative way to create varying population mortality rates could be bootstrapping from the raw numbers of deaths and person-years.
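The alternative mentioned in the last sentence could look something like the sketch below. It is not the model-based approach used in this study: it assumes, purely for illustration, that the number of deaths in a stratum is Poisson distributed around the observed count with person-years held fixed. The stratum counts and function names are hypothetical.

```python
import random
import statistics

def draw_poisson(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential arrivals before `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def bootstrap_rates(deaths, person_years, n_boot, seed=0):
    """Resampled mortality rates for one stratum (ASSUMES Poisson deaths, fixed person-years)."""
    rng = random.Random(seed)
    return [draw_poisson(deaths, rng) / person_years for _ in range(n_boot)]

def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical stratum with the same underlying rate (0.04 per person-year),
# once in a large population and once in a population 100 times smaller.
large = bootstrap_rates(deaths=400, person_years=10_000.0, n_boot=2000)
small = bootstrap_rates(deaths=4, person_years=100.0, n_boot=2000)

cv_large = coefficient_of_variation(large)
cv_small = coefficient_of_variation(small)
print(f"CV of the rate, large population: {cv_large:.3f}")
print(f"CV of the rate, small population: {cv_small:.3f}")
```

Under the Poisson assumption the relative variability of a stratum rate scales with one over the square root of the number of deaths, which is in line with the finding that the impact grows as the population behind the expected mortality rates shrinks.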
In conclusion, this study contributes to population-based cancer studies by suggesting that, in general, the SEs of RS and the LLE are reliable under the assumption of known expected mortality rates. However, when the general population mortality rates are not based on the whole population, the uncertainty in the estimates of the expected measures should be taken into account, as the conventional estimates of SE for relative survival proportions and loss in life expectancy may be too low, particularly for marginal values.
Availability of data and materials
According to the ethical permission granted for their use, the data used for this study may not be shared by the authors with a third party. The data are accessible by application to the Swedish authorities (The Swedish Cancer Registry).
Abbreviations
RS: relative survival
LLE: loss in life expectancy
SE: standard error
FPM: flexible parametric survival model
CI: confidence interval
df: degrees of freedom
RP: the relative % precision
PE: point estimate
References
Dickman PW, Coviello E. Estimating and modeling relative survival. Stata J. 2015; 15(1):186–215.
Pavlic K, Pohar Perme M. Using pseudo-observations for estimation in relative survival. Biostatistics. 2019; 20(3):384–99. https://doi.org/10.1093/biostatistics/kxy008.
Perme MP, Stare J, Esteve J. On estimation in relative survival. Biometrics. 2012; 68(1):113–20. https://doi.org/10.1111/j.1541-0420.2011.01640.x.
Andersson TML, Dickman PW, Eloranta S, Lambe M, Lambert PC. Estimating the loss in expectation of life due to cancer using flexible parametric survival models. Stat Med. 2013; 32(30):5286–300. https://doi.org/10.1002/sim.5943.
Gauffin O. Confidence intervals in relative survival. Master's thesis, Stockholm University. 2017. https://mathstatmast.files.wordpress.com/2017/05/2017_6_report.pdf. Accessed 10 Jan 2019.
Perme MP, Pavlic K. Nonparametric relative survival analysis with the R package relsurv. J Stat Softw. 2018; 87(8):1–27.
Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Stat Med. 2004; 23(1):51–64. https://doi.org/10.1002/sim.1597.
Lambert PC, Royston P. Further Development of Flexible Parametric Models for Survival Analysis. Stata J Promot Commun Stat Stata. 2009; 9(2):265–90. https://doi.org/10.1177/1536867X0900900206.
Nelson CP, Lambert PC, Squire IB, Jones DR. Flexible parametric models for relative survival, with application in coronary heart disease. Stat Med. 2007; 26(30):5486–98. https://doi.org/10.1002/sim.3064.
Durrleman S, Simon R. Flexible regression models with cubic splines. Stat Med. 1989; 8(5):551–61. https://doi.org/10.1002/sim.4780080504.
Sjolander A. Regression standardization with the R package stdreg. Eur J Epidemiol. 2016; 31(6):563–74. https://doi.org/10.1007/s10654-016-0157-3.
Syriopoulou E, Rutherford MJ, Lambert PC. Marginal measures and causal effects using the relative survival framework. Int J Epidemiol. 2020; 49(2):619–28. https://doi.org/10.1093/ije/dyz268.
Bower H, Andersson TM, Crowther MJ, Dickman PW, Lambe M, Lambert PC. Adjusting expected mortality rates using information from a control population: An example using socioeconomic status. Am J Epidemiol. 2018; 187(4):828–36. https://doi.org/10.1093/aje/kwx303.
SCB (Statistics Sweden). Deaths by region, age (during the year) and sex. 2020. http://www.statistikdatabasen.scb.se/pxweb/en/ssd/START__BE__BE0101__BE0101I/DodaHandelseK/#. Accessed 05 Oct 2020.
Bickel PJ, Doksum KA. Mathematical Statistics: Basic Ideas and Selected Topics. CRC Press; 2015.
Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: Wiley; 2004.
Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019; 38(11):2074–102. https://doi.org/10.1002/sim.8086.
StataCorp. Stata Statistical Software: Release 15. College Station: StataCorp LLC; 2017.
Lambert PC. Standsurv. 2019. https://pclambert.net/software/standsurv/. Accessed 20 Oct 2020.
Brenner H, Hakulinen T. Substantial overestimation of standard errors of relative survival rates of cancer patients. Am J Epidemiol. 2005; 161(8):781–6. https://doi.org/10.1093/aje/kwi099.
Acknowledgements
Not applicable.
Funding
This work was funded via the Swedish Cancer Society (Cancerfonden) (grant numbers: 19 0102 Pj, 2018/744), the Swedish Research Council (Vetenskapsrådet) (grant numbers: 201901965, 201900227, 201701591) and the Strategic Research Area (SFO) in Epidemiology at Karolinska Institutet. Open access funding provided by Karolinska Institute.
Author information
Contributions
Y.L., T.ML.A., P.C.L., H.B. and O.G. contributed to the conception of the work. Y.L. and T.ML.A. implemented the methods and conducted the data analysis. Y.L. and T.ML.A. wrote the original draft. P.C.L., H.B. and O.G. reviewed and edited the draft. All authors interpreted the findings, critically revised the article and approved the final manuscript for publication.
Ethics declarations
Ethics approval and consent to participate
The study was conducted in accordance with the Declaration of Helsinki, and data management was handled according to Swedish law and regulations. This study was approved by the Swedish Ethical Review Authority (2017/64131/1 with extensions 201901913, 202006544, 202102472; 2006/91431/3 with extensions 2008/146932, 2009/63432, 2010/192832). Informed consent from study subjects was not required for the current study.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1
Estimates of 5-year RS and LLE for setting 2 and scenarios F-H for the Gotland region.
Additional file 2
Estimates of 5-year RS and LLE for setting 2 and scenario J for the Gotland region.
Additional file 3
Graphical comparisons of the bootstrap-based and conventional estimates of 5-year RS, LLE and their CIs for all scenarios.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Leontyeva, Y., Bower, H., Gauffin, O. et al. Assessing the impact of including variation in general population mortality on standard errors of relative survival and loss in life expectancy. BMC Med Res Methodol 22, 130 (2022). https://doi.org/10.1186/s12874-022-01597-7
DOI: https://doi.org/10.1186/s12874-022-01597-7
Keywords
 Relative survival
 Loss in life expectancy
 Flexible parametric survival models