Comparison of different approaches to estimating age standardized net survival

Background Age-standardized net survival provides an important population-based summary of cancer survival that appropriately accounts for differences in other-cause mortality rates and standardizes the population age distribution to allow fair comparisons. Recently, there has been debate over the most appropriate method for estimating this quantity, with the traditional Ederer II approach being shown to have potential bias. Methods We compare lifetable-based estimates (Ederer II), a new unbiased method based on inverse probability of censoring weights (Pohar Perme) and model-based estimates. We make the comparison in a simulation setting; generating scenarios where we would expect to see a large theoretical bias. Results Our simulations demonstrate that even in relatively extreme scenarios there is negligible bias in age-standardized net survival when using the age-standardized Ederer II method, modelling with continuous age or using the Pohar Perme method. However, both the Ederer II and modelling approaches have some advantages over the Pohar Perme method in terms of greater precision, particularly for longer-term follow-up (10 and 15 years). Conclusions Our results show that, when age-standardizing, concern over bias with the traditional methods is unfounded. We have also shown advantages in using the more traditional and modelling methods. Electronic supplementary material The online version of this article (doi:10.1186/s12874-015-0057-3) contains supplementary material, which is available to authorized users.

other causes [7]. This allows a 'fair' comparison of cancer survival. Net survival is something we never observe as patients in the real world are at risk of death from other causes. Therefore, estimation requires assumptions to be made. The usual approach has been to use relative survival to estimate net survival.
Age plays an important role in the comparison of cancer survival. For most cancers net survival decreases with age. When comparing survival between populations it is therefore essential to 'adjust' for differences in the age distribution. When a single summary measure is required, traditional age standardization is used, where an estimate of relative survival is obtained separately in age groups and a weighted average calculated with the weights reflecting an international standard population [8].
Often survival differences between groups vary by age. For example, international comparisons for breast [9,10], colorectal [11], prostate [12] and lung cancer [13] showed larger differences in relative survival with increasing age. By just giving an overall average these important differences can be missed. However, age-standardized relative survival is still an important summary measure and will continue to be used.
Net survival has traditionally been estimated using relative survival; the ratio of all-cause survival to expected survival. Pohar Perme et al. showed the commonly used Ederer II method and other methods are potentially biased when interest lies in an overall average of net survival and suggest an alternative (the Pohar Perme method) [14]. There has been inconsistency in the definition of 'relative survival' which has been used to refer to an analysis where all ages have been pooled together [14,15]. Similarly, the term 'net survival' has been used to describe the average net survival in a population. However, the concepts of net and relative survival exist at the individual level. In fact we argue that the Pohar-Perme method is estimated in a relative survival framework and can only be interpreted as average net survival under two key assumptions that apply to all methods.
An alternative way to estimate net survival is to use statistical modelling [16,17]. Statistical modelling allows various assumptions to be incorporated, for example proportional excess hazards, only selected interactions or the fact that the true net survival can never increase. Figure 1 shows estimated net survival for cancer of rectum in England for those age 75+ estimated using three different methods described later (Ederer II, Pohar Perme and model based). It illustrates three key issues, (i) the methods give different estimates of net survival, particularly for long term follow up; (ii) the Pohar-Pohar estimate is more variable than the other methods; (iii) the confidence intervals for the Pohar-Perme estimate are wider.
We argue that the theoretical bias when using the Ederer II method to estimate age group specific or

Excess mortality, relative survival and net survival
We explain key concepts through both the excess mortality rate and relative survival as this highlights differences between the methods. The excess mortality rate for an individual i is the difference between their all-cause mortality rate, h i (t) and their expected mortality rate if they did not have cancer, h * i (t), The subscripts, i, are important as both expected and all-cause mortality varies between individuals, with age a key factor. Expected mortality rates are usually obtained from national or regional mortality statistics, stratified by age, sex, calendar year and potentially other covariates.
The relative survival function, where S i (t) and S * i (t) are the all-cause survival and expected survival functions. To interpret, R i (t) as a net probability of survival, S N i (t), i.e. in the hypothetical situation where it is not possible to die of other causes, two assumptions are needed; (i) Conditional independence between cancer and non-cancer mortality. (ii) Appropriate expected mortality information. This means that the mortality rate due to other causes for the cancer patients is the same as that in the population lifetable.
All methods described below require these assumptions to be true. For example, conditional independence is addressed by stratification, regression modelling or weighting for relevant covariates. For the remainder of this paper we take these assumptions as reasonable as our main focus is on the different methods of estimation.
In summary, net survival can be estimated in a relative survival framework under certain assumptions. However, Eq. (1) is at the individual level and often it is desired to present an average (marginal survival). The traditional lifetable approaches have averaged the numerator and denominator separately, which is not the same as taking an average of individual relative survival and is a key reason why traditional approaches have potential bias when estimating marginal survival.

Age-standardized relative survival
It is useful to obtain a summary measure of survival in a population. This can be done by calculating the average (or marginal) survival. For simplicity, we consider the situation where relative survival, R i (t), only varies by age. An internally age-standardized estimate is the average of the n different relative survival curves, where n is the number of study subjects. This estimates the internally age-standardized net survival, S N (t) under assumptions (i) and (ii). This is not the same as estimating relative survival ignoring the effect of age.
Equation (2) gives the age-standardized relative survival in a particular study population. However, since different populations may have different age distributions, traditional external age-standardization is usually performed. This forces the same age distribution on each population through a weighted average of age-specific estimates of relative survival. The usual external reference population uses weights in five age groups as shown in Table 1 [8]. The externally weighted estimate of age-standardized relative survival adds weights to Eq. (2), where w s i is weight for subject i from the reference population (Table 1) and a i gives the proportion in the age group to which the i th subject belongs. The ratio, w s i /a i will be higher than one in age groups under-represented in the study population compared with the standard population and less than one for age groups over-represented. Since there are only five different weights, in practice where w s k is the weight in the k th age group and R k (t) the corresponding relative survival.
If the weights, w s k are replaced by the observed proportion of subjects in each age group then Eq. (4) gives an internally age-standardized estimate, i.e. a grouped version of Eq. (2).

The Ederer II method
The Ederer II method [18] is estimated in a relative survival framework, i.e. the ratio of all-cause survival, S(t), to expected survival, S * (t). The relative survival, R(t) is, We use an adaption of the Ederer II method and estimate the all-cause and expected survival through back calculation from the all-cause and expected mortality rates. This makes little difference in practice, but allows comparison with the Pohar-Perme method and is similar to the extension of the Ederer II method to continuous time described by Pohar Perme et al. [14]. Follow-up time is divided into a number of intervals and the excess mortality rate, λ j calculated in each interval by obtaining the total number of deaths in the j th interval and subtracting the expected number of deaths. Dividing by the total number of person years converts the difference to a rate. For the j th interval the excess mortality rate is, where d ij is the death indicator for subject i in interval j, d * ij is the expected number of deaths and y ij is the time at risk. d * ij is calculated using p * ij , the probability individual i in interval j will survive a year given their attained age and calendar year and other demographic factors, where k j is the length of the j th interval. The relative survival is then obtained by, When calculated for the study population as a whole the Ederer II method gives a biased estimate of internally age-standardized net survival unless there is no variation in expected or relative survival between different ages [14,19], a situation almost certainly not true. Hakulinen et al. argued that unless there was substantial variation in relative survival by age the bias would be small, but a theoretical bias still exists [19].
When comparing populations, external age standardization is used with relative survival calculated separately in each age group. Any bias due to variation in relative and expected survival will be reduced. Within each age group the Ederer II method will give an unbiased estimate of the age-specific net survival if there is no variation in relative survival within each age group. This assumption is almost certainly not true, but since narrow age groups are being used the variation will be much reduced compared to the all age estimate. The important question is whether this assumption matters, as empirical observation has shown that the Ederer II method gives a more precise estimate than the Pohar Perme method for long-term estimates [20,21].

The Pohar Perme method
The Pohar Perme method was developed to give an unbiased estimate of internally age-standardized net survival [14]. The method estimates the internally agestandardized relative survival, i.e. that defined in Eq. (2). It can be interpreted as internally age standardized net survival under assumptions (i) and (ii). The method gives greater weight to individuals with a higher risk of other cause mortality. For example, consider two individuals aged 60 and 80 at diagnosis. If both are still alive 10 years after diagnosis, the 80 year old receives more weight because similar individuals are more likely to have died of other causes. We use an adaption of the method for when survival time is recorded in monthly intervals [21]. The weight for subject i in interval j is the inverse of the expected survival, S * ij , so w PP ij = 1/S * ij . The excess mortality rate in the j th interval is, This is converted to the survival scale in the same way as for Ederer II, For comparisons between population groups external agestandardization is required, so the Pohar-Perme estimate is calculated separately in each age group and a weighted average estimated using Eq. (4).

Brenner method
It is not possible to calculate traditionally agestandardized relative survival unless all age-specific estimates can be calculated, a problem more likely in the oldest age group. The Brenner alternative method calculates an 'all age' Ederer II estimate, with each subject up-or down-weighted so the age distribution reflects the reference population [22]. The weights, w B i are the same as those used in Eq. (3).
The excess mortality rate for the j th interval incorporating these weights is, The weights depend on the age distribution at diagnosis. The weighting does not overcome any potential bias if there is substantial variation in relative survival by age. The excess mortality rate is transformed to the relative survival in the same way as above.

Modelling
There is growing use of statistical models for excess mortality [16,23]. Modelling has some advantages over nonparametric estimates by allowing more detailed exploration and quantification of differences between groups and can include continuous covariates, such as age at diagnosis [10]. Modelling also allows more precise estimation of parameters of interest by making assumptions. It is also possible to obtain an average or weighted average of individual predicted relative survival functions to obtain internally or externally age-standardized estimates.
Models for excess mortality are of the following form, Interest lies in how covariates affect the excess mortality rate, λ i (t). When age-standardizing relative survival only the effect of age needs to be modelled. An internally agestandardized estimate of relative survival is obtained using Eq.
(2) where relative survival is estimated for each subject in the study population. An externally age-standardized estimate is calculated using Eq. (3). We use a flexible parametric survival model for the excess mortality [23]. Through the use of restricted cubic splines the model can capture just about any shape for the baseline excess hazard [24]. It can also incorporate continuous covariates, so the effect of age is captured in more detail, and allow for non-proportional excess hazards. The latter point is important as non-proportional excess hazards is extremely common in these studies and allows the change in the excess mortality rate as a function of age to change as a function of follow-up time.

Simulation study
The purpose of the simulation study is to quantify any bias in the methods in situations when one would expect the traditional non-parametric lifetable based methods to be biased, i.e. when there is variation in relative survival by age. We have deliberately chosen two scenarios where there is substantial variation by age and, in practice, most cancer sites have far less variation. In addition to quantifying bias we assess the variability in the different estimates.
Times to death from cancer (net survival) and due to other causes were generated for each individual with the minimum value taken as the time to death. The simulation strategy is outlined in Fig. 2. Table 2 gives the true net survival at selected ages for each scenario showing large variation by age. The true internally and externally age-standardized estimates are also shown. In the simulation relative survival is equivalent to net survival at the individual level since (i) only age affects relative survival and (ii) the population mortality information used to analyse the data is the same as that used to generate the data.
For each scenario 1000 data sets were generated and estimates of internally and externally age-standardized relative survival obtained using the various methods. These are explained in more detail below.
For internal age-standardization the following methods were used, Ederer II (all age) combines all ages into a single group. Time since diagnosis was split into monthly intervals (this is often the detail cancer registries are prepared to release their data).
Ederer II (standardized) obtains age group specific estimates and uses internal age-standardization using Eq. (4), with the observed proportions in each age group as weights. Time since diagnosis was split into monthly intervals. Pohar Perme is calculated for the study population as a whole. Time since diagnosis was split into monthly intervals. Model-based (grouped) uses a flexible parametric model with grouped age and applying Eq. (2). The baseline hazard was modelled using 5 df for the spline variables. The effect of age was assumed to be non-proportional by incorporating interactions with follow-up time (3 df for each age group). Model-based (continuous) uses a flexible parametric model with continuous age. Restricted cubic splines were used for modelling age (3 df ) and applying Eq. (2). The baseline hazard was modelled using 5 df. Nonproportional excess hazards were incorporated through time-dependent effects (3 df for each spline term). Given the way in which the data were simulated, this can be  considered over modelling, but it reflects the way models would be applied in practice.
For external age-standardization the following methods were used, Ederer II (Brenner) combines all ages and uses the Brenner alternative method to give an externally weighted estimate.
Ederer II (standardized) obtains age group specific estimates and uses external age-standardization using Eq. (4) using the external reference weights. Pohar Perme obtains age group specific estimates and uses external age-standardization using Eq. (4) using the external reference weights. Model-based (grouped) uses a flexible parametric model using grouped age and applying Eq. 3. This was the same model as used for internal age-standardization, but applies different weights for the predictions. Model-based (continuous) uses a flexible parametric model with continuous age and applying Eq. 3. This was the same model as used for internal age-standardization for the predictions.
Each method was compared to the the true value to give the bias (expressed as difference in percentage points), the mean square error (MSE) and the coverage [25]. The Stata code for the simulations can be found in the online Additional files 1 and 2. Table 3 shows the bias, MSE and coverage for scenario 1. The bias is low for all methods for both internal and external age standardization. At five years the largest bias was  Fig. 3 (internal age standardization) and Fig. 4 (external age standardization). The smallest variation is for the all-age Ederer II method. Table 4 shows the bias, MSE and coverage for scenario 2 with Figs. 5 and 6 showing scatter plots of the estimates from each simulation for internal and external standardization respectively. One would expect more bias for the Ederer II method in this scenario since variation by age in excess mortality continues for the whole study followup. This can be seen by the fact that the Ederer II all age estimate (internal) and the Brenner method (external) gives biased estimates. For example, a 1.91 and 2.80 percentage point difference for external age standardization for 10 and 15 years respectively. However, the bias for Ederer II using traditional age standardization is less than 0.3 percentage points. There is some bias for the modelbased grouped age estimate at a 0.94 and 1.10 percentage  point difference for 10 and 15 years respectively. Coverage is reasonable for the Pohar Perme, Ederer II traditional age standardized and model-based continuous age. As for scenario 1 the Pohar Perme method has a higher MSE reflecting more variation in the estimate.

Age group-specific estimates
To estimate traditionally age standardized net survival separate estimates are required within each age group. For the four youngest age groups there is negligible bias and broad agreement for all methods (data not shown). However, the oldest age group will generally show more bias due to more variation in expected and relative survival in this group. Furthermore, a greater proportion of patients will die from other causes leading to low numbers as follow-up time increases. Table 5 shows the bias, MSE and coverage for the oldest age group for the different methods. Bias for Ederer II for scenario 2 is 0.51, 0.96 and 1.25 percentage points at 5, 10 and 15 years respectively. As with the age standardized estimates, there is far greater variation for the Pohar Perme method, particularly at 10 and 15 years reflected by the higher MSEs.

Discussion
We have shown through a simulation that when an estimate of age-standardized net survival is required, Ederer We do not dispute the theoretical bias in the Ederer II estimate shown by Pohar-Perme [14]. However, when using traditional age-standardization age-group specific estimates are obtained and within each age-group there is far less variation in both expected and relative survival. Thus, even in the extreme scenarios presented here, the potential bias can be ignored. For longer term survival, due to the impact of the weights of elderly subjects, the Pohar Perme estimate has higher variability.
The traditionally age-standardized Ederer II estimate has improved precision over the Pohar Perme estimate since it makes assumptions about variability. The assumptions are there is no variation in expected survival or in relative survival within an age group. The first assumption is definitely not true and the second is unlikely to be. However, these assumptions are not very important in terms of bias. Thus the 'cost' of the bias when using Ederer II in traditional age standardization can effectively be ignored, but there is a 'benefit' in precision for longer term survival. The difference between the methods is small at five years, but can be seen at 10 and 15 years. For Scenario 1, even the all-age Ederer II had negligible bias due to the variation in age not persisting over follow-up time showing that these assumptions are not that important [19]. The model-based grouped age approach showed some bias, particularly for the more extreme scenario 2. The grouping of age in a statistical model is more prone to bias than when using the Ederer II method due to the different ways of averaging within an age group.
The model-based continuous age approach was unbiased and had improved precision due to making certain assumptions. Firstly, the excess mortality rate is considered a smooth function rather than a step function. The true net survival function can never increase and in practice the estimated relative survival curve when modelling will be a decreasing function of time and not increase due to chance like the Pohar-Perme and Ederer II methods. Secondly, the effect of age is treated as continuous and estimates have greater precision than when grouping age [26]. We used restricted cubic splines to approximate the underlying excess mortality rate as these have been shown to be unbiased in a wide range of scenarios [24]. If interest is only in a single summary measure then we see little advantage of modelling over the non-parametric estimates. However, modelling allows greater understanding of differences between groups of interest. From a single statistical model it is possible to produce summary measures such as age standardized relative survival, but also quantify how differences vary by time since diagnosis, by age or by any other modelled covariate.
Age standardized relative survival is a weighted average of five age groups. The four youngest groups has negligible bias and it is the oldest age group that more likely to be biased, but this group only contributes 29 % of the weight. When interest lies in age-specific estimates the bias may be more important. For example, in our more extreme scenario, the bias for Ederer II for those aged 75+ was around 1 percentage point at 10 and 15 years, though the MSE was substantially lower than the Pohar Perme estimate. This bias could be reduced by using more finely split age groups, but then there would be the potential problem of having no patients at risk for longer term follow-up, so traditional age standardization could not be performed. When modelling age continuously this problem does not occur. An important issue here is whether one should estimate long-term net survival for the elderly. To estimate 15 year net survival in those aged 90, for example, is probably not relevant as it is highly likely that all would be dead by the age of 105. However, if one wants a long-term average estimate for the full population then an estimate in the oldest age group is required. All methods have problems with estimation in the oldest age group. For example, if at diagnosis there are a number of subjects aged 90 or over, but by 4 years these people are all dead. The Pohar Perme estimate has no one over the age of 90 to up weight and thus estimates are based on the survival experience of a younger group. There is a similar problem for the Ederer II and so the younger subjects remaining in the age groups are used for longer term excess mortality estimates. Modelling "borrows" information across both age and follow-up time in the predictions of net survival for older subjects.
Other authors have made strong statements that the Pohar Perme estimate is preferable over Ederer II. However, some of these comparisons have been with the all age Ederer II estimate rather than the traditionally age standardized estimate [14,15] or placed more emphasis on bias than precision [15]. One paper calculates the difference between the Pohar Perme estimate and various estimates and labels this 'bias' [27]. The Pohar Perme estimate is subject to more random variation than the other methods and this is an inappropriate use of the term 'bias' . A simulation study provides the appropriate framework to quantify bias. One further issue is potential bias when the age distribution changes over calendar time [28,29]. This can lead to bias, but in most cases this will negligible. Firstly, it is unlikely the age distribution will change as much as in the simulation studies (e.g. a sudden 10 year increase/decrease in the mean age at diagnosis [15]). Secondly, it is not usual to pool so many calendar years together since we would either stratify or model the effect calendar year. Simulation studies only investigate a limited number of scenarios. Our two scenarios were chosen as very extreme cases due to the variation in net survival by age. The majority of cancer sites will have far less variation. We believe scenario 2 is more extreme than any site seen in practice as the variation in the excess mortality rate by age continues into the long term. Various studies have shown that there is far more variation in the excess mortality rate in the first year or so [9][10][11][12][13]19]. As the bias is negligible in this extreme case, any potential bias can be ignored in real world applications. Our simulations use a sample size of 15,000. Bias will not be affected by sample size. However, the variation in the Pohar Perme estimate seen in Fig. 1 will become more pronounced in smaller samples.

Conclusion
In summary, if interest is in a single summary measure of survival we do not see a considerable advantage of the Pohar Perme estimate over the age-standardized Ederer II estimate and a benefit of the latter for longer term followup due to improved precision. Our personal preference is to use statistical modelling as it is possible to obtain both simple summaries, such as age standardized estimates, and a more detailed understanding of differences between demographic groups.