Attributable risk from distributed lag models
© Gasparrini and Leone; licensee BioMed Central Ltd. 2014
Received: 28 January 2014
Accepted: 8 April 2014
Published: 23 April 2014
Measures of attributable risk are an integral part of epidemiological analyses, particularly when aimed at the planning and evaluation of public health interventions. However, the current definition of such measures does not consider any temporal relationships between exposure and risk. In this contribution, we propose extended definitions of attributable risk within the framework of distributed lag non-linear models, an approach recently proposed for modelling delayed associations in either linear or non-linear exposure-response associations.
We classify versions of attributable number and fraction expressed using either a forward or backward perspective. The former specifies the future burden due to a given exposure event, while the latter summarizes the current burden due to the set of exposure events experienced in the past. In addition, we illustrate how the components related to sub-ranges of the exposure can be separated.
We apply these methods to estimate the mortality risk attributable to outdoor temperature in two cities, London and Rome, using time series data for the periods 1993–2006 and 1992–2010, respectively. The analysis provides estimates of the overall mortality burden attributable to temperature, then separates the components attributable to cold and heat, and further those attributable to mild and extreme temperatures.
These extended definitions of attributable risk account for the additional temporal dimension which characterizes exposure-response associations, providing more appropriate attributable measures in the presence of dependencies characterized by potentially complex temporal patterns.
Epidemiological studies usually rely on effect summaries based on ratio measures, such as relative risk, odds ratio or rate ratio, with the choice depending on the specific study design. Although these measures are ideal for summarizing the association of interest, they offer limited information on the actual impact of the exposure. This information is critical for the planning and evaluation of public health interventions, and it is better provided by relative excess measures such as the attributable fraction (AF), or by absolute excess measures such as the attributable number (AN). Steenland and Armstrong offer a thorough overview of the topic. Here we generally refer to these summaries as attributable risk measures.
Problems in the definition of these measures may arise in the presence of delayed associations, occurring when an exposure generates a risk lasting well beyond the exposure period. Over the last thirty years, researchers in different fields have proposed approaches for modelling this type of association [3–6]. In time series analysis, a popular approach is based on distributed lag models (DLMs) [7, 8], generalized to distributed lag non-linear models (DLNMs) when including non-linear exposure-response associations [9, 10]. Recently, the DLNM framework has been extended beyond time series data to model such dependencies, defined as exposure-lag-response associations, in different study designs. However, attributable risk measures have not been developed for the DLM and DLNM class, and as a result their current definitions do not take into account the additional temporal structure of the exposure-response association. Previous work investigated the issue in time series analysis, although without producing a general approach [12, 13].
In this contribution, we extend this research and attempt to formalize the definition of attributable risk measures within the DLNM modelling framework. In particular, we illustrate how complex non-linear and temporal patterns can be accounted for in the computation of the attributable risk. We also show how attributable components related to different exposure ranges can be separated. We present an example on the estimation of the health burden attributable to temperature with time series data, a problem that has received considerable interest recently due to climate change predictions. However, the approach can be easily applied to other exposure-lag-response associations. The method is implemented in simple functions developed within the R software and provided as additional files.
For a given exposure intensity x, the attributable fraction and number are defined as

$$AF_x = 1 - \exp(-\beta_x) \qquad (1a)$$
$$AN_x = n \cdot AF_x \qquad (1b)$$

with n as the total number of cases. The parameter $\beta_x$ used in Eq. (1a) represents the risk associated with the exposure, and it usually corresponds to the logarithm of a ratio measure such as relative risk, relative rate or odds ratio. It is generally obtained from regression models while adjusting for potential confounders. The general definition of $\beta_x$ used here refers to the association with a specific exposure intensity x compared to a reference value $x_0$. For linear exposure-response relationships, the association can also be reported as $\beta \cdot x$, where in this case β refers to a unit increase in x. For binary variables reporting presence/absence of the exposure, Eq. (1a) simplifies to AF = (RR − 1)/RR, with RR as relative risk, as reported by Steenland and Armstrong. We keep the more general definition of $\beta_x$, which is easily applicable to non-linear exposure-response relationships, throughout the manuscript.
The theoretical nature of these effect measures is based on a counterfactual, where the observed condition is compared with a reference state which never occurred. This state postulates that the same population is followed in an identical situation where only the exposure level changes to the reference value $x_0$. Typically, such a reference is represented by the absence of association, meaning $x_0 = 0$ and $\beta_{x_0} = 0$. However, different counterfactual conditions can be used, for example a lower exposure which can be determined by an intervention. In this case the quantity $\beta_x$ can simply be re-parameterized as $\beta_x - \beta_{x_0}$, and Eq. (1) still applies.
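A minimal Python sketch of Eq. (1), assuming the standard form $AF_x = 1 - \exp(-\beta_x)$ implied by the binary case AF = (RR − 1)/RR. The function names and the relative risks below are hypothetical and chosen only for illustration; the optional `beta_x0` argument shows the re-parameterization $\beta_x - \beta_{x_0}$ for a non-null counterfactual.

```python
import math


def attributable_fraction(beta_x: float, beta_x0: float = 0.0) -> float:
    """Eq. (1a): AF = 1 - exp(-(beta_x - beta_x0)); beta_x0 = 0 for the null reference."""
    return 1.0 - math.exp(-(beta_x - beta_x0))


def attributable_number(n: float, beta_x: float, beta_x0: float = 0.0) -> float:
    """Eq. (1b): AN = n * AF, with n the total number of cases."""
    return n * attributable_fraction(beta_x, beta_x0)


# Binary exposure with a hypothetical RR = 1.25: AF should equal (RR - 1) / RR = 0.2
rr = 1.25
af = attributable_fraction(math.log(rr))
an = attributable_number(1000, math.log(rr))

# Non-null counterfactual: a hypothetical intervention lowering the risk to RR = 1.10
af_intervention = attributable_fraction(math.log(rr), beta_x0=math.log(1.10))
print(f"AF={af:.3f} AN={an:.1f} AF(intervention)={af_intervention:.3f}")
# prints: AF=0.200 AN=200.0 AF(intervention)=0.120
```

With the intervention as counterfactual, only the excess above RR = 1.10 is attributed, so the fraction drops from 0.20 to 1 − 1.10/1.25 = 0.12.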
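When multiple exposures $x_1,\dots,x_k$ occur concurrently, the attributable fraction related to their joint contribution is

$$AF_{x_1,\dots,x_k} = 1 - \exp\Big(-\sum_{i=1}^{k}\beta_{x_i}\Big) \qquad (2)$$

with $AN_{x_1,\dots,x_k}$ obtained by substituting Eq. (2) in Eq. (1b). For the specific form of Eq. (2), it should be noted that in general $\sum_{i=1}^{k} AF_{x_i} \neq AF_{x_1,\dots,x_k}$, i.e. the sum of the attributable risks measured for individual exposures is usually higher than their concurrent attributable risk (always, when all $\beta_{x_i} \geq 0$).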
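The non-additivity noted for Eq. (2) can be checked numerically. This sketch assumes the form $1 - \exp(-\sum_i \beta_{x_i})$ for concurrent exposures and uses two hypothetical relative risks; for two exposures with non-negative $\beta_{x_i}$, the excess of the sum over the joint fraction equals the product of the two individual fractions.

```python
import math


def af_single(beta: float) -> float:
    """Eq. (1a) for one exposure."""
    return 1.0 - math.exp(-beta)


def af_concurrent(betas) -> float:
    """Eq. (2): attributable fraction for exposures acting at the same time."""
    return 1.0 - math.exp(-sum(betas))


# Two hypothetical concurrent exposures with RR = 1.2 and RR = 1.5
betas = [math.log(1.2), math.log(1.5)]
sum_of_individual = sum(af_single(b) for b in betas)   # 1/6 + 1/3
joint = af_concurrent(betas)                            # 1 - 1/1.8

# For two exposures, the excess of the sum equals AF_1 * AF_2 (here 1/18)
gap = sum_of_individual - joint
print(f"sum={sum_of_individual:.4f} joint={joint:.4f} gap={gap:.4f}")
# prints: sum=0.5000 joint=0.4444 gap=0.0556
```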
In a DLNM, the exposure-lag-response association enters the regression model through the term

$$s(x,t) = \int_{\ell_0}^{L} f\cdot w(x_{t-\ell},\ell)\,d\ell \approx \sum_{\ell=\ell_0}^{L} f\cdot w(x_{t-\ell},\ell) = \mathbf{w}_{x,t}^{\top}\boldsymbol{\eta} \qquad (3)$$

The function s(x,t) is computed as the approximate integral of the exposure-lag-response function over the lag dimension, representing the cumulated risk over the lag period. The parameterization in the final step of Eq. (3) is obtained through a cross-basis, a tensor product between the bases chosen for f(x) and w(ℓ), which generates the transformed variables $\mathbf{w}_{x,t}$ linearly combined with the parameters η. Simpler DLMs are obtained from Eq. (3) by assuming a linear f(x). Algebraic details and additional information are provided elsewhere. The cross-basis is specified with a reference value $x_0$, used as the centering point for the function f(x), which defines the counterfactual condition.
This overall cumulative association, $\beta_x = \sum_{\ell=\ell_0}^{L}\beta_{x,\ell}$, is the sum of the contributions $\beta_{x,\ell}$ from exposures experienced within the lag period. Algebraic definitions have been provided previously.
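To make the cross-basis concrete, the following sketch builds a small one by hand. It is a toy under explicit assumptions, not the DLNM implementation: it replaces the B-spline and natural-spline bases used in the paper with quadratic polynomials, and uses a hypothetical maximum lag L = 10, centering value $x_0$ = 20 and arbitrary parameters η. It checks that $\mathbf{w}_{x,t}^{\top}\boldsymbol{\eta}$ equals the sum of the lag-specific contributions $\beta_{x_{t-\ell},\ell}$, and that the overall cumulative association $\beta_x$ is zero at $x_0$.

```python
import numpy as np

L = 10       # hypothetical maximum lag (l0 = 0)
X0 = 20.0    # centering value x0, defining the counterfactual


def f_basis(x):
    """Exposure basis (assumed quadratic polynomial in x - x0, no intercept), so f(x0) = 0."""
    d = np.asarray(x, dtype=float) - X0
    return np.stack([d, d ** 2], axis=-1)


def lag_basis(lag):
    """Lag basis (assumed quadratic polynomial in l / L, with intercept)."""
    u = np.asarray(lag, dtype=float) / L
    return np.stack([np.ones_like(u), u, u ** 2], axis=-1)


def cross_basis(x):
    """One row per t >= L: w_{x,t} = sum over lags of the tensor product f(x_{t-l}) g(l)."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(L + 1)
    G = lag_basis(lags)                        # shape (L+1, 3)
    rows = []
    for t in range(L, len(x)):
        F = f_basis(x[t - lags])               # shape (L+1, 2): exposures x_t ... x_{t-L}
        rows.append(np.einsum("li,lj->ij", F, G).ravel())
    return np.array(rows)                      # shape (len(x) - L, 6)


def beta_lag(x, lag, eta):
    """Lag-specific contribution beta_{x,l} = f(x)^T H g(l), with H = eta as a 2 x 3 matrix."""
    return f_basis(x) @ eta.reshape(2, 3) @ lag_basis(lag).T


def beta_overall(x, eta):
    """Overall cumulative association beta_x = sum of beta_{x,l} over l = 0..L."""
    return np.sum(beta_lag(x, np.arange(L + 1), eta), axis=-1)


# Usage with simulated temperatures and hypothetical parameters
rng = np.random.default_rng(1)
temp = 15 + 5 * rng.standard_normal(200)
eta = np.array([-0.002, 0.001, 0.0005, 0.0004, -0.0002, 0.0001])
s = cross_basis(temp) @ eta                    # s(x, t) for t = L, ..., 199

t = 50
direct = sum(beta_lag(temp[t - l], l, eta) for l in range(L + 1))
```

Because the exposure basis has no intercept, every lag-specific contribution vanishes at $x_0$, which is what makes the centering value act as the counterfactual.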
From a backward perspective, the attributable fraction and number at time t are defined as

$$\text{b-AF}_{x,t} = 1 - \exp\Big(-\sum_{\ell=\ell_0}^{L}\beta_{x_{t-\ell},\ell}\Big) \qquad (5a)$$
$$\text{b-AN}_{x,t} = \text{b-AF}_{x,t} \cdot n_t \qquad (5b)$$

with $n_t$ as the number of cases at time t. This structure is consistent with the configuration of the regression model usually applied to fit the data, where the risk at time t is associated with lagged exposures at times t−ℓ. The definition of backward attributable risk requires an extended version of the counterfactual condition accounting for the additional lag dimension: $\text{b-AN}_{x,t}$ and $\text{b-AF}_{x,t}$ are interpreted as the number of cases and the related fraction at time t attributable to past exposures to x in the period $t-\ell_0,\dots,t-L$, compared to a constant exposure $x_0$ throughout the same period.
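The backward measures in Eq. (5) can be sketched as follows, assuming a function returning the lag-specific contributions $\beta_{x,\ell}$ is available. The example uses a hypothetical linear association (0.02 per unit above $x_0$ = 20) split equally across lags 0 to 5; days without a complete exposure history over the lag period are left undefined.

```python
import numpy as np


def backward_attributable(x, n, beta_lag, lag0=0, L=5):
    """Eq. (5): b-AF_t = 1 - exp(-sum_l beta_{x_{t-l},l}) and b-AN_t = b-AF_t * n_t."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    lags = np.arange(lag0, L + 1)
    b_af = np.full(len(x), np.nan)             # NaN where the lag history is incomplete
    for t in range(L, len(x)):
        # contributions of past exposures x_{t-l0}, ..., x_{t-L} to the risk at time t
        b_af[t] = 1.0 - np.exp(-np.sum(beta_lag(x[t - lags], lags)))
    return b_af, b_af * n


def beta_lag_example(x, lag, x0=20.0, n_lags=6):
    """Hypothetical beta_{x,l}: 0.02 per unit above x0, split equally across lags (lag unused)."""
    return 0.02 * (np.asarray(x, dtype=float) - x0) / n_lags


temp = np.array([20.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 18.0, 20.0])
deaths = np.full(len(temp), 100.0)
b_af, b_an = backward_attributable(temp, deaths, beta_lag_example)
```

At t = 6 all six lagged exposures equal 25, so the exponent is 0.1 and $\text{b-AF}$ equals 1 − exp(−0.1); at t = 8 the colder exposure on the current day partly offsets the earlier heat.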
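Alternatively, a forward perspective defines the attributable measures for an exposure $x_t$ occurring at time t as

$$\text{f-AF}_{x,t} = 1 - \exp(-\beta_{x_t}) \qquad (6a)$$
$$\text{f-AN}_{x,t} = \text{f-AF}_{x,t} \cdot \frac{1}{L-\ell_0+1}\sum_{\ell=\ell_0}^{L} n_{t+\ell} \qquad (6b)$$

This alternative version has some advantages compared to the backward definition. First, the counterfactual condition is simpler: $\text{f-AF}_{x,t}$ and $\text{f-AN}_{x,t}$ are interpreted as the fraction and number of future cases in the period $t+\ell_0,\dots,t+L$ attributable to the single exposure x occurring at time t, compared to $x_0$. Moreover, the overall cumulative risk $\beta_{x_t}$ for a given exposure $x_t$ in (6a) is available even when the bi-dimensional exposure-lag-response is reduced to a uni-dimensional exposure-response relationship, a step often needed in multi-site studies. In contrast, all the lag-specific contributions $\beta_{x_{t-\ell},\ell}$ are needed to compute $\text{b-AF}_{x,t}$ in (5a) for the backward counterpart.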
However, the forward version also has an important limitation, related to the fact that the contributions are associated with risks measured at different times. The attributable number $\text{f-AN}_{x,t}$ in (6b) is computed by averaging the counts observed at times $t+\ell_0,\dots,t+L$, thus only approximating the lag structure of the risks. This approximation is likely to produce some bias, expected to be an underestimation of the attributable number compared to the backward version.
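A matching sketch of the forward measures in Eq. (6), under similar hypothetical assumptions: the overall cumulative association $\beta_{x_t}$ is supplied directly as a function of the exposure (here linear, 0.02 per unit above $x_0$ = 20), and the attributable number uses the average count over $t+\ell_0,\dots,t+L$, which is the approximation responsible for the bias described above. Days whose future window extends beyond the series are left undefined.

```python
import numpy as np


def forward_attributable(x, n, beta_overall, lag0=0, L=5):
    """Eq. (6): f-AF_t = 1 - exp(-beta_{x_t}); f-AN_t = f-AF_t * mean(n_{t+l0}, ..., n_{t+L})."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    f_af = 1.0 - np.exp(-beta_overall(x))      # needs only the overall cumulative risk
    f_an = np.full(len(x), np.nan)             # NaN where the future window is incomplete
    for t in range(len(x) - L):
        f_an[t] = f_af[t] * n[t + lag0 : t + L + 1].mean()
    return f_af, f_an


def beta_overall_example(x, x0=20.0):
    """Hypothetical overall cumulative association: 0.02 per unit above x0."""
    return 0.02 * (np.asarray(x, dtype=float) - x0)


temp = np.array([25.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0])
deaths = np.array([100.0, 110.0, 90.0, 100.0, 120.0, 80.0, 100.0])
f_af, f_an = forward_attributable(temp, deaths, beta_overall_example)
```

Only the hot first day carries an attributable burden, spread over the mean of the next six daily counts (600 / 6 = 100).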
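The attributable risk can also be separated into components related to a given exposure range r. From a backward perspective, the related fraction is

$$\text{b-AF}^{\,r}_{x,t} = 1 - \exp\Big(-\sum_{\ell=\ell_0}^{L}\beta_{x_{t-\ell},\ell}\; I(x_{t-\ell} \in r)\Big) \qquad (7)$$

obtained by simply selecting the risk contributions from past exposures included in the range r, with I(·) as the indicator function. The related attributable number is computed by substituting Eq. (7) into Eq. (5b). Attributable components referring to different ranges can be summed, as all are defined using the same counterfactual condition of a constant exposure $x_\ell = x_0$ for the whole lag period $\ell = \ell_0,\dots,L$.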
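The forward version has the additional advantage that, for two non-overlapping ranges $r_1$ and $r_2$, the sum of the components is equal to the overall attributable risk, namely $\text{f-AN}^{\,r_1}_{x,t} + \text{f-AN}^{\,r_2}_{x,t} = \text{f-AN}^{\,r_1 \cup r_2}_{x,t}$. In contrast, adopting a backward perspective, in general $\text{b-AN}^{\,r_1}_{x,t} + \text{b-AN}^{\,r_2}_{x,t} \neq \text{b-AN}^{\,r_1 \cup r_2}_{x,t}$, as the risks are simultaneously computed for the same time t, in the same way as in Eq. (2).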
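The additivity property can be illustrated with a toy example. The sketch assumes a hypothetical U-shaped lag-specific association, zero at $x_0$ = 20 and decaying over lags 0 to 3. It computes backward components for cold ($x < x_0$) and heat ($x > x_0$) at a single time t following Eq. (7), and forward components over a short series, where each day's fraction is assigned entirely to the range containing $x_t$.

```python
import numpy as np

X0 = 20.0
LAGS = np.arange(0, 4)   # hypothetical lag period l = 0..3


def beta_lag(x, lag):
    """Hypothetical beta_{x,l}: quadratic in x - x0, decaying with lag."""
    return 0.002 * (np.asarray(x, dtype=float) - X0) ** 2 * np.exp(-np.asarray(lag) / 2)


# Backward: past exposures x_t, x_{t-1}, x_{t-2}, x_{t-3} and their contributions at time t
past = np.array([30.0, 8.0, 31.0, 5.0])
contrib = beta_lag(past, LAGS)


def b_af_range(in_range):
    """Eq. (7): keep only contributions from past exposures falling in the range."""
    return 1.0 - np.exp(-np.sum(contrib * in_range))


b_overall = b_af_range(np.ones(len(past)))
b_cold, b_heat = b_af_range(past < X0), b_af_range(past > X0)

# Forward: each day's f-AF is assigned entirely to the range containing x_t
series = np.array([5.0, 12.0, 20.0, 26.0, 31.0])
f_af = 1.0 - np.exp(-beta_lag(series[:, None], LAGS).sum(axis=1))
f_cold = np.where(series < X0, f_af, 0.0)
f_heat = np.where(series > X0, f_af, 0.0)
```

Here the backward cold and heat components sum to more than the overall backward fraction, because all contributions at time t share one exponential, whereas the forward components add up exactly.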
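Total attributable measures for the whole series are obtained by summing the contributions over time:

$$AN_{tot} = \sum_{t} AN_{x,t} \qquad (8a)$$
$$AF_{tot} = AN_{tot} \Big/ \sum_{t} n_t \qquad (8b)$$

The equations above can be applied either to forward or backward attributable risk and to separate components, simply substituting the related attributable numbers in Eq. (8a).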
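The totals reduce to two sums over the series. In this sketch, the daily attributable numbers may contain undefined values, for example at the start or end of the series where the lag window is incomplete; excluding those days from both sums is an assumption made here for illustration, not necessarily the choice made in the paper.

```python
import numpy as np


def total_attributable(an_daily, n):
    """Eq. (8): AN_tot = sum_t AN_{x,t}; AF_tot = AN_tot / sum_t n_t."""
    an_daily = np.asarray(an_daily, dtype=float)
    n = np.asarray(n, dtype=float)
    ok = ~np.isnan(an_daily)          # assumption: skip days with an undefined AN
    an_tot = an_daily[ok].sum()
    return an_tot, an_tot / n[ok].sum()


an_tot, af_tot = total_attributable([1.0, 2.0, np.nan, 3.0], [10, 10, 10, 10])
```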
Analytical formulae for confidence intervals of attributable risk measures are not easily derived, and this also applies to the extended versions developed here. Although approximate estimators have been proposed [15, 16], in this context the most straightforward approach is to obtain interval estimates empirically through Monte Carlo simulations [17, 18]. Specifically, we take random samples $\eta^{(j)}$ of the original parameters η of the cross-basis in Eq. (3) from the assumed multivariate normal distribution, with point estimate and (co)variance matrix derived from the regression model. These samples $\eta^{(j)}$ are used to compute $\beta^{(j)}_{x,\ell}$ for $\ell = \ell_0,\dots,L$ and each intensity x, empirically reconstructing the distributions of the attributable measures defined in Eq. (5)–(8). The 2.5th and 97.5th percentiles of these distributions are interpreted as 95% empirical confidence intervals (eCI).
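The simulation procedure can be sketched with NumPy's `Generator.multivariate_normal`. The `statistic` argument is a placeholder for any attributable measure recomputed from a parameter draw (for instance through Eq. (5)–(8)); the one-parameter statistic, its estimate and its variance below are hypothetical.

```python
import numpy as np


def empirical_ci(eta_hat, vcov, statistic, nsim=5000, seed=0):
    """Monte Carlo eCI: draw eta^(j) ~ MVN(eta_hat, vcov), recompute, take 2.5th/97.5th percentiles."""
    rng = np.random.default_rng(seed)
    eta_hat = np.asarray(eta_hat, dtype=float)
    draws = rng.multivariate_normal(eta_hat, np.asarray(vcov, dtype=float), size=nsim)
    sims = np.array([statistic(eta_j) for eta_j in draws])
    lower, upper = np.percentile(sims, [2.5, 97.5])
    return statistic(eta_hat), lower, upper


def af_five_units(eta):
    """Hypothetical statistic: AF for a 5-unit exposure under a linear model, beta = 5 * eta[0]."""
    return 1.0 - np.exp(-5.0 * eta[0])


est, lo, hi = empirical_ci([0.02], [[1e-5]], af_five_units)
```

The default of 5,000 draws mirrors the number of samples used in the example analysis; in a real application `statistic` would rebuild the $\beta^{(j)}_{x,\ell}$ from the cross-basis and recompute the totals.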
The methods illustrated in the previous section are applied to estimate the all-cause mortality risk attributable to temperature, using daily time series from two cities, London and Rome, in the periods 1993-2006 and 1992-2010 respectively. R scripts and data implementing the method and partly replicating the results are provided as Additional files 1, 2, 3, 4, 5 and 6.
We fitted a standard time series Poisson model allowing for overdispersion, controlling for seasonal and long-term trends and day of the week, using a 10 df/year spline and indicator variables, respectively. Model selection is still an open research issue within the DLNM framework, although simulation studies indicate a good performance of methods based on the Akaike information criterion (AIC). Given the illustrative purpose of the example, we selected a priori the cross-basis function in Eq. (3) representing the association between mean daily temperature and mortality, basing our choice on previous analyses. Specifically, the cross-basis is composed of a quadratic B-spline with two equally-spaced knots as the exposure-response function f(x), and a natural cubic B-spline with three equally-spaced knots on the log scale as the lag-response function w(ℓ) over lags 0–25.
In the specific case of temperature, where a null exposure condition cannot be defined, a reasonable choice is to center the cross-basis in Eq. (3) on the temperature of minimum risk, as suggested in previous publications. This optimal temperature corresponds to 20°C and 21°C for London and Rome, respectively, and it represents the reference point $x_0$ for the computation of the attributable risk measures. These are obtained for the whole temperature range, and then for cold and heat contributions by separating the associations with temperatures lower or higher than $x_0$. In addition, the attributable components are further separated into mild and extreme cold and heat, using as cut-off values the 1st and 99th percentiles of the city-specific temperature distributions, corresponding to 0.4°C and 23.7°C in London and 2.6°C and 28.6°C in Rome.
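The choice of $x_0$ and of the cut-offs can be sketched as follows. The temperature series and the U-shaped overall cumulative curve are simulated and hypothetical, and searching the minimum on a 0.1°C grid between the 1st and 99th percentiles is an assumption of this sketch. The percentile of $x_0$ within the temperature distribution is also computed, since it is the quantity later used to explain differences between the two cities.

```python
import numpy as np

rng = np.random.default_rng(42)
temp = 14 + 6 * rng.standard_normal(3650)      # simulated daily mean temperature (deg C)


def overall_rr(x):
    """Hypothetical overall cumulative exposure-response (RR), U-shaped with minimum at 21."""
    return np.exp(0.0015 * (np.asarray(x, dtype=float) - 21.0) ** 2)


# Minimum-risk temperature, searched on a grid within the 1st-99th percentile range
lo_cut, hi_cut = np.percentile(temp, [1, 99])
grid = np.arange(lo_cut, hi_cut, 0.1)
x0 = grid[np.argmin(overall_rr(grid))]
x0_percentile = 100 * np.mean(temp <= x0)      # position of x0 in the distribution

# Ranges used to separate the attributable components
extreme_cold = temp < lo_cut
mild_cold = (temp >= lo_cut) & (temp < x0)
mild_heat = (temp > x0) & (temp <= hi_cut)
extreme_heat = temp > hi_cut
print(f"x0={x0:.1f} (percentile {x0_percentile:.1f}), cut-offs {lo_cut:.1f} / {hi_cut:.1f}")
```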
We derived empirical confidence intervals for the backward total attributable numbers and fractions, computed overall and for separate components, by simulating 5,000 samples from the assumed distribution of the cross-basis parameters η.
Table 1. Total mortality fraction (%) attributable to temperature, computed backward ($\text{b-AF}_{tot}$) and forward ($\text{f-AF}_{tot}$), reported as overall, hot and cold components with 95% empirical confidence intervals (eCI)
Columns: $\text{b-AF}_{tot}$, $\text{f-AF}_{tot}$ (one pair per city)
The total backward attributable risk is then separated into components due to cold and hot temperatures, defined as those below and above the optimal temperature, respectively. The estimates, computed using Eq. (7), are reported in Table 1. The comparison of the two contributions clearly indicates that cold is responsible for most of the mortality attributable to temperature, with $\text{b-AF}_{tot}$ equal to 12.95% and 10.84%, compared to 0.66% and 1.74% for heat, in the two cities. Estimates of forward attributable risk are very similar and, as expected, their sum is equal to the overall burden, unlike for the backward version.
Table 2. Total mortality fraction (%) attributable to temperature, computed backward ($\text{b-AF}_{tot}$) and forward ($\text{f-AF}_{tot}$), reported as components from mild and extreme hot and cold contributions with 95% empirical confidence intervals (eCI)
Columns: $\text{b-AF}_{tot}$, $\text{f-AF}_{tot}$ (one pair per city)
In contrast, the comparison between the two cities is rather different for the components attributable to mild and extreme hot temperatures. In spite of the stronger risk in London, the attributable fraction is similar for extreme heat and even higher in Rome for mild heat (1.32%–1.45% versus 0.25%–0.33%). This apparent contradiction is explained by the different temperature distributions, and in particular by the percentile corresponding to the optimal temperature: the 93.6th in London and the 72.5th in Rome. This result suggests the hypothesis that the population in Rome is more adapted to the range of temperatures corresponding to extreme heat than the population of London, which experienced only a few days of unusually high temperatures.
This phenomenon has interesting implications. An example is offered by the right panel of Figure 3, illustrating the estimated daily deaths $\text{b-AN}_{x,t}$ and $\text{f-AN}_{x,t}$ attributable to heat, computed backward and forward for the first summer of the time series in Rome, together with the related temperature trend. As expected, a substantial number of deaths are attributable to temperatures above the optimal value, represented by the horizontal dotted line, in the period from mid-July to mid-August. The trend of forward attributable deaths $\text{f-AN}_{x,t}$ closely follows the daily temperatures, consistent with its definition as the number of deaths in the next L days attributable to the temperature on day t. In contrast, the backward attributable number $\text{b-AN}_{x,t}$ decreases to zero and even becomes negative in late summer days, although the overall cumulative exposure-response in Figure 2 (bottom-right panel) does not show an RR below 1 for any temperature.
This paradox is explained by the counterfactual condition associated with the backward perspective. Specifically, each $\text{b-AN}_{x,t}$ compares the association with the temperatures observed in the past L days to that with a constant exposure $x_0$. In the presence of harvesting, the observed population becomes 'healthier' than the counterfactual population after a series of hot days, due to the depletion of the susceptible pool. This explains the negative attributable numbers for specific combinations of lagged exposures. It also emphasises that harvesting should not be interpreted as a true protective association at longer lags, but rather as an artefact due to a change in the underlying population following a stress, which affects the counterfactual condition. This issue is relevant when using the backward measures $\text{b-AN}_{x,t}$ and $\text{b-AF}_{x,t}$ to assess the contribution of specific days. However, similarly to the net overall cumulative risk, the total attributable number $\text{b-AN}_{tot}$ and fraction $\text{b-AF}_{tot}$, produced by Eq. (8) and reported in Tables 1 and 2, account for this deficit by summing the contributions over the whole series.
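The mechanism can be reproduced with a toy example. The sketch assumes a hypothetical lag structure for heat, with excess risk at lags 0 to 2 and a smaller deficit at lags 3 to 8 mimicking harvesting, so that the overall cumulative association remains positive. A four-day heat wave followed by a return to $x_0$ then yields negative daily backward attributable numbers, computed as in Eq. (5), while their total over the series stays positive.

```python
import numpy as np

L, X0 = 8, 20.0
lags = np.arange(L + 1)
# Hypothetical lag-response per degree above x0: excess at lags 0-2, harvesting deficit at 3-8
w = np.where(lags <= 2, 0.010, -0.003)        # cumulative: 0.030 - 0.018 = 0.012 > 0


def beta_lag(x, lag):
    """Hypothetical beta_{x,l}: linear above x0 (zero at or below x0) times the lag weight."""
    return np.maximum(np.asarray(x, dtype=float) - X0, 0.0) * w[lag]


temp = np.array([20.0] * 10 + [30.0] * 4 + [20.0] * 12)   # a four-day heat wave
n = np.full(len(temp), 50.0)                               # constant daily deaths

b_an = np.full(len(temp), np.nan)
for t in range(L, len(temp)):
    s_t = np.sum(beta_lag(temp[t - lags], lags))           # contributions from t-0 ... t-L
    b_an[t] = (1.0 - np.exp(-s_t)) * n[t]                  # Eq. (5b)

print(np.round(b_an[10:22], 2))
```

Days 16 to 21 receive only the lagged deficit terms and therefore show negative values, yet the sum over the series remains positive, in line with how the totals net out harvesting.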
In this contribution we illustrate an extended definition of attributable risk measures based on the DLNM framework. Consistent with this class of models, such a definition accounts for the complex pattern of potentially non-linear and delayed dependencies described through exposure-lag-response associations.
Two alternative definitions of attributable risk are proposed, assuming backward or forward perspectives. The former provides more consistent estimators, which arise naturally from the structure of the regression model, where distributed lag terms at times t−ℓ contribute to the risk at time t. The forward attributable measures, in contrast, are affected by a negative bias related to the averaging of future counts, which is nonetheless likely to be relatively small. On the other hand, the forward version is well suited for separating the risk into components attributable to different ranges, as their sum matches the overall risk. Furthermore, the forward perspective, looking from current exposure to future risk, seems more appropriate for quantifying the health burden due to specific exposure occurrences, as it is based on a more coherent counterfactual condition. Corrections proposed in previous work on risk attributable to multiple exposures [20–22] can be applied to the backward version.
Strictly speaking, the definition given in Eq. (1a) is interpreted as the attributable fraction among the sub-population of exposed subjects. In the setting of time series analysis for environmental stressors, the whole population is usually considered as exposed, and this definition can be more generally interpreted as the population attributable fraction. If instead only a subset is exposed, Eq. (5)–(8) can easily be extended using the equations proposed by Steenland and Armstrong for population attributable risk.
Previous papers suggested approaches for producing attributable risk from distributed lag models applied to heat-mortality associations. Baccini and colleagues applied DLMs and computed attributable risk measures, specifically addressing the issue of harvesting. Honda and colleagues illustrated an analysis of the mortality burden due to heat using DLNMs. However, both approaches are limited, as the former assumes a linear threshold form of the exposure-response, while the latter averages the non-linear risk across the whole temperature range. In this paper we offer a formal and more consistent definition of such attributable risk measures.
An advantage of the proposed method is the provision of estimates for separate components of the attributable risk, associated with different exposure ranges. In the specific case of temperature-health associations, this allows the separation of attributable risks from cold and heat, and further from mild and extreme temperatures. The estimates reported in the example highlight how the simple analysis of exposure-response curves can be misleading in the attribution of risk, and show that most of the temperature-related mortality in the two cities is in fact attributable to mild cold temperatures, in spite of the relatively low RR.
The availability of attributable risk measures, complementary to estimates of exposure-response associations, is essential for the identification and planning of public health interventions. Their extension to exposure-lag-response associations allows the computation of such measures from dependencies showing potentially complex non-linear and temporal patterns.
Abbreviations: AN, attributable number of cases; DLM, distributed lag models; DLNM, distributed lag non-linear models; eCI, empirical confidence interval.
AG is supported through a Methodology Research fellowship awarded by Medical Research Council-UK (grant ID G1002296).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.