Attributable risk from distributed lag models
 Antonio Gasparrini^{1} and
 Michela Leone^{2}
DOI: 10.1186/1471-2288-14-55
© Gasparrini and Leone; licensee BioMed Central Ltd. 2014
Received: 28 January 2014
Accepted: 8 April 2014
Published: 23 April 2014
Abstract
Background
Measures of attributable risk are an integral part of epidemiological analyses, particularly when aimed at the planning and evaluation of public health interventions. However, the current definition of such measures does not consider any temporal relationships between exposure and risk. In this contribution, we propose extended definitions of attributable risk within the framework of distributed lag nonlinear models, an approach recently proposed for modelling delayed associations in either linear or nonlinear exposure-response associations.
Methods
We classify versions of attributable number and fraction expressed using either a forward or backward perspective. The former specifies the future burden due to a given exposure event, while the latter summarizes the current burden due to the set of exposure events experienced in the past. In addition, we illustrate how the components related to subranges of the exposure can be separated.
Results
We apply these methods for estimating the mortality risk attributable to outdoor temperature in two cities, London and Rome, using time series data for the periods 1993–2006 and 1992–2010, respectively. The analysis provides estimates of the overall mortality burden attributable to temperature, and then computes the components attributable to cold and heat, and to mild and extreme temperatures.
Conclusions
These extended definitions of attributable risk account for the additional temporal dimension which characterizes exposure-response associations, providing more appropriate attributable measures in the presence of dependencies characterized by potentially complex temporal patterns.
Keywords
Attributable risk, Attributable fraction, Distributed lag models
Background
Epidemiological studies usually rely on effect summaries based on ratio measures, such as relative risk, odds ratio or rate ratio, with the choice depending on the specific study design [1]. Although these measures are ideal for summarizing the association of interest, they offer limited information on the actual impact of the exposure. This information is critical for the planning and evaluation of public health interventions, and it is better provided by relative excess measures such as the attributable fraction (AF), or absolute excess measures such as the attributable number (AN). Steenland and Armstrong offer a thorough overview on the topic [2]. Here we generally refer to these summaries as attributable risk measures.
Problems in the definition of these measures may arise in the presence of delayed associations, occurring when an exposure generates a risk lasting well beyond the exposure period. Researchers in different fields, during the last thirty years, have proposed approaches for modelling this type of association [3–6]. In time series analysis, a popular approach is based on distributed lag models (DLMs) [7, 8], generalized to distributed lag nonlinear models (DLNMs) when including nonlinear exposure-response associations [9, 10]. Recently, the DLNM framework has been extended beyond time series data for modelling such dependencies, defined exposure-lag-response associations, in different study designs [11]. However, attributable risk measures have not been developed for the DLM and DLNM class, with the result that their current definitions do not take into account the additional temporal structure in the exposure-response association. Previous work investigated the issue in time series analysis, although without producing a general approach [12, 13].
In this contribution, we extend this research and attempt to formalize the definition of attributable risk measures within the DLNM modelling framework. In particular, we illustrate how complex nonlinear and temporal patterns can be accounted for in the computation of the attributable risk. Also, we show how attributable components related to different exposure ranges can be separated. We propose an example on the estimation of the health burden attributable to temperature with time series data, a problem which has received considerable interest recently due to climate change predictions. However, the approach can be easily applied to other exposure-lag-response associations. The method is implemented in simple functions developed within the R software package and provided as additional files.
Methods
Attributable risk measures
with n as the total number of cases. The parameter β_{ x } used in Eq. (1a) represents the risk associated with the exposure, and it usually corresponds to the logarithm of a ratio measure such as relative risk, relative rate or odds ratio. It is generally obtained from regression models while adjusting for potential confounders. The general definition of β_{ x } used here refers to the association with a specific exposure intensity x compared to a reference value x_{0}. For linear exposure-response relationships, the association can also be reported as β·x, where in this case β refers to a unit increase in x. For binary variables reporting presence/absence of the exposure, Eq. (1a) simplifies to AF=(RR−1)/RR, with RR as relative risk, as reported by Steenland and Armstrong [2]. We keep the more general definition of β_{ x }, which is easily applicable to nonlinear exposure-response relationships, throughout the manuscript.
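As a minimal numerical sketch of these definitions, the snippet below assumes that Eq. (1) takes the form AF_x = 1 − exp(−β_x) and AN_x = n·AF_x, which is consistent with the binary-exposure special case AF=(RR−1)/RR quoted above. The function names and the relative risk of 1.25 are hypothetical and chosen only for illustration.

```python
import math


def attributable_fraction(beta_x: float) -> float:
    """AF for a log relative risk beta_x versus the reference x0 (assumed form of Eq. 1a)."""
    return 1.0 - math.exp(-beta_x)


def attributable_number(beta_x: float, n_cases: float) -> float:
    """AN as the total number of cases times the attributable fraction (assumed form of Eq. 1b)."""
    return n_cases * attributable_fraction(beta_x)


rr = 1.25                      # hypothetical relative risk for exposed vs reference
beta_x = math.log(rr)          # beta_x is the logarithm of the ratio measure
af = attributable_fraction(beta_x)
an = attributable_number(beta_x, n_cases=1000)
```

With these illustrative inputs the attributable fraction coincides with (RR−1)/RR, so one fifth of the 1000 cases would be attributed to the exposure.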
The theoretical nature of these effect measures is based on a counterfactual, where the observed condition is compared with a reference state which never occurred. This state postulates that the same population is followed in an identical situation where only the exposure level changes to the reference value x_{0}. Typically, such a reference is represented by the absence of association, meaning x_{0}=0 and ${\beta}_{{x}_{0}}=0$. However, different counterfactual conditions can be used, for example a lower exposure which can be determined by an intervention. In this case the quantity β_{ x } can be simply reparameterized as ${\beta}_{x}^{\ast}={\beta}_{x}-{\beta}_{{x}_{0}}$, and Eq. (1) still applies.
with ${\text{AN}}_{{x}_{1},\dots ,{x}_{p}}$ obtained by substituting Eq. (2) in Eq. (1b) [2]. For the specific form of Eq. (2), it should be noted that ${\text{AF}}_{{x}_{1},\dots ,{x}_{p}}\le {\text{AF}}_{{x}_{1}}+\dots +{\text{AF}}_{{x}_{p}}$, i.e. the sum of the attributable risk measured for individual exposures is usually higher than their concurrent attributable risk.
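The inequality above can be checked numerically. The sketch below assumes that Eq. (2) combines the exposures on the log scale, namely AF for x_1,…,x_p equal to 1 − exp(−Σβ_{x_i}), and that all β are non-negative. The two relative risks are hypothetical.

```python
import math


def joint_af(betas):
    """Concurrent AF for several exposures, assuming their log risks add up."""
    return 1.0 - math.exp(-sum(betas))


def single_af(beta):
    """AF for one exposure considered on its own."""
    return 1.0 - math.exp(-beta)


betas = [math.log(1.2), math.log(1.5)]       # hypothetical log relative risks
af_joint = joint_af(betas)                   # both exposures removed together
af_sum = sum(single_af(b) for b in betas)    # naive sum of the individual AFs
```

Under these assumptions the joint fraction corresponds to a combined relative risk of 1.8, and it is smaller than the sum of the two individual fractions.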
A review of the DLNM modelling framework
The function s(x,t) is computed as the approximate integral of the exposure-lag-response function over the lag dimension, representing the cumulated risk over the lag period. The parameterization in the final step of Eq. (3) is obtained through a cross-basis, involving a tensor product between the basis chosen for f(x) and w(ℓ), generating the transformed variables w_{x,t} linearly combined with the parameters η. Simpler DLMs are defined by Eq. (3) by assuming f(x) as linear. Algebraic details and additional information are provided elsewhere [11]. The cross-basis is specified with a reference value x_{0} used later as a centering point for the function f(x), which is used to define the counterfactual condition.
This overall cumulative association is composed of the sum of contributions β_{x,ℓ} from exposures ${x}_{t-{\ell}_{0}},\dots ,{x}_{t-L}$ experienced within the lag period. Algebraic definitions have been previously provided [11].
Forward and backward perspectives
Attributable risk from DLNMs
with n_{ t } as the number of cases at time t. This structure is consistent with the configuration of the regression model usually applied to fit the data, where the risk at time t is associated with lagged exposures at times t−ℓ. The definition of backward attributable risk requires an extended version of the counterfactual condition accounting for the additional lag dimension: bAN_{x,t} and bAF_{x,t} are interpreted as the number of cases and the related fraction at time t attributable to past exposures to x in the period t−ℓ_{0},…,t−L, compared to a constant exposure x_{0} throughout the same period.
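A minimal sketch of the backward computation follows, assuming that Eq. (5) has the form bAF_{x,t} = 1 − exp(−Σ_ℓ β_{x_{t−ℓ},ℓ}) and bAN_{x,t} = n_t·bAF_{x,t}, as suggested by the description above. The function `beta` is a hypothetical linear distributed lag surface with lags 0 to 2, not a fitted DLNM, and the temperature and death series are invented for illustration.

```python
import math

X0 = 20.0                      # hypothetical reference exposure x0
SLOPE = 0.02                   # hypothetical log-RR per unit of exposure
LAG_WEIGHTS = [0.5, 0.3, 0.2]  # hypothetical lag structure for lags 0..2


def beta(x, lag):
    """Toy contribution beta_{x,lag}: log risk of exposure x at a given lag versus x0."""
    return SLOPE * (x - X0) * LAG_WEIGHTS[lag]


def backward_af(x_series, t):
    """Fraction of cases at time t attributable to exposures at t-0, ..., t-L."""
    total = sum(beta(x_series[t - lag], lag) for lag in range(len(LAG_WEIGHTS)))
    return 1.0 - math.exp(-total)


def backward_an(x_series, n_series, t):
    """Number of cases at time t attributable to the past exposure history."""
    return n_series[t] * backward_af(x_series, t)


temps = [20.0, 25.0, 30.0, 22.0]   # invented daily exposures
deaths = [100, 110, 130, 120]      # invented daily counts n_t
baf = backward_af(temps, 3)        # t must be at least L so the full history exists
ban = backward_an(temps, deaths, 3)
```

Each lagged exposure enters with the contribution specific to its own lag, which is why the whole surface of lag-specific contributions is required.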
This alternative version has some advantages compared with the backward definition. First, the counterfactual condition is simpler: fAF_{x,t} and fAN_{x,t} are interpreted as the fraction and number of future cases in the period t+ℓ_{0},…,t+L attributable to the single exposure x occurring at time t, compared to x_{0}. Moreover, the overall cumulative risk $\sum {\beta}_{{x}_{t},\ell}$ for a given exposure x_{ t } in (6a) is available also when the bidimensional exposure-lag-response is reduced to a unidimensional exposure-response relationship, a step often needed in multi-site studies [14]. In contrast, all the lag-specific contributions are needed to compute $\sum {\beta}_{{x}_{t-\ell},\ell}$ in (5a) for the backward counterpart.
However, the forward version also has an important limitation, related to the fact that the contributions are associated with risks measured at different times. The attributable number fAN_{x,t} in (6b) is computed by averaging the total counts experienced in the next ℓ_{0},…,L times, thus only approximating the lag structure of risks. This approximation is likely to produce some bias, which is expected to appear as an underestimation of the attributable number compared with the backward version.
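The forward computation can be sketched in the same way, assuming that Eq. (6) has the form fAF_{x,t} = 1 − exp(−Σ_ℓ β_{x_t,ℓ}), with fAN_{x,t} obtained by multiplying this fraction by the average of the counts over the following lag period. The surface `beta` and both series are hypothetical.

```python
import math

X0 = 20.0                      # hypothetical reference exposure x0
SLOPE = 0.02                   # hypothetical log-RR per unit of exposure
LAG_WEIGHTS = [0.5, 0.3, 0.2]  # hypothetical lag structure for lags 0..2


def beta(x, lag):
    """Toy contribution beta_{x,lag} versus the reference x0."""
    return SLOPE * (x - X0) * LAG_WEIGHTS[lag]


def forward_af(x_t):
    """Fraction of future cases attributable to the single exposure x_t."""
    total = sum(beta(x_t, lag) for lag in range(len(LAG_WEIGHTS)))
    return 1.0 - math.exp(-total)


def forward_an(x_series, n_series, t):
    """Forward AN: the forward AF times the mean count over times t+0, ..., t+L."""
    future = [n_series[t + lag] for lag in range(len(LAG_WEIGHTS))]
    return forward_af(x_series[t]) * sum(future) / len(future)


temps = [20.0, 25.0, 30.0, 22.0]   # invented daily exposures
deaths = [100, 110, 130, 120]      # invented daily counts n_t
faf = forward_af(temps[1])
fan = forward_an(temps, deaths, 1)  # t + L must not run past the end of the series
```

Only the sum of the lag-specific contributions for x_t is needed here, so the same result could be obtained from a reduced overall cumulative exposure-response curve.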
Separating attributable components
simply selecting the risk contributions from past exposures included in the range r. The related attributable number ${\mathrm{b}\mathit{\text{AN}}}_{x,t}^{r}$ is computed by substituting Eq. (7) into Eq. (5b). Attributable components referring to different ranges can be summed up, as all are defined using the same counterfactual condition of a constant exposure x_{ ℓ }=x_{0} for the whole lag period ℓ=ℓ_{0},…,L.
The forward version has the additional advantage that for two non-overlapping ranges r_{1} and r_{2} the sum of the components is equal to the overall attributable risk, namely ${\mathrm{f}\mathit{\text{AF}}}_{x,t}^{{r}_{1}+{r}_{2}}={\mathrm{f}\mathit{\text{AF}}}_{x,t}^{{r}_{1}}+{\mathrm{f}\mathit{\text{AF}}}_{x,t}^{{r}_{2}}$. In contrast, adopting a backward perspective ${\mathrm{b}\mathit{\text{AF}}}_{x,t}^{{r}_{1}+{r}_{2}}\le {\mathrm{b}\mathit{\text{AF}}}_{x,t}^{{r}_{1}}+{\mathrm{b}\mathit{\text{AF}}}_{x,t}^{{r}_{2}}$, as the risks are simultaneously computed for the same time t, in the same way as in Eq. (2).
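The different behaviour of the two perspectives can be illustrated with a small sketch. It assumes that the component for a range r is obtained by keeping only the contributions of exposures falling in r, and it uses a hypothetical V-shaped surface `beta` in which both cold and heat increase risk. The ranges, names and series are invented.

```python
import math

X0 = 20.0                      # hypothetical reference exposure x0
SLOPE = 0.02                   # hypothetical log-RR per unit distance from x0
LAG_WEIGHTS = [0.5, 0.3, 0.2]  # hypothetical lag structure for lags 0..2


def beta(x, lag):
    """Toy V-shaped contribution: risk rises on both sides of x0."""
    return SLOPE * abs(x - X0) * LAG_WEIGHTS[lag]


def backward_af(x_series, t, in_range=lambda x: True):
    """Backward AF at time t using only past exposures inside the chosen range."""
    total = sum(
        beta(x_series[t - lag], lag)
        for lag in range(len(LAG_WEIGHTS))
        if in_range(x_series[t - lag])
    )
    return 1.0 - math.exp(-total)


def forward_af(x_t, in_range=lambda x: True):
    """Forward AF for x_t, set to zero when x_t lies outside the chosen range."""
    if not in_range(x_t):
        return 0.0
    return 1.0 - math.exp(-sum(beta(x_t, lag) for lag in range(len(LAG_WEIGHTS))))


def is_cold(x):
    return x < X0


def is_heat(x):
    return x > X0


temps = [10.0, 25.0, 30.0]     # one cold day followed by two hot days
b_all = backward_af(temps, 2)
b_cold = backward_af(temps, 2, is_cold)
b_heat = backward_af(temps, 2, is_heat)
f_all = forward_af(temps[2])
f_cold = forward_af(temps[2], is_cold)
f_heat = forward_af(temps[2], is_heat)
```

In the forward case a single exposure belongs to one range only, so one component is zero and the other equals the overall fraction. In the backward case the lag window mixes cold and hot days, and the two components add up to slightly more than the overall fraction.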
Total attributable risk
The equations above can be applied either to forward or backward attributable risk and to separate components, simply substituting the related attributable numbers in Eq. (8a).
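A minimal sketch of the totals follows, assuming that Eq. (8) sums the time-specific attributable numbers over the series and divides the result by the total number of cases. The function name and the inputs are hypothetical, and the daily attributable numbers may come from either the backward or the forward definition.

```python
def total_attributable(daily_an, daily_cases):
    """Total AN as the sum of daily ANs, and total AF as its ratio to all cases."""
    an_tot = sum(daily_an)
    af_tot = an_tot / sum(daily_cases)
    return an_tot, af_tot


daily_cases = [100, 50, 50]          # invented daily counts n_t
daily_an = [10.0, 0.0, 10.0]         # invented daily attributable numbers
an_tot, af_tot = total_attributable(daily_an, daily_cases)
```

In this toy series 20 of the 200 cases are attributed to the exposure, giving a total attributable fraction of 10%.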
Computing uncertainty intervals
Analytical formulae for confidence intervals of attributable risk measures are not easily produced [15], and this also applies to the extended versions developed here. Although approximate estimators have been proposed [15, 16], in this context the most straightforward approach is to rely on interval estimation obtained empirically through Monte Carlo simulations [17, 18]. Basically, we take random samples η^{(j)} of the original parameters η of the cross-basis in Eq. (3) from the assumed multivariate normal distribution with point estimate $\widehat{\mathit{\eta}}$ and (co)variance matrix $V\left(\widehat{\mathit{\eta}}\right)$ derived from the regression model. These samples η^{(j)} are used to compute ${\beta}_{x,\ell}^{\left(j\right)}$ for ℓ=ℓ_{0},…,L and each intensity x, empirically reconstructing the distributions of the attributable measures defined in Eq. (5)–(8). The related 2.5^{th} and 97.5^{th} percentiles of such distributions are interpreted as 95% empirical confidence intervals (eCI).
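The simulation scheme can be sketched as follows, using only the Python standard library. The two-parameter surface `beta`, the point estimates `ETA_HAT` and the covariance matrix `V_ETA` are hypothetical stand-ins for a fitted cross-basis, and the interval is computed for a single backward attributable fraction rather than for the totals, to keep the example short.

```python
import math
import random
import statistics

X0 = 20.0                                  # hypothetical reference exposure
ETA_HAT = [0.010, -0.002]                  # hypothetical point estimates of eta
V_ETA = [[4.0e-6, -1.0e-6],
         [-1.0e-6, 1.0e-6]]                # hypothetical covariance matrix of eta


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_eta(rng, mean, chol):
    """One draw from the multivariate normal N(mean, chol * chol^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def beta(x, lag, eta):
    """Toy surface, linear in eta: the slope in x decays linearly with lag."""
    return (x - X0) * (eta[0] + eta[1] * lag)


def backward_af(x_series, t, eta, max_lag=2):
    """Backward AF at time t for a given parameter vector eta."""
    total = sum(beta(x_series[t - lag], lag, eta) for lag in range(max_lag + 1))
    return 1.0 - math.exp(-total)


temps = [24.0, 28.0, 30.0]                 # invented exposure history
rng = random.Random(2014)                  # fixed seed for reproducibility
chol = cholesky(V_ETA)
draws = [backward_af(temps, 2, sample_eta(rng, ETA_HAT, chol)) for _ in range(5000)]
cuts = statistics.quantiles(draws, n=40)   # 39 cut points at steps of 2.5%
point = backward_af(temps, 2, ETA_HAT)
eci = (cuts[0], cuts[-1])                  # 2.5th and 97.5th percentiles
```

The same loop applies to any of the measures above: each sampled parameter vector is pushed through the full computation, and the percentiles are taken on the resulting attributable measure rather than on the parameters.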
Results
The methods illustrated in the previous section are applied to estimate the all-cause mortality risk attributable to temperature, using daily time series from two cities, London and Rome, in the periods 1993–2006 and 1992–2010 respectively. R scripts and data implementing the method and partly replicating the results are provided as Additional files 1, 2, 3, 4, 5 and 6.
Modelling strategy
We fitted a standard time series Poisson model allowing for overdispersion, controlling for seasonal and long term trends and day of the week, using a 10 df/year spline and indicator variables, respectively. Model selection is still an issue of current research within the DLNM framework, although simulation studies indicate a good performance of methods based on the Akaike information criterion (AIC) [11]. Considering the illustrative purpose of the example, we selected a priori the cross-basis function in Eq. (3) for representing the association between mean daily temperature and mortality, basing our choice on previous analyses. Specifically, the cross-basis is composed of a quadratic B-spline with two equally-spaced knots as the exposure-response function f(x), and a natural cubic B-spline with three equally-spaced knots in the log scale as the lag-response function w(ℓ) over lags 0–25.
In the specific case of temperature where a null exposure condition cannot be defined, a reasonable choice is to center the cross-basis in Eq. (3) to the temperature of minimum risk, as suggested in previous publications [13]. This optimal temperature corresponds to 20°C and 21°C for London and Rome respectively, and it represents the reference point x_{0} for the computation of the attributable risk measures. These are obtained for the whole temperature range, and then for cold and heat contributions by separating the associations with temperatures lower or higher than x_{0}. In addition, the attributable components are separated further in mild and extreme cold and heat by selecting as cut-off values the 1^{st} and 99^{th} percentiles of city-specific distributions, corresponding to 0.4°C and 23.7°C in London and 2.6°C and 28.6°C in Rome.
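The assignment of a daily temperature to one of the four components can be sketched as below, using the London reference and cut-off values as they appear in the text. The function name and the treatment of values falling exactly on a boundary are assumptions made for this illustration.

```python
def temperature_component(x, x0, cold_cut, heat_cut):
    """Label a temperature by the component it would contribute to."""
    if x == x0:
        return "reference"         # no contribution, since the risk is centred at x0
    if x < cold_cut:
        return "extreme cold"
    if x < x0:
        return "mild cold"
    if x <= heat_cut:
        return "mild heat"
    return "extreme heat"


# London values as printed in the text: optimum and 1st/99th percentile cut-offs
LONDON = {"x0": 20.0, "cold_cut": 0.4, "heat_cut": 23.7}
labels = [temperature_component(x, **LONDON) for x in (-2.0, 10.0, 20.0, 22.0, 25.0)]
```

Such labels define the ranges r used when separating the attributable components.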
We derived empirical confidence intervals for backward total attributable numbers and fractions, computed overall and for separated components, by simulating 5,000 samples from the assumed distribution of $\widehat{\mathit{\eta}}$.
Risk attributable to temperature
Table 1. Total mortality fraction (%) attributable to temperature, computed backward (bAF_{tot}) and forward (fAF_{tot}), reported as overall, hot and cold components with 95% empirical confidence intervals (eCI)
City   | Deaths  | Measure   | Overall             | Cold                | Hot
London | 845,215 | bAF_{tot} | 13.59 (10.04–17.09) | 12.95 (9.32–16.38)  | 0.66 (0.52–0.80)
       |         | fAF_{tot} | 13.41 (9.72–16.87)  | 12.84 (9.38–16.33)  | 0.57 (0.45–0.68)
Rome   | 395,691 | bAF_{tot} | 12.58 (9.30–15.64)  | 10.84 (7.37–14.23)  | 1.74 (1.12–2.37)
       |         | fAF_{tot} | 12.27 (8.94–15.41)  | 10.72 (7.19–14.00)  | 1.55 (0.95–2.13)
Cold, heat and extreme components
The total backward attributable risk is then separated into components due to cold and hot temperatures, defined as those below and above the optimal temperature, respectively. The estimates, computed using Eq. (7), are reported in Table 1. The comparison of the two contributions clearly indicates that cold is responsible for most of the mortality attributable to temperature, with bAF_{tot} equal to 12.95% and 10.84%, compared to 0.66% and 1.74% for heat, in the two cities. Estimates of forward attributable risk are very similar and, as expected, their sum is equal to the overall burden, unlike in the backward version.
Table 2. Total mortality fraction (%) attributable to temperature, computed backward (bAF_{tot}) and forward (fAF_{tot}), reported as components from mild and extreme hot and cold contributions with 95% empirical confidence intervals (eCI)
City   | Measure   | Extreme cold     | Mild cold          | Mild hot         | Extreme hot
London | bAF_{tot} | 0.55 (0.45–0.64) | 12.48 (8.86–15.88) | 0.31 (0.23–0.38) | 0.36 (0.29–0.43)
       | fAF_{tot} | 0.47 (0.40–0.53) | 12.38 (8.98–15.78) | 0.29 (0.22–0.35) | 0.28 (0.23–0.33)
Rome   | bAF_{tot} | 0.59 (0.47–0.70) | 10.37 (6.88–13.63) | 1.45 (0.89–2.01) | 0.33 (0.25–0.40)
       | fAF_{tot} | 0.47 (0.39–0.54) | 10.27 (6.69–13.50) | 1.32 (0.75–1.85) | 0.25 (0.19–0.30)
In contrast, the comparison between the two cities is rather different for the components attributable to mild and extreme hot temperatures. In spite of the stronger risk in London, the attributable fraction is similar for extreme heat and even higher in Rome for mild heat (1.32%–1.45% versus 0.25%–0.33%). This apparent contradiction is explained by the different temperature distributions, and in particular by the percentile of the optimal temperature, equal to the 93.6^{th} and 72.5^{th} in London and Rome, respectively. This result suggests the hypothesis that the population in Rome is more adapted to the range of temperatures corresponding to extreme heat than the population in London, which experienced only a few days of unusually high temperatures.
The harvesting paradox
This phenomenon has interesting implications. An example is offered by the right panel of Figure 3, illustrating the estimated daily deaths bAN_{x,t} and fAN_{x,t} attributable to heat, computed backward and forward for the first summer in the time series for Rome, with related temperature trend. As expected, a substantial number of deaths are attributable to temperatures above the optimal value, represented by the horizontal dotted line, in the period mid-July to mid-August. The trend of forward attributable deaths fAN_{x,t} closely follows the daily temperatures, consistent with the definition of number of deaths attributable to the temperature in day t cumulated in the next L days. In contrast, the backward attributable number bAN_{x,t} decreases to zero and even becomes negative in late summer days, although the overall cumulative exposure-response in Figure 2 (bottom-right panel) does not show a RR below 1 for any temperature.
This paradox is explained by the counterfactual condition associated with the backward perspective. Specifically, each bAN_{x,t} compares the association with the observed temperatures in the past L days to a constant exposure x_{0}. In the presence of harvesting, the observed population becomes ‘healthier’ than the counterfactual population after a series of heat days, due to the depletion of the susceptible pool. This explains the negative attributable numbers for specific combinations of lagged exposures. This fact emphasises that harvesting should not be interpreted as a true protective association at longer lags, but rather as an artefact due to a change in the underlying population following a stress, which affects the counterfactual condition. This issue is relevant when using backward attributable risk measures bAN_{x,t} and bAF_{x,t} to assess the contribution of specific days. However, similarly to the net overall cumulative risk, the total attributable number bAN_{tot} and fraction bAF_{tot}, produced by Eq. (8) and reported in Tables 1 and 2, account for the discount by summing the contributions over the whole series.
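The mechanism can be reproduced with a toy example. The sketch below uses a hypothetical heat-only surface `beta` whose lag weights are positive at short lags and negative at longer lags, mimicking harvesting, while their sum remains positive so that the overall cumulative RR never falls below 1. All values are invented, and the forms of the backward and forward fractions are assumptions consistent with the descriptions of Eq. (5) and (6).

```python
import math

X0 = 21.0                              # hypothetical optimal temperature
SLOPE = 0.02                           # hypothetical log-RR per degree above X0
LAG_WEIGHTS = [0.6, 0.3, -0.2, -0.1]   # negative weights at longer lags mimic harvesting


def beta(x, lag):
    """Toy heat-only contribution; zero at or below the optimal temperature."""
    return SLOPE * max(x - X0, 0.0) * LAG_WEIGHTS[lag]


def backward_af(x_series, t):
    """Backward AF at time t from the exposures in the previous L days."""
    total = sum(beta(x_series[t - lag], lag) for lag in range(len(LAG_WEIGHTS)))
    return 1.0 - math.exp(-total)


def forward_af(x_t):
    """Forward AF for the single exposure x_t, cumulated over all lags."""
    return 1.0 - math.exp(-sum(beta(x_t, lag) for lag in range(len(LAG_WEIGHTS))))


temps = [31.0, 31.0, 21.0, 21.0]       # two hot days followed by two days at the optimum
baf_late = backward_af(temps, 3)       # only the long-lag heat contributions remain
faf_all = [forward_af(x) for x in temps]
```

On the last day the backward fraction is negative, because only the negative long-lag contributions of the earlier heat remain in the window, whereas every forward fraction is non-negative since it uses the net cumulative risk.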
Discussion and conclusions
In this contribution we illustrate an extended definition of attributable risk measures based on the DLNM framework. Consistent with this class of models, such a definition accounts for the complex pattern of potentially nonlinear and delayed associations described through exposure-lag-response associations.
Two alternative definitions of attributable risk are proposed, assuming backward or forward perspectives. The former provides more consistent estimators which naturally arise from the structure of the regression model, where distributed lag terms at times t−ℓ contribute to the risk at time t. The forward attributable measures, in contrast, are affected by a negative bias related to the averaging of future counts, which nonetheless is likely to be relatively low. On the other hand, the forward version is well suited for separating the risk in components attributable to different ranges, as their sum matches the overall risk. Furthermore the forward perspective, looking from current exposure to future risk, seems more appropriate for quantifying the health burden due to specific exposure occurrences, as it is based on a more coherent counterfactual condition. Corrections have been proposed in previous works on risk attributable to multiple exposures [20–22], and can be applied to the backward version.
Strictly speaking, the definition given in Eq. (1a) is interpreted as the attributable fraction among the subpopulation of exposed subjects. In the setting of time series analysis for environmental stressors, the whole population is usually considered as exposed, and this definition can be more generally interpreted as the population attributable fraction. If only a subset is instead exposed, Eq. (5)–(8) can be easily extended using the equations proposed by Steenland and Armstrong [2] for population attributable risk.
Previous papers suggested approaches for producing attributable risk from distributed lag models when applied to heat-mortality associations. Baccini and colleagues applied DLMs and computed attributable risk measures, specifically addressing the issue of harvesting [12]. Honda and colleagues illustrated an analysis on the mortality burden due to heat using DLNMs [13]. However both approaches are limited, as the former assumes a linear threshold form of the exposure-response, while the latter averages the nonlinear risk across the whole temperature range. In this paper we offer a formal and more consistent definition of such attributable risk measures.
An advantage of the proposed method is the provision of estimates for separate components of the attributable risk, associated with different exposure ranges. In the specific case of temperature-health associations, this allows the separation of attributable risks from cold and heat, and further from mild and extreme temperatures. The estimates reported in the example highlight how the simple analysis of exposure-response curves can be misleading in the attribution of risk, and that most of the mortality in the two cities is in fact attributable to mild cold temperatures, in spite of the relatively low RR.
The availability of attributable risk measures, complementary to estimates of exposure-response associations, is essential for the identification and planning of public health interventions. Their extension to exposure-lag-response associations allows the computation of such measures from dependencies showing potentially complex nonlinear and temporal patterns.
Abbreviations
 AF: Attributable fraction
 AN: Attributable number of cases
 RR: Relative risk
 DLM: Distributed lag models
 DLNM: Distributed lag nonlinear models
 eCI: Empirical confidence interval
Declarations
Acknowledgements
AG is supported through a Methodology Research fellowship awarded by Medical Research Council-UK (grant ID G1002296).
Authors’ Affiliations
References
 1. Rothman KJ, Greenland S, Lash TL: Modern Epidemiology. 2008, Philadelphia: Lippincott Williams & Wilkins
 2. Steenland K, Armstrong B: An overview of methods for calculating the burden of disease due to specific risk factors. Epidemiology. 2006, 17 (5): 512–519. 10.1097/01.ede.0000229155.05644.43.
 3. Thomas DC: Statistical methods for analyzing effects of temporal patterns of exposure on cancer risks. Scand J Work Environ Health. 1983, 9 (4): 353–366. 10.5271/sjweh.2401.
 4. Breslow NE, Day NE: Statistical Methods in Cancer Research. Vol. II: The Design and Analysis of Cohort Studies. Lyon: International Agency for Research on Cancer (IARC); 1987:232–271. Chap. 6: Modelling the relationship between risk, dose and time
 5. Sylvestre MP, Abrahamowicz M: Flexible modeling of the cumulative effects of time-dependent exposures on the hazard. Stat Med. 2009, 28 (27): 3437–3453. 10.1002/sim.3701.
 6. Richardson DB: Latency models for analyses of protracted exposures. Epidemiology. 2009, 20 (3): 395–399. 10.1097/EDE.0b013e318194646d.
 7. Almon S: The distributed lag between capital appropriations and expenditures. Econometrica. 1965, 33: 178–196. 10.2307/1911894.
 8. Schwartz J: The distributed lag between air pollution and daily deaths. Epidemiology. 2000, 11 (3): 320–326. 10.1097/00001648-200005000-00016.
 9. Armstrong B: Models for the relationship between ambient temperature and daily mortality. Epidemiology. 2006, 17 (6): 624–631. 10.1097/01.ede.0000239732.50999.8f.
 10. Gasparrini A, Armstrong B, Kenward MG: Distributed lag nonlinear models. Stat Med. 2010, 29 (21): 2224–2234. 10.1002/sim.3940.
 11. Gasparrini A: Modeling exposure-lag-response associations with distributed lag nonlinear models. Stat Med. 2014, 33 (5): 881–899.
 12. Baccini M, Kosatsky T, Biggeri A: Impact of summer heat on urban population mortality in Europe during the 1990s: an evaluation of years of life lost adjusted for harvesting. PloS One. 2013, 8 (7): e69638. 10.1371/journal.pone.0069638.
 13. Honda Y, Kondo M, McGregor G, Kim H, Guo YL, Hijioka Y, Yoshikawa M, Oka K, Takano S, Hales S, Kovats RS: Heat-related mortality risk model for climate change impact projection. Environ Health Prev Med. 2013, (Epub ahead of print. doi:10.1007/s12199-013-0354-6)
 14. Gasparrini A, Armstrong B: Reducing and meta-analyzing estimates from distributed lag nonlinear models. BMC Med Res Methodol. 2013, 13 (1): 1. 10.1186/1471-2288-13-1.
 15. Graubard BI, Fears TR: Standard errors for attributable risk for simple and complex sample designs. Biometrics. 2005, 61 (3): 847–855. 10.1111/j.1541-0420.2005.00355.x.
 16. Cox C, Li X: Model-based estimation of the attributable risk: a loglinear approach. Comput Stat Data Anal. 2012, 56 (12): 4180–4189. 10.1016/j.csda.2012.04.017.
 17. Greenland S: Interval estimation by simulation as an alternative to and extension of confidence intervals. Int J Epidemiol. 2004, 33 (6): 1389–1397. 10.1093/ije/dyh276.
 18. Wood SN: Generalized Additive Models: An Introduction with R. 2006, Boca Raton: Chapman & Hall/CRC
 19. Schwartz J: Is there harvesting in the association of airborne particles with daily deaths and hospital admissions?. Epidemiology. 2001, 12 (1): 55–61. 10.1097/00001648-200101000-00010.
 20. Eide GE, Gefeller O: Sequential and average attributable fractions as aids in the selection of preventive strategies. J Clin Epidemiol. 1995, 48 (5): 645–655. 10.1016/0895-4356(94)00161-I.
 21. Land M, Vogel C, Gefeller O: Partitioning methods for multifactorial risk attribution. Stat Methods Med Res. 2001, 10 (3): 217–230. 10.1191/096228001680195166.
 22. Hamel JF, Fouquet N, Ha C, Goldberg M, Roquelaure Y: Software for unbiased estimation of attributable risk. Epidemiology. 2012, 23 (4): 646–647. 10.1097/EDE.0b013e318259c31c.
Prepublication history
 The prepublication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/14/55/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.