 Research article
 Open Access
 Published:
Reducing and metaanalysing estimates from distributed lag nonlinear models
BMC Medical Research Methodology volume 13, Article number: 1 (2013)
Abstract
Background
The twostage time series design represents a powerful analytical tool in environmental epidemiology. Recently, models for both stages have been extended with the development of distributed lag nonlinear models (DLNMs), a methodology for investigating simultaneously nonlinear and lagged relationships, and multivariate metaanalysis, a methodology to pool estimates of multiparameter associations. However, the application of both methods in twostage analyses is prevented by the highdimensional definition of DLNMs.
Methods
In this contribution we propose a method to synthesize DLNMs to simpler summaries, expressed by a reduced set of parameters of onedimensional functions, which are compatible with current multivariate metaanalytical techniques. The methodology and modelling framework are implemented in R through the packages dlnm and mvmeta.
Results
As an illustrative application, the method is adopted for the twostage time series analysis of temperaturemortality associations using data from 10 regions in England and Wales. R code and data are available as supplementary online material.
Discussion and Conclusions
The methodology proposed here extends the use of DLNMs in twostage analyses, obtaining metaanalytical estimates of easily interpretable summaries from complex nonlinear and delayed associations. The approach relaxes the assumptions and avoids simplifications required by simpler modelling approaches.
Background
Research on the health effects of environmental stressors, such as air pollution and temperature, often relies on time series analysis using data from multiple locations, usually cities [1, 2]. The analytical design adopted in this setting is commonly based on twostage procedures, where locationspecific exposureresponse relationships are estimated through a regression model in the first stage, and these estimates are then combined through metaanalysis in the second stage [3].
Recently, the firststage modelling approaches have been extended with the introduction of distributed lag nonlinear models (DLNMs) [4, 5], a methodology to describe simultaneously nonlinear and delayed dependencies. This modelling class is based on the definition of a crossbasis, a bidimensional space of functions describing the association along the spaces of predictor and lags. The crossbasis is specified by the choice of two basis, one for each dimension, among a set of possible options such as splines, polynomials, or step functions. Concurrently, developments have been proposed also for the second stage. In particular, techniques based on multivariate metaanalysis have been used to combine estimates of associations defined by multiple parameters, and applied for either nonlinear [6–8] or lagged dependencies [8–11]. We recently provided an overview of the use of multivariate metaanalysis in this setting [12].
In this contribution we propose a method to reduce estimates from DLNMs to summaries defined in only one dimension of predictor or lags, reexpressing the fit in terms of reduced parameters for the related unidimensional basis functions. This step decreases the number of parameters to be pooled in the second stage, offering a method to metaanalyse estimates from richly parameterized nonlinear and delayed exposureresponse relationships.
In the next section, we provide a brief recap of the algebraic development of DLNMs and multivariate metaanalysis, and then describe the main statistical development, establishing a method to reduce the fit of a DLNM to summaries expressed in a single dimension. A motivating example with an analysis of the relationship between temperature and allcause mortality is used throughout the paper to illustrate the statistical framework. We finally note some limitations and indicate future directions for research. Supplementary online material provides information on algebraic notation and software (Additional file 1), and also includes the R code and data to entirely reproduce the results in the example (Additional files 2– 3).
Methods
The twostage time series design can be applied to series of observations collected at each time t, with t = 1,…,N _{ i }, in each location i, with i = 1,…,m. Firststage regression models are fitted to each series of N _{ i } observations, obtaining locationspecific estimates of the association of interest. These estimates are then pooled across locations in the second stage, with the aim to estimate an average exposureresponse relationship and inspect heterogeneity across locations.
An illustrative example
As an illustration, we describe an analysis of the relationship between temperature and allcause mortality using daily series of N _{ i } = N = 5113 observations from each of the m = 10 regions in England and Wales, in the period 1993–2006. Further details on the dataset were previously provided [13, 14]. The example is used throughout the paper to describe the steps of the modelling framework and introduce the new methodological development. Specifically, the relationship is flexibly estimated in each region in the firststage analysis using DLNMs, and then pooled in the second stage through multivariate metaanalysis. The example aims to demonstrate how results from DLNMs are summarized in an analysis of a single region, and then how these reduced summaries can be combined across regions. Also, we illustrate a comparison with simpler modelling approaches. Modelling choices are dictated by illustrative purposes, and the results are not meant to provide substantive evidence on the association.
Distributed lag nonlinear models
The DLNM framework has been extensively described [5]. Here we provide a brief overview to facilitate the new development, illustrated later. In particular, we will focus on the bidimensional structure of this class of models, represented by the two sets of basis functions applied to derive the parameterization. Following the original paper, we first generalize the idea of simple distributed lag models (DLMs) and then introduce the nonlinear extension.
The DLNM modelling class
Distributed lag linear and nonlinear models are expressed through a lagbasis and crossbasis function s(x _{ t }), respectively, of the Nlength exposure series x =[x _{1},…,x _{ t },…,x _{ N }] ^{T}. The definition of s(x _{ t }) first requires the derivation of the N × (L + 1) matrix Q of lagged exposures, so that q _{ t·} =[x _{ t },…,x _{ t−ℓ },…,x _{ t−L }] ^{T}. This step indirectly characterizes the new lag dimension identified by the vector ℓ =[0,…,ℓ,…,L] ^{T}, with L as maximum lag. Now, choosing a first basis with dimension v _{ ℓ } to represent the association along the new lag space, we can compute a (L + 1) × v _{ ℓ } basis matrix C by applying the related functions to ℓ. A compact and general definition of the lagbasis function s(x _{ t }) for DLMs is given by:
where different models are specified with different choices of the basis to derive C. The transformed variables in W = QC can be included in the design matrix of the firststage regression model, in order to estimate the v _{ ℓ }length parameter vector η, with C ηrepresenting the lagspecific contributions.
The nonlinear extension to DLNMs requires the choice of a second basis with dimension v _{ x } to model the relationship along the space of the predictor x, obtaining the N × v _{ x } basis matrix Z from the application of the related functions to x. Applied together with the transformation which defines the matrix of lagged exposures Q above, this step produces a threedimensional N × v _{ x } × (L + 1) array \stackrel{\u0307}{\mathbf{R}}. The parameterization of the crossbasis function s(x _{ t }) for DLNMs is then given by:
The simpler lagbasis for DLMs in (1) is a special case of the more complex crossbasis for DLNMs in (2). These models may be fitted through common regression techniques with the inclusion of crossbasis matrix W in the design matrix. The vector \widehat{\mathit{\eta}} of estimated parameters of the crossbasis function in (2) represents a simultaneously nonlinear and lagged dependency, and its length v _{ x } × v _{ ℓ } is equal to the product of the dimensions of the bases for the two spaces. In completely parametric models as those described here, this dimensionality is directly associated with the notion of degrees of freedom (df), related to the flexibility of the function and the smoothness of the estimated dependency. In spite of the relatively complex algebraic definition in (2), DLNMs are solely specified by the choice of the two bases for deriving the matrices Z and C. The software implementation of this methodology in the R package dlnm has been previously described [15].
Summarizing the results from a DLNM
Fitted bidimensional crossbasis functions from DLNMs can be interpreted by deriving predictions over a grid of predictor and lag values, usually computed relative to a reference predictor value. As a first example, we show the results of a singlelocation analysis, using data from the NorthEast region of England. The temperaturemortality relationship is modelled through the same crossbasis used for the full twostage analysis illustrated later, composed by two Bspline bases.
The results are shown in Figure 1. The topleft panel displays the bidimensional surface of the fitted relative risk (RR) in a 3D graph, predicted for the grid of temperature and lag values, with a reference black line corresponding to the centering value of the basis for the predictor space (here 17°C). Similarly to previous analyses, the figure suggests an immediate increase in risk for high temperature, and a more delayed but protracted effect for low temperature.
This bidimensional representation contains details not relevant for some interpretative purposes, and does not easily allow presentation of confidence intervals. The analysis therefore commonly focuses on three specific unidimensional summaries of the association, also illustrated in Figure 1. First, a predictorspecific summary association at a given predictor value x _{0} can be defined along the lag space. As an example, this is reproduced in the topright panel for temperature x _{0} = 22°C, together with 95% confidence intervals (CI), and corresponds to the red line parallel to the reference in the 3D graph. Second, similarly, a lagspecific summary association at a given lag value ℓ _{0} can be defined along the predictor space. This is shown in the bottomleft panel for lag ℓ _{0} = 4, and coincides with the red line in the 3D graph perpendicular to the reference. Third, the sum of the lagspecific contributions provides the overall cumulative association, showed in the bottomright panel of Figure 1. This last summary offers an estimate of the net effect associated with a given exposure cumulated over the lag period L, and is usually the focus of the analysis.
Multivariate metaanalysis
The framework of multivariate metaanalysis has been previously described [16], and its application for combining estimates of multiparameter associations has been recently illustrated [12]. We offer a brief summary here, firstly illustrating the secondstage multivariate metaanalytical model and then discussing its limitation for pooling DLNMs.
The multivariate extension of metaanalysis
Specification of the model assumes that a kdimensional set of outcome parameters {\widehat{\mathit{\eta}}}_{i} and associated k × k (co)variance matrix S _{ i } have been estimated in each of the i = 1,…,mstudies. In the application for twostage time series analysis, these outcome parameters represent regression coefficients from the first stage, while the term study refers here to the firststage analysis in each location. The description below illustrates a randomeffects multivariate metaregression model, where fixedeffects models or simple metaanalysis treated as special cases. The model for location i is defined as:
where the locationspecific estimated outcome parameters {\widehat{\mathit{\eta}}}_{i} are assumed to follow a kdimensional multivariate normal distribution. The k×kp blockdiagonal matrix {\mathbf{U}}_{i}={\mathbf{I}}_{\left(k\right)}\otimes {\mathbf{u}}_{i}^{T} is the Kronecker expansion of the locationspecific vector u _{ i } = u _{1},…,u _{ p } ^{T} of metavariables. The matrices Ψ and S _{ i } represent the between and withinlocation (co)variance matrices, respectively. This multivariate metaregression model is applied to estimate the parameter vectors β and ξ. The former represents the kp secondstage coefficients defining how the p metavariables are associated with each of the true k firststage coefficients in η _{ i }. The vector ξ includes a set of parameters which uniquely define the betweenlocation (co)variance matrix Ψ, depending on the chosen structure for this matrix. In fixed effects models no additional variability beyond the estimation error in the firststage model is assumed for {\widehat{\mathit{\eta}}}_{i}, and Σ _{ i } = S _{ i }. In multivariate metaanalysis with no metavariable, U = I _{(k)} and β = η, the vector of average parameters. The development in (3) can be considered as a special case of multivariate linear mixed model [17], where the withincity (co)variance is assumed known. Among alternative estimation methods, such as Bayesian [18] and multivariate extensions of the method of moments [19], we privilege here likelihoodbased approaches [20, 21]. Methods to derive tests and confidence intervals, fit statistics and (bestlinear unbiased) predictions have been previously developed within the linear mixed models framework for the application in this setting, together with a description of the software implementation in the R package mvmeta[12].
Limitations of multivariate metaanalysis
In theory, the m sets of estimated firststage coefficients {\widehat{\mathit{\eta}}}_{i} of the crossbasis obtained from DLNMs in (2) can be metaanalysed using (3), producing an populationaveraged threedimensional effect surface across locations, optionally conditional on metavariables in multivariate metaregression. However, as anticipated above, the definition of DLNMs in (2) requires k = v _{ x } × v _{ ℓ } parameters {\widehat{\mathit{\eta}}}_{i} for the crossbasis. For models specified by even moderately complex bases in each space, the number of parameters becomes so high that the optimization routine for multivariate metaanalysis is computationally impractical. This is particularly relevant for the (co)variance terms in ξ defining the true betweenlocation variability, composed by k(k + 1)/2 parameters for an unstructured matrix Ψ.
This limitation is one of the main reasons which have prevented so far the full application of DLNMs in twostage analysis. The modelling approach has often required the simplification of the firststage model, for the secondstage multivariate metaanalysis to be feasible. For example, investigators have assumed a linear relationship in the dimension of the predictor [8–11], or computed a simple exposure moving average for the lag space [6–8]. We previously adopted the same limited approach [12]. The development of methods to derive metaanalytical estimates from full DLNMs would offer a great deal of flexibility in the investigation of complex exposureresponse dependencies.
Reducing DLNMs
Predictions from DLNMs as those shown in Figure 1 are obtained by selecting the grid of exposure and lag values defined as x _{[p]} and ℓ _{[p]}, respectively. Details on the algebraic development are given elsewhere [5] [sections 4.2–4.3]. Briefly, predictions are computed by deriving matrices Z _{[p]} and C _{[p]} from x _{[p]} and ℓ _{[p]}, respectively, through the application of the same basis functions used for estimation in (2). The bidimensional predicted relationship in location i is expressed by the full set of estimated parameters in {\widehat{\mathit{\eta}}}_{i} and quantities derived by Z _{[p]} and C _{[p]}. However, the specific summaries described in the previous section are defined only on the single dimension of predictor (lagspecific and overall cumulative associations, bottom panels of Figure 1) or lag (predictorspecific association, topright panel of Figure 1). The idea is to reparameterize these summaries in terms of the related unidimensional basis Z _{[p]} or C _{[p]} for predictor or lags only, respectively, and sets of reduced in coefficients {\widehat{\mathit{\theta}}}_{i}. Dimensionality of the functions expressing such summaries therefore decreases from v _{ x } × v _{ ℓ }, corresponding to the length of vector {\widehat{\mathit{\eta}}}_{i} in the original parameterization, to v _{ x } or v _{ ℓ } only, corresponding to the length of new sets of reduced parameters {\widehat{\mathit{\theta}}}_{i}.
The definition of the reduced parameters depends on the specific summary among those listed above. They can be obtained by applying a related dimensionreducing matrix M, expressed as:
for predictorspecific summary association at x _{0}, for for lagspecific summary association at ℓ _{0}, and for overall cumulative summary association, respectively. Here {\mathbf{z}}_{\left[{x}_{0}\right]} and {\mathbf{c}}_{\left[{\ell}_{0}\right]} are the vectors of transformed values of x _{0}and ℓ _{0} obtained by the application of the sets of basis functions for predictor and lags, respectively. The reduced parameter vector {\widehat{\mathit{\theta}}}_{i} and associated (co)variance matrix V\left({\widehat{\mathit{\theta}}}_{i}\right) are then obtained by:
with j ∈ {x _{0},ℓ _{0},c}. The predictorspecific association at x _{0}, defined on the lag space for values in ℓ _{[p]}, is expressed by {\mathbf{C}}_{\left[p\right]}{\widehat{\mathit{\theta}}}_{\left[{x}_{0}\right]i}, with standard errors provided by the square root of {\mathbf{C}}_{\left[p\right]}V\left({\widehat{\mathit{\theta}}}_{\left[{x}_{0}\right]i}\right){\mathbf{C}}_{\left[p\right]}^{T}. The lagspecific association at ℓ _{0} and the overall cumulative association, defined on the predictor dimension for values in x _{[p]i }, are then obtained by {\mathbf{Z}}_{\left[p\right]}{\widehat{\mathit{\theta}}}_{\left[{\ell}_{0}\right]i} and {\mathbf{Z}}_{\left[p\right]}{\widehat{\mathit{\theta}}}_{\left[c\right]i}, respectively, with computation for standard errors as above. The number of coefficients defining these summaries, reduced from v _{ x } × v _{ ℓ } to v _{ ℓ } or v _{ x } only, is usually compatible with the application of standard multivariate metaanalysis techniques, which can now be used to combine the estimates of these summaries from DLNMs in twostage analyses. In addition, the derivation in (4)–(5) simplifies the algebra for DLNM predictions originally provided [5] [Section 4.3]. The dimension reduction comes at the price of loss of information about the association on one of the two dimensions, as the rankdeficiency of M does not allow reversing the reduction applied in (5).
Results
The analysis is now extended to the full set of 10 regions, with the aim to produce pooled estimates of the overall cumulative association, and to compare the results with those obtained by simpler approaches, applying a moving average to the daily exposure series. Also, we investigate the lag structure for exposure to cold and hot temperatures through predictorspecific estimates. Finally, we assess heterogeneity and then the role of metavariables through multivariate metaregression.
Modelling strategy
The firststage regionspecific model is specified by adopting a standard analytical approach for time series environmental data [1, 3, 4]. In each region, we fit a common generalized linear model for the quasiPoisson family to the series of allcause mortality counts. The model includes the crossbasis for daily mean temperature, a natural cubic spline of time with 10 df / year to control for the longterm and seasonal variation, and indicator variables for day of the week.
In the main firststage model, the temperaturemortality association is estimated by a flexible crossbasis defined by a quadratic Bspline for the space of temperature, centered at 17°C, and a natural cubic Bspline with intercept for the space of lags, with maximum lag L = 21. We place two internal knots at equally spaced values along temperature (5.3°C and 15.1°C) and three internal knots at equallyspaced logvalues of lag (1.0, 2.8 and 7.6), with boundary knots at −4.4°C and 24.9°C, and 0 and 21 lags, respectively. These choices define spline bases with dimensions v _{ x } = 4 and v _{ ℓ } = 5 for temperature and lag spaces, respectively. The same specification was previously applied for the singleregion analysis.
The set of v _{ x } × v _{ ℓ } = 20 coefficients of the crossbasis variables with associated (co)variance matrices, estimated in each region, are then reduced. Specifically, for region i we derive the vector {\widehat{\mathit{\theta}}}_{\left[c\right]i} with 4 reduced parameters of the quadratic Bspline of temperature Z _{[p]} for the overall cumulative summary association, and two vectors {\widehat{\mathit{\theta}}}_{\left[0\right]i} and {\widehat{\mathit{\theta}}}_{\left[22\right]i} with sets of 5 reduced parameters of the natural cubic Bspline of lags C _{[p]} for predictorspecific summary associations at 0°C and 22°C. These temperatures correspond approximately to the 1^{st} and 99^{th} of the pooled temperature distribution, respectively. These effects along lags are interpreted using the reference of 17°C.
For comparison with methods not requiring dimensionality reduction, in two alternative firststage models we simplify the lag structure by fitting onedimensional splines to the moving average of the temperature series over lag 0–3 and 0–21, respectively. Such moving average models have been commonly used in weather and air pollution epidemiology [4, 10, 22]. These alternatives can be described as DLNMs including crossbases with a constant function to represent the relationship in the lag space, while keeping the same quadratic Bspline for the space of the predictor, as described for the main model above. In these simplified models, the dimension of fitted relationship does not need to be reduced. In fact, the application of the reduction method returns the original v _{ x } × v _{ ℓ } = 4 × 1 = 4 parameters rescaled by the number of lags, giving a dimensionreducing matrix M _{[c]}, as described in (4c), composed in this case by a diagonal matrix with entries corresponding to a constant equal to the number of lags.
The coefficients for each of the three summary associations from the main model are estimated in the 10 regions and then independently included as outcomes in three multivariate metaanalytical secondstage models. The ten estimated sets of coefficients from the two alternative models (equivalent to the overall cumulative summary) were directly metaanalysed. All the secondstage models are fitted here through restricted maximum likelihood (REML) using the R package mvmeta. We first derive an estimate of the pooled relationship through multivariate metaanalysis, and then extend the results showing an example of multivariate metaregression which includes populationaveraged regional latitude as a metavariable. The effect of latitude is displayed by predicting the averaged temperaturemortality associations for the 25^{th} and 75^{th} percentiles of its distribution, using the same baseline reference of 17°C. The significance of such an effect is assessed through a Wald test, given a likelihood ratio test cannot be applied to compare model fitted with REML and different fixedeffects structures [12].
Twostage analysis
The overall temperaturemortality associations in the 10 regions of England and Wales are illustrated in Figure 2. The left panel shows the regionsspecific summary associations from the first stage, together with the pooled average from multivariate metaanalysis, as predicted by the main flexible model. Regionsspecific estimates show similar curves, although some variability exists, in particular at the extremes. Consistently with previous findings, the pooled curve suggests an increase in relative risk (RR) for both cold and hot temperatures, although less pronounced for the latter, and with a steeper increase for extreme when compared to mild cold. The average point of minimum mortality is at 17.1°C, corresponding approximately to the 90^{th} percentile of the pooled temperature distribution. The multivariate Cochran Q test for heterogeneity is highly significant (pvalue < 0.001), and the related I ^{2} statistic indicates that 63.7% of the variability is due to true heterogeneity between regions.
The right panel of Figure 2 illustrates the comparison with the two alternative simpler models. We see that the association based on the 0–21 lag moving average temperature approximates that based on a flexible DLNM in the cold range, but completely misses the heat effect. The reverse is true for the association based on the 0–3 lag moving average temperature.
Figure 3 depicts the pooled estimate from the main model for predictorspecific summary associations at 22°C and 0°C, with the same reference of 17°C, as predicted by the two sets of v _{ ℓ } = 5 reduced coefficients. Consistently with previous research, the effect of hot temperature is immediate and disappears after 1–2 days, while cold temperatures are associated with mortality for a long lag period, after an initial protective effect. This complex lag pattern can explain the different results provided by the less flexible alternative models. The pooled overall RR estimated by the main model, cumulated along lags for these specific summaries and reported graphically in Figure 2 (left panel), are 1.101 (95%CI: 1.078–1.124) for 22°C and and 1.308 (95%CI: 1.245–1.375) for 0°C, respectively. The Cochran Q test is significant for the lag curve at 0°C (pvalue < 0.001), but not for that at 22°C (pvalue = 0.178), with an I ^{2} of 63.4% and 16.0%, respectively.
It is interesting to note that the secondstage multivariate metaanalytical model for the predictorspecific summary associations at 22°C estimates perfectly correlated random components, with betweenstudy correlations equal to −1 or 1. This is a known phenomenon in multivariate metaanalysis, frequently occuring in the presence of a small number of studies and/or a high withinstudy uncertainty relative to the betweenstudy variation [23]. However, in this case, the results from the Cochran Q test suggest that a fixedeffects multivariate model may be preferable, and as expected, this model returns almost identical estimates for the pooled summary associations (results not shown).
The heterogeneity across regions can be partly explained as effect modification by regionspecific variables. The results of the example of metaregression with latitude are illustrated in Figure 4. The top panel suggests a differential overall cumulative association between northern and southern regions, a pattern previously reported [8, 10]. Interestingly, the effect modification seems to occur for cold, with a higher effect in southern regions, but not for heat. The estimated pooled RR at 0°C versus 17°C are 1.380 (95%CI: 1.337–1.424) and 1.237 (95%CI: 1.198–1.277) for the 25^{th} and 75^{th} percentiles of latitude, respectively, while the same estimates are 1.106 (95%CI: 1.079–1.133) and 1.104 (95%CI: 1.059–1.150) for 22°C. Overall, the evidence for an effect modification is substantial, with a highly significant Wald test (pvalue < 0.001). Latitude explains much of the heterogeneity across regions, with an I ^{2} reduced to 18.7% and a nonsignificant Cochran Q test (pvalue = 0.174). The bottom panels illustrates the same effect modification for predictorspecific summary associations at 22°C and 0°C. Consistently, the Wald test indicates a significant effect for cold (pvalue < 0.001), but not for heat (pvalue = 0.634).
Discussion
In this contribution we describe a method to reexpress the bidimensional fit of DLNMs in terms of unidimensional summaries, involving reduced sets of modified parameters of the basis functions chosen for the space of predictor or lags. This development, in addition to simplifying the algebraic definition of the methodology, offers a more compact description of the bidimensional association modelled by DLNMs. In particular, the dimension of the sets of reduced parameters is usually compatible with the application of multivariate metaanalytical techniques in a twostage framework, allowing the analysis of complex nonlinear and delayed associations in multilocation studies.
Previous applications of the twostage design for multilocation time series studies are based on simplified functions for modelling the association of interest at the first stage. In particular, the analyses are usually limited to splines or other nonlinear functions of simple moving average of the exposure series [6–8], a modelling approach similar to the alternative models used for comparison in our example. Alternatively, the simplification could be applied in the other dimension of predictor, specifying DLMs for linear or linearthreshold exposureresponse relationships [8–11]. All these approaches require strong assumptions on the exposureresponse dependency, in order to simplify the association modelled in one of the two dimensions within the first stage. These are prone to biases when the true underlying dependency is misspecified. The framework we propose, in contrast, require less assumptions or simplifications regarding the association in the firststage model, but rather reduces the estimates to unidimensional summaries of a bidimensional fit. The advantages of this approach are exemplified by the comparison of the simpler alternatives with the bidimensionally flexible model, illustrated in the Results section. This methodology offers greater flexibility in the investigation of complex associations through a twostage analysis.
Most of the limitations of DLNMs and multivariate metaanalysis of multiparameter associations, previously discussed [5, 12], identically apply to this framework. In particular, the issues of model selection and control for confounding pose important challenges, and are matters of current and future research. The issue of model selection is particularly relevant, due to the bidimensional nature of the models, where two independent bases are chosen to describe the dependency along predictor and lag spaces, respectively. In our example, we selected the bases apriori for illustrative purposes, but model selection is clearly more problematic in applied analyses.
The problem of estimating perfectly correlated random components in the secondstage metaanalytical model, as described in the example, can bias upward the standard errors of the pooled estimates. This problem occurs in likelihoodbased and method of moments estimation procedures of multivariate metaanalysis, as these estimators truncate the betweenstudy correlations on the boundary of their parameter space [16, 19, 23]. Although in many cases this problem arises with small number of studies or when the amount of heterogeneity is negligible (and thus when a fixedeffects model is preferable), alternative approaches may be considered. First, different estimation methods can be applied, for example by imposing some structure to the betweenstudy (co)variance matrix, or adopting a Bayesian approach that employ weakly informative priors models could avoid truncation of betweenstudy correlations. Also, alternative parameterization of the crossbasis functions may reduce the correlation pattern in the first stage and avoid estimation problems in the secondstage multivariate model. This issue needs to be explored further.
The definition of identical crossbasis functions in all the locations can be problematic in the presence of substantially different exposure ranges. In our example, the temperature distribution was similar across regions, and the placements of common knots was straightforward. However, this can be hardly generalized. The issue was previously discussed, and an alternative approach based on relative scale was proposed for pooling onedimensional functions [12]. The same method is applicable for bidimensional DLNMs. However, this limitation requires further research.
Estimation methods for DLNMs not requiring the completely parametric approach proposed here seems attractive and possible, in particular based on penalized likelihood [24] or Bayesian methods [25]. These estimation procedures also provide automatic selection methods. These options require the specification of a large number of parameters, which are then shrunk during the fitting procedures to reach a far smaller number of equivalent df. However, the high dimensionality of the fitted model may present a problem for the secondstage multivariate metaanalysis, even when reduced to unidimensional summaries following (4)–(5). Techniques for metaanalysis of highdimensional estimates are a topic of current and future research.
Potentially, the number of parameters of the secondstage multivariate metaanalysis can also be decreased by structuring the betweenstudy (co)variance matrix of random effects. However, the extent to which such a choice can bias the estimates of fixedeffects parameters is not known. Moreover, this option is not yet available in the R package mvmeta, and will be implemented and assessed in future analyses.
Conclusions
The extension of the DLNM framework presented here, involving the reduction of the complex twodimensional fit to onedimensional summaries, provides an improved method to study complex nonlinear and delayed associations in twostage analyses. Unlike previous approaches proposed so far, this method requires less simplification of the exposureresponse shape or lag structure. This framework may be applied in any setting where nonlinear and delayed relationships needs to be investigated in different populations or groups.
Abbreviations
 DLM:

distributed lag model.
References
Touloumi G, Atkinson R, Le Tertre A, Samoli E, Schwartz J, Schindler C, Vonk JM, Rossi G, Saez M, Rabszenko D: Analysis of health outcome time series data in epidemiological studies. EnvironMetrics. 2004, 15 (2): 101117. 10.1002/env.623.
Bell ML, Samet JM, Dominici F: Timeseries studies of particulate matter. Annu Rev Public Health. 2004, 25: 247280. 10.1146/annurev.publhealth.25.102802.124329.
Dominici F, Samet JM, Zeger SL: Combining evidence on air pollution and daily mortality from the 20 largest US cities: a hierarchical modelling strategy. J R Stat Soc: Ser A. 2000, 163 (3): 263302. 10.1111/1467985X.00170.
Armstrong B: Models for the relationship between ambient temperature and daily mortality. Epidemiology. 2006, 17 (6): 624631. 10.1097/01.ede.0000239732.50999.8f.
Gasparrini A, Armstrong B, Kenward MG: Distributed lag nonlinear models. Stat Med. 2010, 29 (21): 22242234. 10.1002/sim.3940.
Dominici F, Daniels MJ, Zeger SL, Samet JM: Air pollution and mortality: estimating regional and national doseresponse relationships. J Am Stat Assoc. 2002, 97: 100111. 10.1198/016214502753479266.
Samoli E, Touloumi G, Zanobetti A, Le Tertre A, Schindler C, Atkinson R, Vonk J, Rossi G, Saez M, Rabczenko D, et al: Investigating the doseresponse relation between air pollution and total mortality in the APHEA2 multicity project. Occup Environ Med. 2003, 60 (12): 977982. 10.1136/oem.60.12.977.
Baccini M, Biggeri A, Accetta G, Kosatsky T, Katsouyanni K, Analitis A, Anderson HR, Bisanti L, D’Ippoliti D, Danova J, Forsberg B, Medina S, Paldy A, Rabczenko D, Schindler C, Michelozzi P: Heat effects on mortality in 15 European cities. Epidemiology. 2008, 19 (5): 711719. 10.1097/EDE.0b013e318176bfcd.
Zanobetti A, Schwartz J, Samoli E, Gryparis A, Touloumi G, Atkinson R, Le Tertre A, Bobros J, Celko M, Goren A, Forsberg B, Michelozzi P, Rabczenko D, Aranguez Ruiz E, Katsouyanni K: The temporal pattern of mortality responses to air pollution: a multicity assessment of mortality displacement. Epidemiology. 2002, 13: 8793. 10.1097/0000164820020100000014.
Analitis A, Katsouyanni K, Biggeri A, Baccini M, Forsberg B, Bisanti L, Kirchmayer U, Ballester F, Cadum E, Goodman PG, Hojs A, Sunyer J, Tiittanen P, Michelozzi P: Effects of cold weather on mortality: results from 15 European cities within the PHEWE Project. Am J Epidemiol. 2008, 168 (12): 139710.1093/aje/kwn266.
Samoli E, Zanobetti A, Schwartz J, Atkinson R, Le Tertre A, Schindler C, Perez L, Cadum E, Pekkanen J, Paldy A, Touloumi G, Katsouyanni K: The temporal pattern of mortality responses to ambient ozone in the APHEA project. J Epidemiol Community Health. 2009, 63: 960966. 10.1136/jech.2008.084012.
Gasparrini A, Armstrong B, Kenward MG: Multivariate metaanalysis for nonlinear and other multiparameter associations. Stat Med. 2012, 31 (29): 38213839. 10.1002/sim.5471.
Armstrong BG, Chalabi Z, Fenn B, Hajat S, Kovats S, Milojevic A, Wilkinson P: The association of mortality with high temperatures in a temperate climate: England and Wales. J Epidemiol Community Health. 2011, 65 (4): 340345. 10.1136/jech.2009.093161.
Gasparrini A, Armstrong B, Kovats S, Wilkinson P: The effect of high temperatures on causespecific mortality in England and Wales. Occup Environ Med. 2012, 69: 5661. 10.1136/oem.2010.059782.
Gasparrini A: Distributed lag linear and nonlinear models in R: the package dlnm. J Stat Software. 2011, 43 (8): 120.
Jackson D, Riley R, White IR: Multivariate metaanalysis: potential and promise. Stat Med. 2011, 30 (20): 24812498.
Verbeke G, Molenberghs G: Linear mixed models for longitudinal data. 2000, New York: SpringerVerlag
Nam IS, Mengersen K, Garthwaite P: Multivariate metaanalysis. Stat Med. 2003, 22 (14): 23092333. 10.1002/sim.1410.
Jackson D, White IR, Thompson SG: Extending DerSimonian and Laird’s methodology to perform multivariate random effects metaanalyses. Stat Med. 2010, 29 (12): 12821297.
Harville DA: Maximum likelihood approaches to variance component estimation and to related problems. J Am Stat Assoc. 1977, 72 (358): 320338. 10.1080/01621459.1977.10480998.
van Houwelingen HC, Arends LR, Stijnen T: Advanced methods in metaanalysis: multivariate approach and metaregression. Stat Med. 2002, 21 (4): 589624. 10.1002/sim.1040.
Anderson BG, Bell ML: Weatherrelated mortality: how heat, cold, and heat waves affect mortality in the United States. Epidemiology. 2009, 20 (2): 205213. 10.1097/EDE.0b013e318190ee08.
Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR: Bivariate randomeffects metaanalysis and the estimation of betweenstudy correlation. BMC Med Res Methodology. 2007, 7: 310.1186/1471228873.
Zanobetti A, Wand MP, Schwartz J, Ryan LM: Generalized additive distributed lag models: quantifying mortality displacement. Biostatistics. 2000, 1 (3): 279292. 10.1093/biostatistics/1.3.279.
Welty LJ, Peng RD, Zeger SL, Dominici F: Bayesian distributed lag models: estimating effects of particulate matter air pollution on daily mortality. Biometrics. 2008, 65: 282291.
Prepublication history
The prepublication history for this paper can be accessed here:http://www.biomedcentral.com/14712288/13/1/prepub
Acknowledgements
Antonio Gasparrini is currently funded by a Methodology Research Fellowship from Medical Research Council UK (grant ID G1002296). Ben Armstrong and Antonio Gasparrini were supported by a grant from Medical Research Council UK during the preliminary stage of the project (grant ID G0701030).
Author information
Authors and Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Author’s contributions
BA firstly conceived the idea of reexpressing summaries of DLNMs in terms of onedimensional functions. AG then derived the algebraic expression. AG and BA contributed to the structure of the manuscript and the design of the analysis in the examples. AG implemented the methodology in the R software, performed the analysis, and took the lead in drafting the manuscript. BA contributed to drafting the manuscript. All authors read and approved the final version of the manuscript.
Electronic supplementary material
12874_2012_900_MOESM1_ESM.pdf
Additional file 1: Online appendix. This pdf document provides additional information on the algebraic notation, on the software and R code, and on the time series data used in the example. (PDF 224 KB)
12874_2012_900_MOESM2_ESM.csv
Additional file 2: Data. This csv file includes the time series data for the 10 regions of England and Wales during the period 1993–2006, used in the example. (CSV 7 MB)
12874_2012_900_MOESM3_ESM.zip
Additional file 3: R scripts. This.zip file contains 6 R scripts. These are files with extension.R which can be used to reproduce the results of the analysis in the example. They can be opened directly in R or read with a text editor. (ZIP 9 KB)
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Gasparrini, A., Armstrong, B. Reducing and metaanalysing estimates from distributed lag nonlinear models. BMC Med Res Methodol 13, 1 (2013). https://doi.org/10.1186/14712288131
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/14712288131
Keywords
 Distributed lag models
 Multivariate metaanalysis
 Twostage analysis
 Time series