The two-stage time series design can be applied to series of observations collected at each time *t*, with *t* = 1,…,*N*
_{
i
}, in each location *i*, with *i* = 1,…,*m*. First-stage regression models are fitted to each series of *N*
_{
i
} observations, obtaining location-specific estimates of the association of interest. These estimates are then pooled across locations in the second stage, with the aim to estimate an average exposure-response relationship and inspect heterogeneity across locations.

### An illustrative example

As an illustration, we describe an analysis of the relationship between temperature and all-cause mortality using daily series of *N*
_{
i
} = *N* = 5113 observations from each of the *m* = 10 regions in England and Wales, in the period 1993–2006. Further details on the dataset were previously provided [13, 14]. The example is used throughout the paper to describe the steps of the modelling framework and introduce the new methodological development. Specifically, the relationship is flexibly estimated in each region in the first-stage analysis using DLNMs, and then pooled in the second stage through multivariate meta-analysis. The example aims to demonstrate how results from DLNMs are summarized in an analysis of a single region, and then how these reduced summaries can be combined across regions. Also, we illustrate a comparison with simpler modelling approaches. Modelling choices are dictated by illustrative purposes, and the results are not meant to provide substantive evidence on the association.

### Distributed lag non-linear models

The DLNM framework has been extensively described [5]. Here we provide a brief overview to facilitate the new development, illustrated later. In particular, we will focus on the bi-dimensional structure of this class of models, represented by the two sets of basis functions applied to derive the parameterization. Following the original paper, we first generalize the idea of simple distributed lag models (DLMs) and then introduce the non-linear extension.

#### The DLNM modelling class

Distributed lag linear and non-linear models are expressed through a lag-basis and cross-basis function *s*(*x*
_{
t
}), respectively, of the *N*-length exposure series **x** =[*x*
_{1},…,*x*
_{
t
},…,*x*
_{
N
}] ^{T}. The definition of *s*(*x*
_{
t
}) first requires the derivation of the *N* × (*L* + 1) matrix **Q** of lagged exposures, so that **q**
_{
t·} =[*x*
_{
t
},…,*x*
_{
t−ℓ
},…,*x*
_{
t−L
}] ^{T}. This step indirectly characterizes the new lag dimension identified by the vector *ℓ* =[0,…,*ℓ*,…,*L*] ^{T}, with *L* as maximum lag. Now, choosing a first basis with dimension *v*
_{
ℓ
} to represent the association along the new lag space, we can compute a (*L* + 1) × *v*
_{
ℓ
} basis matrix **C** by applying the related functions to *ℓ*. A compact and general definition of the lag-basis function *s*(*x*
_{
t
}) for DLMs is given by:

s({x}_{t};\mathit{\eta})=\sum _{k=1}^{{v}_{\ell}}{\mathbf{q}}_{t\xb7}^{T}{\mathbf{c}}_{\xb7k}{\eta}_{k}={\mathbf{q}}_{t\xb7}^{T}\mathbf{C}\mathit{\eta}={\mathbf{w}}_{t\xb7}^{T}\mathit{\eta}\text{,}

(1)

where different models are specified with different choices of the basis to derive **C**. The transformed variables in **W** = **QC** can be included in the design matrix of the first-stage regression model, in order to estimate the *v*
_{
ℓ
}-length parameter vector *η*, with **C** *η*representing the lag-specific contributions.

The non-linear extension to DLNMs requires the choice of a second basis with dimension *v*
_{
x
} to model the relationship along the space of the predictor *x*, obtaining the *N* × *v*
_{
x
} basis matrix **Z** from the application of the related functions to **x**. Applied together with the transformation which defines the matrix of lagged exposures **Q** above, this step produces a three-dimensional *N* × *v*
_{
x
} × (*L* + 1) array \stackrel{\u0307}{\mathbf{R}}. The parameterization of the cross-basis function *s*(*x*
_{
t
}) for DLNMs is then given by:

s({x}_{t};\mathit{\eta})=\sum _{j=1}^{{v}_{x}}\sum _{k=1}^{{v}_{\ell}}{\mathbf{r}}_{\mathit{\text{tj}}\xb7}^{T}{\mathbf{c}}_{\xb7k}{\eta}_{\mathit{\text{jk}}}={\mathbf{w}}_{t\xb7}^{T}\mathit{\eta}\text{.}

(2)

The simpler lag-basis for DLMs in (1) is a special case of the more complex cross-basis for DLNMs in (2). These models may be fitted through common regression techniques with the inclusion of cross-basis matrix **W** in the design matrix. The vector \widehat{\mathit{\eta}} of estimated parameters of the cross-basis function in (2) represents a simultaneously non-linear and lagged dependency, and its length *v*
_{
x
} × *v*
_{
ℓ
} is equal to the product of the dimensions of the bases for the two spaces. In completely parametric models as those described here, this dimensionality is directly associated with the notion of *degrees of freedom* (*df*), related to the flexibility of the function and the smoothness of the estimated dependency. In spite of the relatively complex algebraic definition in (2), DLNMs are solely specified by the choice of the two bases for deriving the matrices **Z** and **C**. The software implementation of this methodology in the R package dlnm has been previously described [15].

#### Summarizing the results from a DLNM

Fitted bi-dimensional cross-basis functions from DLNMs can be interpreted by deriving predictions over a grid of predictor and lag values, usually computed relative to a reference predictor value. As a first example, we show the results of a single-location analysis, using data from the North-East region of England. The temperature-mortality relationship is modelled through the same cross-basis used for the full two-stage analysis illustrated later, composed by two B-spline bases.

The results are shown in Figure 1. The top-left panel displays the bi-dimensional surface of the fitted relative risk (RR) in a 3-D graph, predicted for the grid of temperature and lag values, with a reference black line corresponding to the centering value of the basis for the predictor space (here 17°C). Similarly to previous analyses, the figure suggests an immediate increase in risk for high temperature, and a more delayed but protracted effect for low temperature.

This bi-dimensional representation contains details not relevant for some interpretative purposes, and does not easily allow presentation of confidence intervals. The analysis therefore commonly focuses on three specific uni-dimensional *summaries* of the association, also illustrated in Figure 1. First, a *predictor-specific* summary association at a given predictor value *x*
_{0} can be defined along the lag space. As an example, this is reproduced in the top-right panel for temperature *x*
_{0} = 22°C, together with 95% confidence intervals (CI), and corresponds to the red line parallel to the reference in the 3-D graph. Second, similarly, a *lag-specific* summary association at a given lag value *ℓ*
_{0} can be defined along the predictor space. This is shown in the bottom-left panel for lag *ℓ*
_{0} = 4, and coincides with the red line in the 3-D graph perpendicular to the reference. Third, the sum of the lag-specific contributions provides the *overall cumulative* association, showed in the bottom-right panel of Figure 1. This last summary offers an estimate of the net effect associated with a given exposure cumulated over the lag period *L*, and is usually the focus of the analysis.

### Multivariate meta-analysis

The framework of multivariate meta-analysis has been previously described [16], and its application for combining estimates of multi-parameter associations has been recently illustrated [12]. We offer a brief summary here, firstly illustrating the second-stage multivariate meta-analytical model and then discussing its limitation for pooling DLNMs.

#### The multivariate extension of meta-analysis

Specification of the model assumes that a *k*-dimensional set of outcome parameters {\widehat{\mathit{\eta}}}_{i} and associated *k* × *k* (co)variance matrix **S**
_{
i
} have been estimated in each of the *i* = 1,…,*m*studies. In the application for two-stage time series analysis, these outcome parameters represent regression coefficients from the first stage, while the term study refers here to the first-stage analysis in each location. The description below illustrates a random-effects multivariate meta-regression model, where fixed-effects models or simple meta-analysis treated as special cases. The model for location *i* is defined as:

{\widehat{\mathit{\eta}}}_{i}\sim \mathrm{N}\left(\right.{\mathbf{U}}_{i}\mathit{\beta}\phantom{\rule{0.3em}{0ex}},\phantom{\rule{0.3em}{0ex}}{\mathbf{S}}_{i}+\mathbf{\Psi}\left)\right.\text{,}

(3)

where the location-specific estimated outcome parameters {\widehat{\mathit{\eta}}}_{i} are assumed to follow a *k*-dimensional multivariate normal distribution. The *k*×*kp* block-diagonal matrix {\mathbf{U}}_{i}={\mathbf{I}}_{\left(k\right)}\otimes {\mathbf{u}}_{i}^{T} is the Kronecker expansion of the location-specific vector **u**
_{
i
} = *u*
_{1},…,*u*
_{
p
}
^{T} of meta-variables. The matrices **Ψ** and **S**
_{
i
} represent the between and within-location (co)variance matrices, respectively. This multivariate meta-regression model is applied to estimate the parameter vectors *β* and *ξ*. The former represents the *kp* second-stage coefficients defining how the *p* meta-variables are associated with each of the true *k* first-stage coefficients in *η*
_{
i
}. The vector *ξ* includes a set of parameters which uniquely define the between-location (co)variance matrix **Ψ**, depending on the chosen structure for this matrix. In fixed effects models no additional variability beyond the estimation error in the first-stage model is assumed for {\widehat{\mathit{\eta}}}_{i}, and **Σ**
_{
i
} = **S**
_{
i
}. In multivariate meta-analysis with no meta-variable, **U** = **I**
_{(k)} and *β* = *η*, the vector of average parameters. The development in (3) can be considered as a special case of multivariate linear mixed model [17], where the within-city (co)variance is assumed known. Among alternative estimation methods, such as Bayesian [18] and multivariate extensions of the method of moments [19], we privilege here likelihood-based approaches [20, 21]. Methods to derive tests and confidence intervals, fit statistics and (best-linear unbiased) predictions have been previously developed within the linear mixed models framework for the application in this setting, together with a description of the software implementation in the R package mvmeta[12].

#### Limitations of multivariate meta-analysis

In theory, the *m* sets of estimated first-stage coefficients {\widehat{\mathit{\eta}}}_{i} of the cross-basis obtained from DLNMs in (2) can be meta-analysed using (3), producing an population-averaged three-dimensional effect surface across locations, optionally conditional on meta-variables in multivariate meta-regression. However, as anticipated above, the definition of DLNMs in (2) requires *k* = *v*
_{
x
} × *v*
_{
ℓ
} parameters {\widehat{\mathit{\eta}}}_{i} for the cross-basis. For models specified by even moderately complex bases in each space, the number of parameters becomes so high that the optimization routine for multivariate meta-analysis is computationally impractical. This is particularly relevant for the (co)variance terms in *ξ* defining the true between-location variability, composed by *k*(*k* + 1)/2 parameters for an unstructured matrix **Ψ**.

This limitation is one of the main reasons which have prevented so far the full application of DLNMs in two-stage analysis. The modelling approach has often required the simplification of the first-stage model, for the second-stage multivariate meta-analysis to be feasible. For example, investigators have assumed a linear relationship in the dimension of the predictor [8–11], or computed a simple exposure moving average for the lag space [6–8]. We previously adopted the same limited approach [12]. The development of methods to derive meta-analytical estimates from full DLNMs would offer a great deal of flexibility in the investigation of complex exposure-response dependencies.

### Reducing DLNMs

Predictions from DLNMs as those shown in Figure 1 are obtained by selecting the grid of exposure and lag values defined as **x**
_{[p]} and *ℓ*
_{[p]}, respectively. Details on the algebraic development are given elsewhere [5] [sections 4.2–4.3]. Briefly, predictions are computed by deriving matrices **Z**
_{[p]} and **C**
_{[p]} from **x**
_{[p]} and *ℓ*
_{[p]}, respectively, through the application of the same basis functions used for estimation in (2). The bi-dimensional predicted relationship in location *i* is expressed by the full set of estimated parameters in {\widehat{\mathit{\eta}}}_{i} and quantities derived by **Z**
_{[p]} and **C**
_{[p]}. However, the specific summaries described in the previous section are defined only on the single dimension of predictor (lag-specific and overall cumulative associations, bottom panels of Figure 1) or lag (predictor-specific association, top-right panel of Figure 1). The idea is to re-parameterize these summaries in terms of the related uni-dimensional basis **Z**
_{[p]} or **C**
_{[p]} for predictor or lags only, respectively, and sets of reduced in coefficients {\widehat{\mathit{\theta}}}_{i}. Dimensionality of the functions expressing such summaries therefore decreases from *v*
_{
x
} × *v*
_{
ℓ
}, corresponding to the length of vector {\widehat{\mathit{\eta}}}_{i} in the original parameterization, to *v*
_{
x
} or *v*
_{
ℓ
} only, corresponding to the length of new sets of reduced parameters {\widehat{\mathit{\theta}}}_{i}.

The definition of the reduced parameters depends on the specific summary among those listed above. They can be obtained by applying a related dimension-reducing matrix **M**, expressed as:

\begin{array}{l}{\mathbf{M}}_{\left[{x}_{0}\right]}={\mathbf{I}}_{\left({v}_{\ell}\right)}\otimes {\mathbf{z}}_{\left[{x}_{0}\right]}^{T}\phantom{\rule{2em}{0ex}}\end{array}

(4a)

\begin{array}{l}{\mathbf{M}}_{\left[{\ell}_{0}\right]}={\mathbf{c}}_{\left[{\ell}_{0}\right]}^{T}\otimes {\mathbf{I}}_{\left({v}_{x}\right)}\phantom{\rule{2em}{0ex}}\end{array}

(4b)

\begin{array}{l}{\mathbf{M}}_{\left[c\right]}={\mathbf{1}}_{(L+1)}^{T}\mathbf{C}\otimes {\mathbf{I}}_{\left({v}_{x}\right)}\phantom{\rule{2em}{0ex}}\end{array}

(4c)

for predictor-specific summary association at *x*
_{0}, for for lag-specific summary association at *ℓ*
_{0}, and for overall cumulative summary association, respectively. Here {\mathbf{z}}_{\left[{x}_{0}\right]} and {\mathbf{c}}_{\left[{\ell}_{0}\right]} are the vectors of transformed values of *x*
_{0}and *ℓ*
_{0} obtained by the application of the sets of basis functions for predictor and lags, respectively. The reduced parameter vector {\widehat{\mathit{\theta}}}_{i} and associated (co)variance matrix V\left({\widehat{\mathit{\theta}}}_{i}\right) are then obtained by:

\begin{array}{c}{\widehat{\mathit{\theta}}}_{\left[j\right]i}={\mathbf{M}}_{\left[j\right]}{\widehat{\mathit{\eta}}}_{i}\hfill \\ \phantom{\rule{.8em}{0ex}}V\left({\widehat{\mathit{\theta}}}_{\left[j\right]i}\right)\hfill & ={\mathbf{M}}_{\left[j\right]}V\left({\widehat{\mathit{\eta}}}_{i}\right){\mathbf{M}}_{\left[j\right]}^{T}\text{,}\hfill \end{array}

(5)

with *j* ∈ {*x*
_{0},*ℓ*
_{0},*c*}. The predictor-specific association at *x*
_{0}, defined on the lag space for values in *ℓ*
_{[p]}, is expressed by {\mathbf{C}}_{\left[p\right]}{\widehat{\mathit{\theta}}}_{\left[{x}_{0}\right]i}, with standard errors provided by the square root of {\mathbf{C}}_{\left[p\right]}V\left({\widehat{\mathit{\theta}}}_{\left[{x}_{0}\right]i}\right){\mathbf{C}}_{\left[p\right]}^{T}. The lag-specific association at *ℓ*
_{0} and the overall cumulative association, defined on the predictor dimension for values in **x**
_{[p]i
}, are then obtained by {\mathbf{Z}}_{\left[p\right]}{\widehat{\mathit{\theta}}}_{\left[{\ell}_{0}\right]i} and {\mathbf{Z}}_{\left[p\right]}{\widehat{\mathit{\theta}}}_{\left[c\right]i}, respectively, with computation for standard errors as above. The number of coefficients defining these summaries, reduced from *v*
_{
x
} × *v*
_{
ℓ
} to *v*
_{
ℓ
} or *v*
_{
x
} only, is usually compatible with the application of standard multivariate meta-analysis techniques, which can now be used to combine the estimates of these summaries from DLNMs in two-stage analyses. In addition, the derivation in (4)–(5) simplifies the algebra for DLNM predictions originally provided [5] [Section 4.3]. The dimension reduction comes at the price of loss of information about the association on one of the two dimensions, as the rank-deficiency of **M** does not allow reversing the reduction applied in (5).