Measurement error in time-series analysis: a simulation study comparing modelled and monitored data

Background Assessing health effects from background exposure to air pollution is often hampered by the sparseness of pollution monitoring networks. However, regional atmospheric chemistry-transport models (CTMs) can provide pollution data with national coverage at fine geographical and temporal resolution. We used statistical simulation to compare the impact on epidemiological time-series analysis of additive measurement error in sparse monitor data as opposed to geographically and temporally complete model data.
Methods Statistical simulations were based on a theoretical area of 4 regions, each consisting of twenty-five 5 km × 5 km grid-squares. In the context of a 3-year Poisson regression time-series analysis of the association between mortality and a single pollutant, we compared the error impact of using daily grid-specific model data as opposed to daily regional average monitor data. We investigated how this comparison was affected if we changed the number of grids per region containing a monitor. To inform simulations, estimates (e.g. of pollutant means) were obtained from observed monitor data for 2003–2006 for national network sites across the UK and corresponding model data that were generated by the EMEP-WRF CTM. Average within-site correlations between observed monitor and model data were 0.73 and 0.76 for rural and urban daily maximum 8-hour ozone respectively, and 0.67 and 0.61 for rural and urban log_e(daily 1-hour maximum NO2).
Results When regional averages were based on 5 or 10 monitors per region, health effect estimates exhibited little bias. However, with only 1 monitor per region, the regression coefficient in our time-series analysis was attenuated by an estimated 6% for urban background ozone, 13% for rural ozone, 29% for urban background log_e(NO2) and 38% for rural log_e(NO2). For grid-specific model data the corresponding figures were 19%, 22%, 54% and 44% respectively, i.e. similar for rural log_e(NO2) but more marked for both ozone metrics and, in particular, for urban log_e(NO2).
Conclusion Even if correlations between model and monitor data appear reasonably strong, additive classical measurement error in model data may lead to appreciable bias in health effect estimates. As process-based air pollution models become more widely used in epidemiological time-series analysis, assessments of error impact that include statistical simulation may be useful.


Background
Bias in estimation due to measurement error has received much attention in medical research, including epidemiology. In its simplest form, i.e. pure additive classical measurement error, the relationship between the observed variable or surrogate measure $Z$ and the "true" variable $X^*$ can be expressed as:
$$Z = X^* + U, \qquad E(U) = 0, \quad U \text{ independent of } X^*.$$
It is well documented that replacing $X^*$ by $Z$ as the explanatory variable in a simple linear regression analysis leads to attenuation in the estimation of both the Pearson correlation coefficient and the gradient of the regression line, with the extent of the attenuation depending on the reliability ratio $\rho_{ZX^*} = \operatorname{var}(X^*)/\operatorname{var}(Z)$ [1]. Similarly, in simple Poisson regression, pure additive classical error in the explanatory variable leads to attenuation in the estimation of the relative risk [2].
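The attenuation by the reliability ratio can be checked with a short simulation. This is a minimal sketch, assuming a Normal "true" exposure, a linear outcome model and independent Normal error; the values β = 2, var(X*) = 1 and var(U) = 0.5 are illustrative and not taken from the study. With these values ρ_ZX* = 1/1.5 ≈ 0.67, so the naive slope should sit close to 2 × 0.67 ≈ 1.33.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def classical_error_slope(n=50_000, beta=2.0, var_x=1.0, var_u=0.5, seed=1):
    """Regress y on a surrogate Z = X* + U carrying pure additive classical error."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, var_x ** 0.5) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]
    z = [x + rng.gauss(0.0, var_u ** 0.5) for x in x_true]  # U independent of X*
    reliability = var_x / (var_x + var_u)  # rho_ZX* = var(X*) / var(Z)
    return ols_slope(z, y), beta * reliability


naive_slope, predicted_slope = classical_error_slope()
print(f"naive slope {naive_slope:.3f} vs beta * reliability {predicted_slope:.3f}")
```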
However, not all measurement error is classical. Reeves et al. [3] considered the impact of measurement error in a situation where individual radon exposure was measured with additive classical error but where subjects with missing radon data were assigned an area average. If the variability of "true" individual radon exposure is the same within each area and the area averages are exact (i.e. measured without error) their use as surrogate measures introduces pure additive Berkson error. This type of measurement error has no biasing effect on the regression coefficient in simple linear regression [4] and little if any such effect on the regression coefficient in simple Poisson regression [2,5]. However if the averages are not exact they introduce a combination of Berkson error and classical error and the presence of additive classical error biases the gradient estimate or relative risk estimate towards the null.
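By contrast, pure additive Berkson error leaves the linear-regression slope unbiased. The sketch below, under the same illustrative assumptions as a Normal linear model (β = 2, var(Z) = 1, var(U) = 0.5, none of them from the study), treats the surrogate Z as an exact area average and lets the truth scatter around it.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def berkson_error_slope(n=50_000, beta=2.0, var_z=1.0, var_u=0.5, seed=2):
    """Regress y on an exact surrogate Z when the truth is X* = Z + U."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, var_z ** 0.5) for _ in range(n)]       # exact area average
    x_true = [v + rng.gauss(0.0, var_u ** 0.5) for v in z]     # U independent of Z
    y = [beta * x + rng.gauss(0.0, 1.0) for x in x_true]
    return ols_slope(z, y)


berkson_slope = berkson_error_slope()
print(f"slope on Berkson surrogate {berkson_slope:.3f} (true beta 2.0)")
```

The extra scatter in X* around Z only inflates the residual variance, which is why Berkson error costs precision rather than introducing bias in this linear case.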
The consequences of using an area average as a surrogate explanatory variable have been investigated in simulations by Lee et al. [6]. Based on a time-series analysis of daily mortality counts and average daily air pollution (the average of readings from available monitors in the study region), they found that increasing the probability of siting monitors in high pollution areas led to attenuation in health effect estimates and poor coverage of confidence intervals. They also found that, in a separate scenario of high classical instrument error (assumed to be additive on a log scale) and low spatial variation, reducing the total number of monitors in the study region from 30 to 5 increased any attenuation in the health effect estimate.
As indicated above, in some circumstances measurement error is proportional (i.e. additive on a log scale) and the relationship of interest is with the untransformed explanatory variable. In the context of using over-dispersed Poisson regression to investigate the effects of air pollution on daily emergency department visits, a recent simulation study by Goldman et al. [5] concluded that while pure proportional classical error in the daily air pollution data led to an attenuated estimate of relative risk, pure proportional Berkson error in the pollution data actually led to an inflated estimate of relative risk, i.e. bias away from the null. This is in line with findings for logistic regression from the simulation study of Steenland et al. [7] which suggested that if the Berkson error variance increases as values of the surrogate measure increase, bias in the regression coefficient away from the null may result.
For statistical models containing more than one explanatory variable, the effect of measurement error depends not only on the error type (i.e. Berkson, classical, proportional, additive) but also on the correlation between the explanatory variables, which explanatory variables are causal and which are measured with error. In Poisson regression with two explanatory variables, one causal and measured with pure additive classical error and the other non-causal and measured without error, Fung et al. [8] demonstrated through simulation that the estimated relative risk of the causal variable will be attenuated and that if the correlation between the two explanatory variables is high (i.e. multicollinearity) the predictive effect of the causal variable may be transferred to the non-causal variable.
In air pollution epidemiology short-term associations between outdoor air pollution and health are assessed using an ecological time-series design. Many such studies have been published [9] and inform public health policy [10]. These studies correlate daily counts of health events in a specific location (usually a city) with daily pollution concentrations derived from static monitoring sites. Regional air pollution chemistry-transport models (CTMs) that are capable of simulating hourly and daily concentrations of a wide range of pollutants at fine-scale resolutions (i.e. ≤ 10 km) have recently been developed. These provide new opportunities to investigate pollution metrics (e.g. individual particulate matter components or source-resolved pollutant metrics) which either cannot be currently measured or can be measured only at a limited number of locations due to their measurement complexity and/or a sparse monitoring network. In this paper we compare, using statistical simulation methods, the performance of geographically resolved model data at 5 km × 5 km resolution with area-wide average concentrations derived from a number of air pollution monitors.
We compare performance in terms of additive and proportional measurement error and its effect on a time-series analysis of the relationship between daily ambient background pollution levels and daily counts of a health event (using all-cause mortality as an example) at the small area level, i.e. the 5 km × 5 km grid. The analysis is conducted using Poisson regression. Measures from background air pollution monitors may be available for some 5 km × 5 km grids but not for others. Our primary aim is therefore to demonstrate how simulation techniques can be employed to investigate whether, and when, it might be better to use data from a CTM rather than, as often happens in practice, aggregating over grids and using average pollution values based on monitor data. Our simulations are based on a theoretical study area divided into 4 regions, each consisting of twenty-five 5 km × 5 km grids, and within this construct we consider the effect of varying the number of grids per region containing a monitor. The parameter estimates used in our simulations are taken from observed monitor versus model comparisons.
For the purposes of this investigation we assume that it is the association of ambient pollution with mortality at the small-area level that is important (because of the link to regulation, [11]) rather than exposure at the level of the individual and leave consideration of disparities between background monitoring networks and personal exposure to others [4,11,12]. There is also a literature on impacts of measurement error in air pollution for study designs other than time-series [13,14].

Simulating a "true" time-series
Simulations were performed using DRAWNORM in STATA 10 [15] and relate to a theoretical square study area measuring 50 km North by 50 km East, which can be divided into 4 regions each consisting of twenty-five 5 km × 5 km grid-squares. We assume that, for a given pollutant, its "true" background concentration (i.e. devoid of bias or measurement error) in grid-square $i$ on day $t$ ($t = 1, \ldots, 1095$) of a 3-year time-series is:
$$X^*_{i,t} \sim N(\mu_i, \sigma^2_w), \qquad \mu_i \sim N(\mu, \sigma^2_b), \qquad i = 1, \ldots, 100.$$
For each grid $i$, the "true" 3-year time-series, represented by the vector $X^*_i$, exhibits no trend or seasonal variation. Grid-specific means $\mu_i$ ($i = 1, \ldots, 100$) are Normally distributed around an overall mean $\mu$ with variance $\sigma^2_b$.
Each row of the 1095 × 100 time-series matrix $\mathbf{X}^* = (X^*_1, \ldots, X^*_{100})$ consists of a sample drawn from a Multivariate Normal distribution, MVN($U$, $\Omega$), with grid-specific means $\mu_i$, common within-grid variance $\sigma^2_w$, between-grid covariances $\sigma_{i,k}$ and between-grid correlation coefficients $\rho_{i,k}$ ($i, k = 1, \ldots, 100$), such that:
$$U = (\mu_1, \ldots, \mu_{100})^\top, \qquad \Omega = (\sigma_{i,k}), \qquad \sigma_{i,i} = \sigma^2_w, \qquad \sigma_{i,k} = \rho_{i,k}\,\sigma^2_w.$$
For each grid $i$ the number of deaths on day $t$, $y_{i,t}$, is sampled from a Poisson distribution with mean dependent on the "true" background concentration of the pollutant in that grid on that day, according to the following formula:
$$y_{i,t} \sim \text{Poisson}(\lambda_{i,t}), \qquad \log_e \lambda_{i,t} = \alpha + \beta X^*_{i,t}.$$
We consider two pollutant metrics: daily maximum 8-hour ozone and log_e(daily 1-hour maximum NO2). NO2 concentrations are log-transformed to take account of their positively skewed distribution.
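The generating process above can be sketched in pure Python as a stand-in for Stata's DRAWNORM: each day's row is the mean vector plus the Cholesky factor of Ω times independent standard Normals, and deaths are Poisson draws with a log-linear mean. This is an illustration under stated assumptions, not the authors' code: it uses a 2 × 2 block of 5 km grids instead of 100, a hypothetical correlation function exp(−D/50) in place of the fitted regression from Figure 1, a hypothetical σ_b = 5 and α = 0.3, and the urban ozone values μ = 61.73, σ_w = 24.35 and β = 0.000399 quoted elsewhere in the text.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small daily means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_true(coords, mu, sigma_b, sigma_w, corr_fn, n_days, alpha, beta, seed=0):
    rng = random.Random(seed)
    n = len(coords)
    mu_i = [rng.gauss(mu, sigma_b) for _ in range(n)]  # grid means ~ N(mu, sigma_b^2)
    # sigma_ii = sigma_w^2, sigma_ik = rho_ik * sigma_w^2
    omega = [[sigma_w ** 2 * (1.0 if i == k else corr_fn(math.dist(coords[i], coords[k])))
              for k in range(n)] for i in range(n)]
    low = cholesky(omega)
    x_true, deaths = [], []
    for _ in range(n_days):  # one MVN(U, Omega) row per day
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        row = [mu_i[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        x_true.append(row)
        deaths.append([poisson(rng, math.exp(alpha + beta * x)) for x in row])
    return x_true, deaths


grid_centres = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]  # 2 x 2 block of 5 km grids
x_true, deaths = simulate_true(grid_centres, mu=61.73, sigma_b=5.0, sigma_w=24.35,
                               corr_fn=lambda d: math.exp(-d / 50.0),  # hypothetical
                               n_days=1095, alpha=0.3, beta=0.000399)
```

An exponential decay was chosen for the sketch because it always yields a positive-definite Ω; a linear-in-distance correlation function, as fitted in the study, needs checking for positive-definiteness before factorising.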

Simulating observed monitor data
Pollution concentrations obtained from monitors will include measurement error due to instrument imprecision and monitor location. Given the small size of grids (i.e. 5 km × 5 km), and given that instrument error for an unbiased monitor is generally considered to be classical [16], for each grid $i$ we simulate a 3-year time-series of monitor data, $X_i$, by adding classical measurement error to our "true" time-series $X^*_i$ as follows:
$$X_i = X^*_i + E_i,$$
where each element $\varepsilon_{i,t}$ of the error vector $E_i$ is drawn independently of $X^*_i$ as $\varepsilon_{i,t} \sim N(0, \sigma^2_{err})$.
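Because the error is classical, variances simply add: var(X_i) = var(X*_i) + σ²_err. The sketch below uses the urban background ozone values reported later in the text (σ_w = 24.35, σ_err = 6.77, μ = 61.73) and ignores the day-to-day correlation structure, which does not affect this identity. The implied monitor SD, √(24.35² + 6.77²) ≈ 25.27, is close to the observed average within-site SD of 25.28.

```python
import random
import statistics

SIGMA_TRUE = 24.35  # average "true" within-site SD, urban background ozone (from the text)
SIGMA_ERR = 6.77    # instrument and monitor-location error SD (from the text)


def add_monitor_error(true_series, sigma_err, rng):
    """Classical error: X_i = X*_i + E_i with E_i ~ N(0, sigma_err^2), independent of X*_i."""
    return [x + rng.gauss(0.0, sigma_err) for x in true_series]


rng = random.Random(3)
true_series = [rng.gauss(61.73, SIGMA_TRUE) for _ in range(100_000)]
monitor_series = add_monitor_error(true_series, SIGMA_ERR, rng)

# Under classical error the variances add: var(X_i) = var(X*_i) + sigma_err^2.
implied_sd = (SIGMA_TRUE ** 2 + SIGMA_ERR ** 2) ** 0.5
print(f"simulated monitor SD {statistics.stdev(monitor_series):.2f}, implied {implied_sd:.2f}")
```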

Simulating model data
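For each grid $i$ we simulate a 3-year time-series of model data, $Z_i$, from $X^*_i$. However, in contrast to the above, we allow for a grid-specific bias (i.e. $E(X^*_i) = \mu_i$ and $E(Z_i) = \mu_i + c_i$, where $\mu_i$ and $c_i$ are grid-specific constants) and for the presence of Berkson-like error as well as classical-like error (i.e. we allow for the possibility that $\operatorname{cov}(X^*_i, Z_i) \neq \operatorname{var}(X^*_i)$). We do this by using the approach of Reeves et al. [3]. This approach exploits the fact that if we express $Z_i$ as a linear function of $X^*_i$ then, using standard theory as outlined in Cox and Hinkley [17]:
$$Z_i = \mu_i + c_i + \frac{\operatorname{cov}(X^*_i, Z_i)}{\operatorname{var}(X^*_i)}\,(X^*_i - \mu_i) + V_i, \qquad V_i \sim N\!\left(0,\ \operatorname{var}(Z_i) - \frac{\operatorname{cov}(X^*_i, Z_i)^2}{\operatorname{var}(X^*_i)}\right) \qquad (1.2)$$
where $V_i$ is independent of $X^*_i$. If $\operatorname{cov}(X^*_i, Z_i) = \operatorname{var}(X^*_i)$ then, with the exception of the grid-specific bias term ($c_i$), formula 1.2 reduces to a classical error model.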
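In populating 1.2, we assume that model data are uncorrelated with instrument and location error (i.e. $\operatorname{cov}(E_i, Z_i) = 0$). From this it follows that $\operatorname{cov}(X_i, Z_i) = \operatorname{cov}(X^*_i + E_i, Z_i) = \operatorname{cov}(X^*_i, Z_i)$, so the covariance between "true" and model data can be estimated by the observed monitor-model covariance. In addition, provided our focus is on the effects of additive measurement error (which is not the case for proportional measurement error) and our time-series analysis adjusts for grid, we can simplify calculations by setting the grid-specific constant terms $c_i = c$ for all $i = 1, \ldots, 100$.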
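Equation 1.2 is the conditional distribution of Z_i given X*_i under joint Normality, so model data with any target variance and covariance can be drawn from a "true" series. The sketch below illustrates this; var(X*_i) = 24.35² is taken from the text, whereas var(Z_i) = 25², cov(X*_i, Z_i) = 400 and c_i = 2 are hypothetical values chosen only to show that the simulated moments match their targets. Because the covariance differs from var(X*_i), the simulated error is not purely classical.

```python
import math
import random
import statistics


def simulate_model_series(true_series, mu_i, c_i, var_true, var_model, cov_true_model, rng):
    """Draw Z_i given X*_i using the linear form of Equation 1.2 (bivariate Normal regression)."""
    slope = cov_true_model / var_true
    resid_var = var_model - cov_true_model ** 2 / var_true
    if resid_var < 0:
        raise ValueError("inconsistent moments: var(Z) < cov(X*, Z)^2 / var(X*)")
    resid_sd = math.sqrt(resid_var)
    # If cov_true_model == var_true the slope is 1 and this is classical error plus bias c_i.
    return [mu_i + c_i + slope * (x - mu_i) + rng.gauss(0.0, resid_sd) for x in true_series]


def sample_cov(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


rng = random.Random(4)
VAR_TRUE = 24.35 ** 2   # var(X*_i) for urban background ozone (from the text)
VAR_MODEL = 25.0 ** 2   # hypothetical var(Z_i)
COV_TM = 400.0          # hypothetical cov(X*_i, Z_i)
x_star = [rng.gauss(61.73, 24.35) for _ in range(100_000)]
z_model = simulate_model_series(x_star, 61.73, 2.0, VAR_TRUE, VAR_MODEL, COV_TM, rng)
print(round(sample_cov(x_star, z_model)), round(statistics.variance(z_model)))
```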
For the purposes of our simulations involving proportional error we ignore any dependence between $c_i$ and $E(X^*_i)$ and assume that:
$$c_i \sim N(c, \sigma^2_{diff}) \qquad (1.2a)$$

Simulating regional averages
We simulate the use of regional averages in situations where pollution monitor coverage is less than 1 monitor per 5 km × 5 km grid by first sampling a subset of $l$ grid-squares, $R_{jl}$ ($j = 1, \ldots, 4$), from each of the 4 regional sets of 25 grid-squares, $R_j$ ($j = 1, \ldots, 4$), such that $R_{jl} \subset R_j$. Next we replace each 3-year time-series $X_i$ ($i \in R_j$) with a 3-year time-series of averages $W_j$ based on the formula:
$$W_j = \frac{1}{l} \sum_{k \in R_{jl}} X_k.$$
Simulated regional average time-series are produced in this way for $l$ = 5, 10, 15, 20 and 25.
We also consider the single monitor scenario i.e. l = 1.
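The regional-averaging step can be sketched as follows. It assumes the 100 grids are numbered row by row in a 10 × 10 layout, so that the four 5 × 5 blocks form the regions R_j; that numbering, the function names and the toy series (grid g reads g + t on day t, making the expected average easy to check) are illustrative assumptions.

```python
import random
import statistics


def region_of(grid, side=10, block=5):
    """Region index (0-3) of a grid numbered row by row in a side x side layout."""
    row, col = divmod(grid, side)
    return (row // block) * (side // block) + col // block


def regional_averages(series_by_grid, n_monitors, rng, side=10, block=5):
    """Sample n_monitors grids per region (R_jl within R_j; l in the text) and give every
    grid in R_j the daily average W_j of the sampled grids' series."""
    n_grids = side * side
    n_regions = (side // block) ** 2
    regions = {j: [g for g in range(n_grids) if region_of(g, side, block) == j]
               for j in range(n_regions)}
    n_days = len(series_by_grid[0])
    assigned = [None] * n_grids
    monitored = {}
    for j, grids in regions.items():
        sampled = rng.sample(grids, n_monitors)
        monitored[j] = sampled
        w_j = [statistics.fmean(series_by_grid[g][t] for g in sampled) for t in range(n_days)]
        for g in grids:  # every grid in the region receives the same W_j
            assigned[g] = w_j
    return assigned, monitored


# Toy monitor series: grid g on day t reads g + t, so W_j = mean(sampled g) + t.
series_by_grid = [[float(g + t) for t in range(3)] for g in range(100)]
assigned, monitored = regional_averages(series_by_grid, n_monitors=5, rng=random.Random(4))
```

Setting n_monitors=1 gives the single monitor scenario, in which every grid in a region inherits one monitor's series.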

Comparison of observed monitor and CTM data
Realistic estimates for the above as yet unset parameters (e.g. $\sigma^2_b$, $\operatorname{var}(Z_i)$) were obtained by reference to observed monitor and chemistry-transport model (CTM) data. The monitor data came from the UK's Automatic Urban and Rural Network (AURN) and were obtained via the UK national air information resource [18].
The modelled data used were daily outputs from the EMEP-WRF v3.7 grid-based (Eulerian) 3-D CTM which provides a detailed simulation of the evolving physical and chemical state of the atmosphere over the UK. The underlying CTM is the EMEP Unified Model [19] which has been modified to enable application at 5 km horizontal spatial resolution over the British Isles [20]. A nested approach is used whereby EMEP simulations of atmospheric composition across a coarser European domain are used to drive fine-scale EMEP-WRF simulations of air quality at 5 km horizontal resolution across the UK. The EMEP and EMEP-WRF models have been extensively validated and used for numerous policy applications [21,22].
Daily concentrations of monitored ozone (μg/m³) and their corresponding EMEP-WRF CTM estimates, covering a total of at least 364 days over the period 2003–2006, were obtained for 35 urban background and 21 rural monitoring sites across England, Wales, Scotland and Northern Ireland. Similarly paired daily concentrations of NO2 (μg/m³), again covering at least 364 days over the period 2003–2006, were obtained for 43 urban background and 14 rural monitoring sites across England, Wales, Scotland and Northern Ireland. Ozone concentrations were daily maximum running 8-hour means, and NO2 concentrations were daily 1-hour maxima, log_e-transformed. Summary statistics comparing monitor and CTM data for rural and urban sites are presented in Table 1.
The distance between each pair of monitoring sites of the same type was calculated. Then, having first standardised monitored pollution concentrations within site by subtracting the site mean and dividing by the site standard deviation, we calculated Pearson correlations across time between site pairs for rural ozone, urban background ozone, rural log_e(NO2) and urban background log_e(NO2), and plotted them against distance (Figure 1). Correlations based on fewer than 364 paired observations were set to missing. The relationships between Pearson correlation and distance were then investigated using simple linear regression.
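The correlation-versus-distance analysis can be sketched as below: standardise each site's series, correlate each site pair over the days both report, drop pairs with fewer than 364 paired days, and regress correlation on distance. The input layout (a dict of site coordinates in km and day-indexed concentrations) and the synthetic sites are assumptions for illustration; the synthetic data use a spatial AR(1) so that adjacent sites 10 km apart correlate at about 0.9, and site3 reports only 300 days so its pairs are excluded.

```python
import itertools
import math
import random
import statistics


def standardise(values):
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den


def ols(x, y):
    """Return (intercept, slope) of the simple linear regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def correlation_vs_distance(sites, min_pairs=364):
    """sites: name -> ((easting_km, northing_km), {day: concentration})."""
    std = []
    for xy, obs in sites.values():
        days = sorted(obs)
        std.append((xy, dict(zip(days, standardise([obs[d] for d in days])))))
    pairs = []
    for (xy1, s1), (xy2, s2) in itertools.combinations(std, 2):
        common = sorted(set(s1) & set(s2))
        if len(common) < min_pairs:  # too few paired days: treat as missing
            continue
        r = pearson([s1[d] for d in common], [s2[d] for d in common])
        pairs.append((math.dist(xy1, xy2), r))
    return pairs, ols([d for d, _ in pairs], [r for _, r in pairs])


# Hypothetical sites 10 km apart on a line; spatial AR(1) gives corr 0.9 ** (site gap).
rng = random.Random(5)
n_days = 1000
fields = [[0.0] * n_days for _ in range(4)]
for t in range(n_days):
    v = rng.gauss(0.0, 1.0)
    fields[0][t] = v
    for k in range(1, 4):
        v = 0.9 * v + math.sqrt(1 - 0.9 ** 2) * rng.gauss(0.0, 1.0)
        fields[k][t] = v
sites = {f"site{k}": ((10.0 * k, 0.0),
                      {t: 40 + 5 * fields[k][t] for t in range(n_days if k < 3 else 300)})
         for k in range(4)}
pairs, (intercept, slope) = correlation_vs_distance(sites)
```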

Parameter estimates
To simulate "true" urban background ozone concentrations for our theoretical study area we set μ = 61.73 and set $\sigma_b$ to the value estimated from the observed data (Table 1), and constructed a correlation matrix ρ(100,100) using the regression equation for Pearson correlation as a function of the distance D between monitors (Figure 1(a)). Each off-diagonal element of ρ was calculated by setting D equal to an estimate of the average distance in km between any two points, one in each of the two 5 km × 5 km grid-squares being compared (estimated by simulation as a function of d, the straight-line distance between the centre points of the two grid-squares). The diagonal elements were calculated by setting D equal to an estimate of the average distance between any two points within a 5 km × 5 km grid-square (using simulation: D ≈ 2.6). The variance/covariance matrix Ω(100,100) was then obtained by multiplying each element of ρ by the average observed within-site variance (i.e. 25.28² in Table 1). This produced a symmetrical matrix with diagonal elements equal to 24.35², the estimated average "true" within-site variance having removed any variation due to instrument error and monitor-site location error (i.e. $\sigma^2_{err}$). For simulating observed monitor data we set $\sigma_{err}$ = 6.77 (see Additional file 1), and for simulating model data within each grid $i$ we set $\operatorname{cov}(X^*_i, Z_i)$ and $\operatorname{var}(Z_i)$ from the observed monitor-model comparisons (Table 1).
In calculating pollution metrics we employ the 75% rule, such that a valid 8-hour mean must be based on at least 6 values, a daily maximum 8-hour mean must be based on at least 18 valid 8-hour means and a daily maximum 1-hour concentration must be based on at least 18 hourly concentrations.
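The average distances used to evaluate the correlation function can be reproduced by Monte Carlo integration. The sketch below (function name and sample size are illustrative) recovers the within-grid value D ≈ 2.6 and compares it with the known closed form for the mean distance between two uniform points in a square, side × (2 + √2 + 5 ln(1 + √2))/15 ≈ 0.5214 × side. It also shows that for neighbouring grids the average point-to-point distance exceeds the 5 km centre-to-centre distance d. Note that the figures quoted above imply a fitted correlation at D ≈ 2.6 of about 24.35²/25.28² ≈ 0.93.

```python
import math
import random


def mean_distance(dx, dy, side=5.0, n=100_000, seed=6):
    """Monte Carlo mean distance between a uniform point in one side x side square and a
    uniform point in a second square whose lower-left corner is offset by (dx, dy) km."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x1, y1 = rng.uniform(0.0, side), rng.uniform(0.0, side)
        x2, y2 = dx + rng.uniform(0.0, side), dy + rng.uniform(0.0, side)
        total += math.hypot(x2 - x1, y2 - y1)
    return total / n


within = mean_distance(0.0, 0.0)    # diagonal elements of rho
adjacent = mean_distance(5.0, 0.0)  # neighbouring grids, centres d = 5 km apart
# Closed form for two uniform points in the same square of side 5 km.
exact_within = 5.0 * (2 + math.sqrt(2) + 5 * math.log(1 + math.sqrt(2))) / 15
print(f"within {within:.3f} (exact {exact_within:.3f}), adjacent {adjacent:.3f}")
```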
Parameter estimates for rural ozone, urban background log_e(NO2) and rural log_e(NO2) were obtained in the same fashion.

Proportional measurement error
For NO2 we have assumed that measurement error is additive on a log scale and that the relationship of interest is with log_e(NO2). If, however, the relationship of interest is with NO2 (untransformed), then measurement error in the explanatory variable is proportional rather than additive. In order to simulate monitor NO2 data with proportional error, we first simulate log_e(NO2) as before but then back-transform (i.e. NO2 = exp(log_e(NO2))) prior to calculating regional averages. For model data, we first simulate log_e(NO2) as in Equation (1.2) but, instead of setting $c_i = c$, we use Equation (1.2a) and set $\sigma_{diff}$ = 0.268 for urban background log_e(NO2) and $\sigma_{diff}$ = 0.210 for rural log_e(NO2) (see Table 1). The data are then back-transformed. With NO2 rather than log_e(NO2) as the explanatory variable in our epidemiological time-series analysis we set α = 0.32 and β = 0.0003992 (i.e. $e^{10\beta} = 1.0040$, indicating a 0.4% increase in mortality per 10 μg/m³ increase in NO2).
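Back-transforming log-scale additive error makes the error multiplicative: X = X*·exp(ε), so the absolute error grows with the concentration while the ratio X/X* does not. A minimal sketch, assuming hypothetical log_e(NO2) values drawn from N(3.5, 0.5²) and a hypothetical log-scale error SD of 0.3; only β = 0.0003992 comes from the text.

```python
import math
import random
import statistics

BETA_NO2 = 0.0003992                 # untransformed-NO2 coefficient (from the text)
rr_per_10 = math.exp(10 * BETA_NO2)  # relative risk per 10 ug/m3 increase


def back_transformed_pair(log_true, sigma_log, rng):
    """Add Normal error on the log scale, then back-transform both series."""
    log_obs = [v + rng.gauss(0.0, sigma_log) for v in log_true]
    return [math.exp(v) for v in log_true], [math.exp(v) for v in log_obs]


rng = random.Random(7)
log_true = [rng.gauss(3.5, 0.5) for _ in range(50_000)]       # hypothetical log_e(NO2)
no2_true, no2_obs = back_transformed_pair(log_true, 0.3, rng)  # hypothetical sigma_log

# Split at the median true concentration and compare error spread in each half.
paired = sorted(zip(no2_true, no2_obs))
low, high = paired[: len(paired) // 2], paired[len(paired) // 2:]
abs_sd = [statistics.stdev([o - t for t, o in half]) for half in (low, high)]
log_sd = [statistics.stdev([math.log(o / t) for t, o in half]) for half in (low, high)]
print(f"RR per 10 ug/m3 {rr_per_10:.4f}; absolute error SD {abs_sd}; log-ratio SD {log_sd}")
```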

Statistical analysis of simulated time series
For each of the 7 time-series scenarios considered in each of Tables 2, 3 and 4, 1000 simulated data sets were produced and each was analysed separately using Poisson regression with grid as a fixed effect. As a result, 1000 separate estimates of both the health effect ($\hat\beta$) and its standard error, SE($\hat\beta$), were obtained. Statistics presented in Tables 2, 3 and 4 include estimate averages and estimates of the coverage probability and power. The estimated coverage probability is the percentage of simulations in which the 95% confidence interval contains the "true" value of β, and the estimated power is the percentage of simulations in which the health effect estimate is statistically significant at the 5% level.
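The summary statistics can be computed from the 1000 (β̂, SE(β̂)) pairs as sketched below. The sketch assumes Wald-type intervals β̂ ± 1.96 SE(β̂) and a two-sided Wald test, which the text does not state explicitly. The inputs are synthetic: the "true" log_e(NO2) coefficient 0.0419 is from the text, but the standard error of 0.01 is hypothetical, and the attenuated case simply scales the mean estimate by 0.46 to mimic a 54% attenuation.

```python
import random
import statistics

Z_975 = 1.959964  # upper 2.5% point of the standard Normal


def summarise_simulations(estimates, std_errors, beta_true):
    """Average estimate and SE, % of Wald 95% CIs covering beta_true, % significant at 5%."""
    n = len(estimates)
    covered = sum(b - Z_975 * se <= beta_true <= b + Z_975 * se
                  for b, se in zip(estimates, std_errors))
    significant = sum(abs(b / se) > Z_975 for b, se in zip(estimates, std_errors))
    return {
        "mean_beta_hat": statistics.fmean(estimates),
        "mean_se": statistics.fmean(std_errors),
        "coverage_pct": 100.0 * covered / n,
        "power_pct": 100.0 * significant / n,
    }


rng = random.Random(8)
BETA = 0.0419  # "true" log_e(NO2) coefficient (from the text)
SE = 0.01      # hypothetical standard error
unbiased = summarise_simulations([rng.gauss(BETA, SE) for _ in range(1000)], [SE] * 1000, BETA)
attenuated = summarise_simulations([rng.gauss(0.46 * BETA, SE) for _ in range(1000)],
                                   [SE] * 1000, BETA)
print(unbiased["coverage_pct"], attenuated["coverage_pct"])
```

Even with correctly sized standard errors, a shift of the estimates towards the null is enough on its own to pull coverage well below 95%.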
Finally, using established theory (see Additional file 2), we obtained predictions of the attenuation in $\hat\beta$ that we might expect from using CTM data or data from a single monitor per region. These predictions were then compared with the corresponding results obtained from our simulations.
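The derivation in Additional file 2 is not reproduced here. As a hedged sketch, the standard first-order result such predictions usually rest on is the regression-calibration argument. If $X^*$ and the surrogate $Z$ are jointly Normal and the Poisson mean is log-linear in $X^*$, then

$$E(y_{i,t} \mid Z_{i,t}) = \exp\left\{\alpha + \beta\,E(X^*_{i,t} \mid Z_{i,t}) + \tfrac{1}{2}\beta^2 \operatorname{var}(X^*_{i,t} \mid Z_{i,t})\right\}, \qquad E(X^*_{i,t} \mid Z_{i,t}) = \text{const} + \lambda_i Z_{i,t},$$

$$\lambda_i = \frac{\operatorname{cov}(X^*_i, Z_i)}{\operatorname{var}(Z_i)}, \qquad \text{predicted attenuation (\%)} \approx 100\,(1 - \lambda_i).$$

Because the conditional variance is constant under Normality, the coefficient on $Z$ is $\beta\lambda_i$. For pure classical error, $\operatorname{cov}(X^*_i, Z_i) = \operatorname{var}(X^*_i)$ and $\lambda_i$ is the reliability ratio; for pure Berkson-like error, $\operatorname{cov}(X^*_i, Z_i) = \operatorname{var}(Z_i)$ and $\lambda_i = 1$. If grid $i$ is assigned the series of a single monitor in grid $k$, then $Z_i = X^*_k + E_k$ and

$$\lambda_{ik} = \frac{\sigma_{i,k}}{\sigma^2_w + \sigma^2_{err}},$$

with the overall prediction being roughly an average of these factors over grids.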

Error decomposition
In order to aid interpretation of our simulation results for the CTM data, we decomposed the grid-specific error variance $\operatorname{var}(Z_i - X^*_i)$ into two components, a classical-like component (CC) and a Berkson-like component (BC). Estimates of CC and BC were then obtained using the observed data (see Additional file 3 for further details and calculations).

The table presents estimated regression coefficients $\hat\beta$, standard errors SE($\hat\beta$), coverage probabilities and power, each based on the analysis of 1000 sets of simulated time-series data. The "true" value of the regression coefficient β for ozone (i.e. β × 10 = 0.00399) equates to a 0.4% increase in mortality per 10 μg/m³ increase in ozone, and the "true" value of the regression coefficient for log_e(NO2) (i.e. β = 0.0419) equates to a 0.4% increase in mortality per 10% increase in NO2.
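The decomposition formula itself is not reproduced here, and the one in Additional file 3 may differ. One split that matches the labels CC and BC is sketched below as an assumption:

$$\operatorname{var}(Z_i - X^*_i) = \underbrace{\left[\operatorname{var}(Z_i) - \operatorname{cov}(X^*_i, Z_i)\right]}_{\text{CC}} + \underbrace{\left[\operatorname{var}(X^*_i) - \operatorname{cov}(X^*_i, Z_i)\right]}_{\text{BC}}$$

Under pure classical error ($Z_i = X^*_i + U_i$), $\operatorname{cov}(X^*_i, Z_i) = \operatorname{var}(X^*_i)$, so BC = 0 and CC = var($U_i$). Under pure Berkson error ($X^*_i = Z_i + U_i$), $\operatorname{cov}(X^*_i, Z_i) = \operatorname{var}(Z_i)$, so CC = 0 and BC = var($U_i$). Both terms can be estimated from observed data, because $\operatorname{cov}(X^*_i, Z_i) = \operatorname{cov}(X_i, Z_i)$ and $\operatorname{var}(X^*_i) = \operatorname{var}(X_i) - \sigma^2_{err}$.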

Results
Comparing "true" values of the regression coefficient, β (e.g. β × 10 = 0.00399 for urban background ozone), with those based on simulated data, $\hat\beta$, Tables 2 and 3 suggest that using regional average monitor data as a surrogate for grid-specific "true" ambient concentrations has limited impact on health effect estimates unless the number of monitors per 25 km × 25 km region falls below 3 (or possibly 5 in the case of rural log_e(NO2)). For all four pollutant metrics, the monitoring scenario that produced the largest bias in the health effect was that of a single monitor per 25 km × 25 km region. The regression coefficient was attenuated by an estimated 6% for urban ozone, 13% for rural ozone, 29% for urban log_e(NO2) and 38% for rural log_e(NO2). By contrast, when we used grid-specific model data, the regression coefficient was attenuated by an estimated 19% for urban ozone, 22% for rural ozone, 54% for urban log_e(NO2) and 44% for rural log_e(NO2). Thus, although for rural log_e(NO2) results were similar to those of the 1 monitor per region scenario, for urban and rural ozone, for urban log_e(NO2) and for less sparse monitoring networks the use of model rather than monitor data appeared to produce more marked bias in the health effect estimate. Comparison of the "true" values of the regression coefficient with those based on simulated "true" data (Tables 2 and 3) suggests that our findings are not simply due to an inadequate number of simulations. Of particular note are the small coverage probabilities for log_e(NO2), which are most pronounced when using the grid-specific model data but are also evident when using measured rural data from a single monitor within each 25 km × 25 km region.
These suggest that not only is there marked attenuation in the health effect estimate but that bias extends to the standard errors, such that few simulations produced a 95% confidence interval containing the "true" value of β (only 15% for urban background modelled log_e(NO2) and 11% for rural modelled log_e(NO2); Tables 2 and 3). As expected, statistical power for log_e(NO2) is consistently higher than for ozone because the magnitude of the "true" effect to be detected is larger (i.e. a 0.4% increase in mortality per 10% increase in NO2 versus a 0.4% increase in mortality per 10 μg/m³ increase in ozone). Nevertheless, the use of grid-specific model data for urban and rural ozone, and the use of either model or 1 monitor per region data for urban log_e(NO2), appears to have a slightly adverse effect on power. Table 4 presents results for NO2 assuming proportional measurement error (i.e. additive on a log scale) but where the relationship of interest is with the untransformed variable. Overall, compared with log_e(NO2), power loss due to measurement error was similar, but coverage probabilities, particularly for model data, improved. Model data and the single monitor scenario registered the largest attenuation in the regression coefficient, but there was noticeable attenuation even when using regional averages based on 5 monitors per 25 km × 25 km region.

The table presents estimated regression coefficients $\hat\beta$, standard errors SE($\hat\beta$), coverage probabilities and power, each based on the analysis of 1000 sets of simulated time-series data. The "true" value of the regression coefficient β for NO2 (i.e. β × 10 = 0.00399) equates to a 0.4% increase in mortality per 10 μg/m³ increase in NO2.

Predictions from theory
For model data and for the 1 monitor scenario, established theory (see Additional file 2) allows us to predict the effects of additive measurement error on the health effect estimate. Table 5 illustrates that estimates of attenuation in $\hat\beta$ obtained by simulation are broadly similar to those obtained using standard theory in this simple case.

Discussion
In the context of a time-series analysis of the association between daily concentration of air pollution and mortality, our study used simulation to contrast the effects on the estimation of that association of using grid-specific pollution data derived from a 3-D chemistry-transport model as opposed to regional average air pollution concentrations derived from monitors. Pollution concentrations were simulated both with (i.e. monitor data) and without (i.e. "true" data) classical "instrument and monitor-location" error. The "true" data were then used in the statistical simulation of model data with the inclusion of both classical and Berkson-like error. The parameter estimates driving our simulations were based on monitor and CTM daily maximum 8-hour mean ozone data for 35 urban background and 21 rural monitoring sites across the UK, and on monitor and CTM log_e(daily maximum 1-hour NO2) data for 43 urban background and 14 rural monitoring sites across the UK. Within-site correlations between observed monitor and CTM data were relatively strong, with average correlation coefficients of 0.73 for rural ozone, 0.76 for urban ozone, 0.67 for rural log_e(NO2) and 0.61 for urban log_e(NO2). The lower correlations for log_e(NO2) were probably a consequence of the shorter averaging time of the NO2 metric (i.e. 1-hour rather than 8-hour for ozone).
For both pollutants (i.e. ozone and log_e(NO2)), the use of a single monitor to provide estimated pollution concentrations for every 5 km × 5 km grid within a 25 km × 25 km region produced attenuated health effect estimates. This attenuation was less marked for the more spatially homogeneous, long-lived pollutant ozone, for which the short-distance correlations in Figure 1 were strong, than for the short-lived pollutant log_e(NO2), for which the short-distance correlations were considerably weaker. However, for other scenarios, particularly those based on 5 or 10 monitors, the use of regional averages with additive rather than proportional error had little effect on health effect estimates. This concurs with the simulation findings of Sheppard et al. [12], who reported a "small but noticeable" attenuation in the health effect estimate when ambient area exposure to PM2.5 was based on a single pollution monitor, but little if any attenuation when area exposure was based on the average across 3 or 10 monitors.
Goldman et al. [16] recognized that a large proportion of the measurement error introduced by the use of average monitor concentrations is due to spatial variation, and suggested that such error is predominantly Berkson, which, while reducing statistical power, will not on its own lead to bias in health effect estimates. However, as classical error is introduced (as happens in our simulations when we add instrument error and monitor-site location error and reduce the number of monitors on which averages are based), attenuation in the health effect estimate is observed. This is more pronounced for log_e(NO2), particularly rural log_e(NO2), than for ozone. This suggests, in line with the findings of others, that attenuation of the relative risk depends not only on instrument error but also on the number and placement of monitors [6,16,23] and on the level of spatial variation [6,23,24]. As suggested by Goldman et al. [16], it may be the combination of these sources that determines the ultimate effect on relative risk estimates.
The combined effects of different error sources may also help to explain why, contrary to expectation, we found no evidence in Tables 2 and 3 (i.e. additive measurement error) of any reduction in statistical power from the use of regional average monitor data based on 2, 3, 5 or 10 monitors per region, with any loss of power most noticeable for the 1 monitor scenario, in particular for urban log_e(NO2).

For model data we base our predictions on the average observed within-site covariance rather than the average observed within-site correlation.
The use of simulated model data produced attenuation in the health effect estimate which, for rural log_e(NO2), was similar to that associated with the scenario of a single regional monitor. However, for urban and rural ozone and particularly for urban log_e(NO2), regression coefficients were more biased towards the null than in the single monitor case. According to Sheppard et al. [25], classical error can result not only in an attenuated health effect estimate but also in a downward bias in the estimation of standard errors, and thus in inaccurate coverage of 95% confidence intervals. The appreciable bias in health effect estimates and coverage intervals based on simulated model data for log_e(NO2) therefore points to the presence of predominantly classical rather than Berkson-like error in EMEP-WRF CTM estimates of this pollution metric. To investigate this further, we used our comparison dataset to attempt to decompose random measurement error into its classical-like and Berkson-like components (Additional file 3). Our results suggested that classical error does indeed predominate overwhelmingly in the log_e(NO2) CTM data.
The use of NO2 rather than log_e(NO2) (i.e. proportional rather than additive measurement error) appeared to lead to a marked improvement in the previously poor coverage probabilities of the model data, but to further attenuation in health effect estimates based on regional averages. However, these regional averages still tended to outperform model data, with the possible exception of the 1 monitor per 25 km × 25 km region scenario for rural NO2, where monitor and model findings were comparable. Whereas the biasing effect of additive measurement error on grid means is effectively adjusted for by including grid as a fixed effect in our time-series analyses, this is not the case when measurement error is proportional. For model data with proportional error it is therefore important to note that our findings may depend to some extent on grid-specific mean pollution levels and on the validity of the assumptions we make in simulating them (see Equation 1.2a).
One of the strengths of our simulation approach is that it allows the correlation between time-series in different grids to vary according to the distance between those grids. However, in so doing we make the assumption that spatial dependence is characterised by a single linear function of distance. In our regression analysis of the association between correlation and distance (Figure 1), the addition of a quadratic term was statistically significant for urban and rural ozone and for urban log_e(NO2), although for all three metrics the incorporation of this non-linearity had a relatively small impact on the percentage of variance explained (an additional 0.2, 1.6 and 1.6 percentage points respectively). We also assume that spatial dependence is independent of direction (i.e. isotropic) and of geography (other than a distinction between urban and rural), and that it does not vary over time. This may not be the case if the study area contains point sources, whose outflow may vary in direction as weather conditions change over time. Nevertheless, this assumption has been employed by other authors in this field [5,23], possibly because data sufficient to incorporate such features into simulation studies are not readily available or generalizable.
Our simulations allow mean pollution concentrations to vary between grids, but we assume that they vary at random. We do not take account of the fact that mean pollution concentrations in neighbouring grids may be more similar than those in grids further apart. This could have implications for our results involving proportional measurement error. However, for each pair of monitoring stations in our observed monitor data set, we plotted the absolute difference in site means against distance and found no evidence of a linear relationship for either loge(NO2) or ozone, in either urban or rural areas. Though in some ways reassuring, these findings may nevertheless be insensitive to differences in grid-mean pollution concentrations within urban areas. There, for example, background levels of NO2 tend to increase towards the urban core [26], whilst background levels of O3 tend to decrease.
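The pairwise check described above can be sketched with hypothetical site coordinates and site means. The sketch regresses the absolute difference in means on distance for every pair of sites. Because pairs share sites, it assesses the slope with a Mantel-style permutation test: site means are shuffled among sites to build a null distribution. This test is a suggestion for illustration, not necessarily the procedure used here. Permuting the means leaves both the distances and the set of pairwise absolute differences unchanged and only reassigns them to pairs, so the slope is a valid Mantel statistic. The site means in this sketch are drawn at random, so no relationship is expected.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical monitoring sites: coordinates (km) and long-term site means.
n_sites = 40
xy = rng.uniform(0.0, 200.0, size=(n_sites, 2))
site_means = rng.normal(30.0, 8.0, n_sites)

# Index every unordered pair of sites once and compute its separation.
i_idx, j_idx = np.triu_indices(n_sites, k=1)
dist = np.linalg.norm(xy[i_idx] - xy[j_idx], axis=1)

def slope_vs_distance(means):
    """Least-squares slope of |difference in site means| on distance."""
    abs_diff = np.abs(means[i_idx] - means[j_idx])
    return np.polyfit(dist, abs_diff, 1)[0]

observed = slope_vs_distance(site_means)

# Null distribution: reassign the means to sites at random.
n_perm = 999
null = np.array([slope_vs_distance(rng.permutation(site_means))
                 for _ in range(n_perm)])
p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
print(f"{dist.size} pairs, slope = {observed:.5f} per km, p = {p_value:.3f}")
```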
A further limitation is that we use the same variance to generate each within-grid time-series, and that the time-series, both modelled and monitored, are simulated without seasonal pattern or trend. Hence we do not consider the influence of time-dependent confounders, other confounders, or other pollutants. However, the effects of measurement error in multi-pollutant models [4,27] and in the presence of confounders have been considered by others [25,28].
Although quantitatively the simulation parameters we used (and hence our results) only apply to the EMEP-WRF model v3.7 for the British Isles, the simulation approach is generalizable and may be used in the evaluation of other chemistry-transport models in other areas.
Eulerian CTMs such as the EMEP model discretize the real world using a fixed horizontal and vertical grid, with no explicit information on the within-grid variability of emissions. Line sources such as roads, as well as point sources, are averaged to the CTM horizontal resolution. This approximation may limit the model's ability to resolve near-source chemistry and transport, which are likely to be relevant at urban monitoring sites. Moreover, the EMEP model was not designed to replicate the complex urban environment. Local dispersion models that can represent the fine-scale complexity of an urban environment are currently available (e.g. ADMS and ERG models). However, they are computationally very expensive and limited to specific areas, and they rely on CTMs for boundary conditions in order to capture the regional import and export of pollutants.
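The arithmetic of this averaging can be sketched with a hypothetical concentration field. The sketch takes a 10 km × 10 km area at 100 m resolution containing a single near-source hotspot and block-averages it onto 5 km × 5 km cells. The resolutions and concentrations are assumptions chosen for illustration. A real CTM averages emissions and then computes chemistry and transport on the coarse grid, so this shows only the dilution effect, not the model's behaviour.

```python
import numpy as np

# Hypothetical 10 km x 10 km area resolved at 100 m (100 x 100 sub-cells),
# to be represented on a grid of 5 km x 5 km cells (a 2 x 2 grid).
fine = np.full((100, 100), 20.0)   # uniform background concentration
fine[30, 30] += 500.0              # a single near-source hotspot

# Block-average each 50 x 50 group of sub-cells into one coarse cell.
coarse = fine.reshape(2, 50, 2, 50).mean(axis=(1, 3))
print(coarse)
# The hotspot sub-cell is 520, but its coarse cell holds 20 + 500/2500 = 20.2.
```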
The benefits of complete temporal and UK-wide coverage, and of the self-consistency of the predicted chemical parameters, should not be underestimated. These benefits may outweigh the model's shortcomings in representing surface urban chemistry.
Our present findings suggest that there may be an appreciable penalty to using CTM data in spatially resolved epidemiological time-series studies. For some pollutants, this penalty partly offsets the substantial benefits of such modelled data. These benefits include the opportunity to investigate pollutants with sparse or no monitor coverage (e.g. different particle measures), pollutants from specific sources with direct relevance to policy formulation and evaluation, and the potential consequences of alternative future scenarios. For the simulations incorporating additive measurement error (Tables 2 and 3) and the input data used in this work, monitor data outperformed model data in urban areas and in areas with at least 2 monitors per 25 km × 25 km grid-square. For loge(NO2) in rural areas with only 1 monitor per 25 km × 25 km grid-square, however, monitor and model data performed similarly, at least in terms of power and attenuation in the regression coefficient. It is important to be clear that the impact of 'measurement' error as assessed in this paper is only one aspect of data performance relevant to the use of modelled versus monitored data in epidemiological studies. Monitored data themselves also have limitations that are often ignored: they are typically sparse, come from preferential (similar-type) locations, carry measurement errors and often have missing values. High-resolution CTMs are continually being developed. Our study suggests that further assessment of the impact of model error, including by statistical simulation, would be useful, as would an improved understanding of the performance of monitored data.
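One way to obtain the two summaries mentioned above, attenuation and power, from simulation replicates is sketched below. In each replicate the sketch generates daily deaths from a Poisson model driven by a true exposure, then fits a Poisson log-linear regression (by Newton-Raphson) to an exposure measured with additive classical error. This is a simplified sketch under stated assumptions. It uses a single series with no grid fixed effects, seasonality or confounders, and all parameter values are hypothetical. It defines attenuation as 100(1 - mean β̂/β) and power as the proportion of replicates with |β̂/SE| > 1.96. These definitions are assumptions and may differ in detail from those behind Tables 2 and 3.

```python
import numpy as np

rng = np.random.default_rng(2006)

def fit_poisson(y, x, n_iter=25):
    """Poisson log-linear regression of y on x by Newton-Raphson.

    Returns the slope estimate and its model-based standard error.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array([np.log(y.mean()), 0.0])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        info = X.T @ (mu[:, None] * X)          # Fisher information X'WX
        beta = beta + np.linalg.solve(info, X.T @ (y - mu))
    mu = np.exp(X @ beta)
    cov = np.linalg.inv(X.T @ (mu[:, None] * X))
    return beta[1], np.sqrt(cov[1, 1])

n_days, n_reps = 3 * 365, 200
beta_true = 0.02           # hypothetical log rate ratio per unit exposure
baseline = np.log(10.0)    # hypothetical baseline of 10 deaths per day
error_sd = 0.5             # hypothetical SD of additive classical error

estimates, ses = [], []
for _ in range(n_reps):
    x = rng.normal(0.0, 1.0, n_days)              # true exposure
    z = x + rng.normal(0.0, error_sd, n_days)     # error-prone exposure
    y = rng.poisson(np.exp(baseline + beta_true * x))
    b, se = fit_poisson(y, z)
    estimates.append(b)
    ses.append(se)

estimates, ses = np.array(estimates), np.array(ses)
attenuation_pct = 100.0 * (1.0 - estimates.mean() / beta_true)
power = np.mean(np.abs(estimates / ses) > 1.96)
print(f"attenuation = {attenuation_pct:.1f}%, power = {power:.2f}")
```

With an exposure variance of 1 and error variance of 0.25, classical-error theory predicts attenuation of roughly 100(1 - 1/1.25) = 20% for small effects, which the simulated value should approximate.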