
Measurement error in time-series analysis: a simulation study comparing modelled and monitored data



Assessing health effects from background exposure to air pollution is often hampered by the sparseness of pollution monitoring networks. However, regional atmospheric chemistry-transport models (CTMs) can provide pollution data with national coverage at fine geographical and temporal resolution. We used statistical simulation to compare the impact on epidemiological time-series analysis of additive measurement error in sparse monitor data as opposed to geographically and temporally complete model data.


Statistical simulations were based on a theoretical area of 4 regions each consisting of twenty-five 5 km × 5 km grid-squares. In the context of a 3-year Poisson regression time-series analysis of the association between mortality and a single pollutant, we compared the error impact of using daily grid-specific model data as opposed to daily regional average monitor data. We investigated how this comparison was affected if we changed the number of grids per region containing a monitor. To inform simulations, estimates (e.g. of pollutant means) were obtained from observed monitor data for 2003–2006 for national network sites across the UK and corresponding model data that were generated by the EMEP-WRF CTM. Average within-site correlations between observed monitor and model data were 0.73 and 0.76 for rural and urban daily maximum 8-hour ozone respectively, and 0.67 and 0.61 for rural and urban loge(daily 1-hour maximum NO2).


When regional averages were based on 5 or 10 monitors per region, health effect estimates exhibited little bias. However, with only 1 monitor per region, the regression coefficient in our time-series analysis was attenuated by an estimated 6% for urban background ozone, 13% for rural ozone, 29% for urban background loge(NO2) and 38% for rural loge(NO2). For grid-specific model data the corresponding figures were 19%, 22%, 54% and 44% respectively, i.e. similar for rural loge(NO2) but more marked for urban loge(NO2).


Even if correlations between model and monitor data appear reasonably strong, additive classical measurement error in model data may lead to appreciable bias in health effect estimates. As process-based air pollution models become more widely used in epidemiological time-series analysis, assessments of error impact that include statistical simulation may be useful.



Bias in estimation due to measurement error has received much attention in medical research including epidemiology. In its simplest form, i.e. pure additive classical measurement error, the relationship between the observed variable or surrogate measure $Z$ and the “true” variable $X^*$ can be expressed as:

$$Z = X^* + \gamma$$
$$\gamma \sim N(0, \sigma_d^2), \quad \mathrm{cov}(X^*, \gamma) = 0, \quad E(Z) = E(X^*)$$

It is well documented that replacing $X^*$ by $Z$ as the explanatory variable in a simple linear regression analysis leads to attenuation in the estimation of both the Pearson correlation coefficient and the gradient of the regression line, with the extent of the attenuation depending on the reliability ratio $\rho_{ZX^*} = \mathrm{var}(X^*)/\mathrm{var}(Z)$ [1]. Similarly, in simple Poisson regression pure additive classical error in the explanatory variable leads to attenuation in the estimation of the relative risk [2].
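The attenuation described above can be illustrated with a minimal sketch. The sample size, the "true" gradient of 2.0 and the two variances are hypothetical values chosen for illustration only; they are not taken from this study.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, illustrative only
n = 200_000
beta = 2.0                                # hypothetical "true" gradient
x_true = rng.normal(0.0, 1.0, n)          # X*, variance 1
gamma = rng.normal(0.0, 0.5, n)           # classical error, variance 0.25
z = x_true + gamma                        # surrogate Z = X* + gamma
y = 1.0 + beta * x_true + rng.normal(0.0, 1.0, n)


def ols_slope(x, y):
    """Least-squares gradient of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


reliability = 1.0 / (1.0 + 0.25)          # var(X*) / var(Z) = 0.8
slope_true = ols_slope(x_true, y)         # close to beta
slope_obs = ols_slope(z, y)               # close to beta * reliability
print(slope_true, slope_obs, beta * reliability)
```

Under these assumptions the gradient estimated from Z should settle near the "true" gradient multiplied by the reliability ratio.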

However, not all measurement error is classical. Reeves et al. [3] considered the impact of measurement error in a situation where individual radon exposure was measured with additive classical error but where subjects with missing radon data were assigned an area average. If the variability of “true” individual radon exposure is the same within each area and the area averages are exact (i.e. measured without error) their use as surrogate measures introduces pure additive Berkson error. This type of measurement error has no biasing effect on the regression coefficient in simple linear regression [4] and little if any such effect on the regression coefficient in simple Poisson regression [2, 5]. However if the averages are not exact they introduce a combination of Berkson error and classical error and the presence of additive classical error biases the gradient estimate or relative risk estimate towards the null.
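By way of contrast with the classical case, the following sketch simulates pure additive Berkson error in simple linear regression, where the "true" value scatters around an exactly known assigned value. All numerical values are hypothetical and serve only to illustrate the absence of bias in the gradient.

```python
import numpy as np

rng = np.random.default_rng(1)            # fixed seed, illustrative only
n = 200_000
beta = 2.0                                # hypothetical "true" gradient

# Surrogate first (e.g. an exact area average), truth scattered around it.
z_assigned = rng.normal(0.0, 1.0, n)
x_true = z_assigned + rng.normal(0.0, 0.5, n)   # Berkson error
y = beta * x_true + rng.normal(0.0, 1.0, n)

# Gradient of y on the surrogate: expected to stay close to beta,
# although the residual scatter (and hence the standard error) is larger.
slope_berkson = np.cov(z_assigned, y)[0, 1] / np.var(z_assigned, ddof=1)
print(slope_berkson)
```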

The consequences of using an area average as a surrogate explanatory variable have been investigated in simulations by Lee et al. [6]. Based on a time-series analysis of daily mortality counts and average daily air pollution (average of readings from available monitors in the study region), they found that increasing the probability of siting monitors in high pollution areas led to attenuation in health effect estimates and poor coverage intervals. They also found that within a separate scenario of high classical instrument error (assumed to be additive on a log scale) and low spatial variation, reducing the total number of monitors in the study region from 30 to 5 enhanced any attenuation in the health effect estimate.

As indicated above, in some circumstances measurement error is proportional (i.e. additive on a log scale) and the relationship of interest is with the untransformed explanatory variable. In the context of using over-dispersed Poisson regression to investigate the effects of air pollution on daily emergency department visits, a recent simulation study by Goldman et al. [5] concluded that while pure proportional classical error in the daily air pollution data led to an attenuated estimate of relative risk, pure proportional Berkson error in the pollution data actually led to an inflated estimate of relative risk, i.e. bias away from the null. This is in line with findings for logistic regression from the simulation study of Steenland et al. [7] which suggested that if the Berkson error variance increases as values of the surrogate measure increase, bias in the regression coefficient away from the null may result.

For statistical models containing more than one explanatory variable, the effect of measurement error depends not only on the error type (i.e. Berkson, classical, proportional, additive) but also on the correlation between the explanatory variables, which explanatory variables are causal and which are measured with error. In Poisson regression with two explanatory variables, one causal and measured with pure additive classical error and the other non-causal and measured without error, Fung et al. [8] demonstrated through simulation that the estimated relative risk of the causal variable will be attenuated and that if the correlation between the two explanatory variables is high (i.e. multicollinearity) the predictive effect of the causal variable may be transferred to the non-causal variable.

In air pollution epidemiology short-term associations between outdoor air pollution and health are assessed using an ecological time-series design. Many such studies have been published [9] and inform public health policy [10]. These studies correlate daily counts of health events in a specific location (usually a city) with daily pollution concentrations derived from static monitoring sites. Regional air pollution chemistry-transport models (CTMs) that are capable of simulating hourly and daily concentrations of a wide range of pollutants at fine-scale resolutions (i.e. ≤ 10 km) have recently been developed. These provide new opportunities to investigate pollution metrics (e.g. individual particulate matter components or source-resolved pollutant metrics) which either cannot be currently measured or can be measured only at a limited number of locations due to their measurement complexity and/or a sparse monitoring network. In this paper we compare, using statistical simulation methods, the performance of geographically resolved model data at 5 km × 5 km resolution with area-wide average concentrations derived from a number of air pollution monitors.

We compare performance in terms of additive and proportional measurement error and its effect on a time-series analysis of the relationship between daily ambient background pollution levels and daily counts of a health event (using all-cause mortality as an example) at the small area level, i.e. the 5 km × 5 km grid. The analysis is conducted using Poisson regression. Measures from background air pollution monitors may be available for some 5 km × 5 km grids but not for others. Our primary aim is therefore to demonstrate how simulation techniques can be employed to investigate when and if it might be better to use data from a CTM rather than, as often happens in practice, aggregating over grids and using average pollution values based on monitor data. Our simulations are based on a theoretical study area divided into 4 regions each consisting of twenty-five 5 km × 5 km grids, and within this construct we consider the effect of varying the number of grids per region containing a monitor. The parameter estimates used in our simulations are taken from observed monitor versus model comparisons.

For the purposes of this investigation we assume that it is the association of ambient pollution with mortality at the small-area level that is important (because of the link to regulation, [11]) rather than exposure at the level of the individual and leave consideration of disparities between background monitoring networks and personal exposure to others [4, 11, 12]. There is also a literature on impacts of measurement error in air pollution for study designs other than time-series [13, 14].


Simulating a “true” time-series

Simulations were performed using DRAWNORM in STATA 10 [15] and relate to a theoretical square study area measuring 50 km North by 50 km East which can be divided into 4 regions each consisting of twenty-five 5 km × 5 km grid-squares. We assume that:

  • For a given pollutant its “true” background concentration (i.e. devoid of bias or measurement error) in grid-square i on day t of a 3-year time-series is:

    $$x^*_{i,t} \quad (i = 1, \ldots, 100;\; t = 1, \ldots, 1095)$$
  • For each grid i, the “true” 3-year time-series, represented by the vector $X_i^*$, exhibits no trend or seasonal variation.

    $$X_i^* = (x^*_{i,1}, \ldots, x^*_{i,1095})^T$$
  • Grid-specific means $\mu_i$ (i = 1, …, 100) are Normally distributed around an overall mean $\mu$ with variance $\sigma_b^2$.

    $$\mu_i = \mu + e_i, \quad e_i \sim N(0, \sigma_b^2)$$
  • Each row of the 1095 × 100 time-series matrix, $X^* = (X_1^*, \ldots, X_{100}^*)$, consists of a sample drawn from a Multivariate Normal distribution, MVN(U, Ω), with grid-specific means, $\mu_i$ (i = 1,…,100), common within-grid variance $\sigma_w^2$, between-grid covariances $\sigma_{i,k}$ (i = 1,…,100; k = 1,…,100) and between-grid correlation coefficients $\rho_{i,k}$ (i = 1,…,100; k = 1,…,100), such that:

    $$U = (\mu_1, \ldots, \mu_{100})^T, \quad \Omega = \begin{pmatrix} \sigma_w^2 & \cdots & \sigma_{100,1} \\ \vdots & \ddots & \vdots \\ \sigma_{1,100} & \cdots & \sigma_w^2 \end{pmatrix}$$
  • For each grid i the number of deaths on day t, $y_{i,t}$, is sampled from a Poisson distribution with mean dependent on the “true” background concentration of the pollutant in that grid on that day, according to the following formula:

    $$E(Y_{i,t}) = \varphi_{i,t} = \alpha \times \exp(\beta \times x^*_{i,t})$$
    $$Y_{i,t} \sim \mathrm{Poisson}(\varphi_{i,t})$$
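The assumptions listed above can be sketched in Python as follows. This is not the STATA DRAWNORM code used in the study. The numerical values are those quoted in this paper for urban background ozone, but the single between-grid correlation `rho` is a placeholder assumption standing in for the distance-based correlation matrix described under Parameter estimates.

```python
import numpy as np

rng = np.random.default_rng(2)                # fixed seed, illustrative only
n_grids, n_days = 100, 1095
mu, sigma_b, sigma_w = 61.73, 7.38, 24.35     # urban background ozone values
alpha, beta = 0.32, 0.0003992

# Grid-specific means: mu_i = mu + e_i, e_i ~ N(0, sigma_b^2)
mu_i = mu + rng.normal(0.0, sigma_b, n_grids)

# Placeholder covariance matrix: common variance sigma_w^2 and one
# between-grid correlation (an assumption made only for this sketch).
rho = 0.9
omega = sigma_w**2 * (rho * np.ones((n_grids, n_grids))
                      + (1.0 - rho) * np.eye(n_grids))

# Each row is one day's draw from MVN(U, Omega): shape (1095, 100)
x_true = rng.multivariate_normal(mu_i, omega, size=n_days)

# Daily deaths: Y_it ~ Poisson(alpha * exp(beta * x*_it))
phi = alpha * np.exp(beta * x_true)
deaths = rng.poisson(phi)
print(x_true.shape, deaths.mean())
```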

We consider two pollutant metrics, daily maximum 8-hour ozone and loge(daily 1-hour maximum NO2). NO2 concentrations are log transformed to take account of a positively skewed distribution.

For ozone we set α = 0.32 (i.e. mean daily deaths with 0 μg/m3 ozone = 0.32) and β = 0.0003992 (i.e. $e^{\beta \times 10} = 1.0040$). While values assigned to α and β are somewhat arbitrary, a 0.4% increase in mortality per 10 μg/m3 increase in ozone (i.e. β = 0.0003992) is the size of effect that might be observed in a real epidemiological study [9]. For loge(NO2), to aid the comparison of findings in tables, we set α = 0.32 (i.e. mean daily deaths with 1 μg/m3 NO2 = 0.32) and β = 0.0418845 (i.e. $1.10^{\beta} = 1.0040$, indicating a 0.4% increase in mortality per 10% increase in NO2).
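The two values of β follow directly from the stated 0.4% increase in mortality. A short check of the arithmetic, using only the figures quoted above (the variable names are ours):

```python
import math

relative_risk = 1.0040                     # 0.4% increase in mortality

# Ozone: increase per 10 ug/m3, so exp(10 * beta) = 1.0040
beta_ozone = math.log(relative_risk) / 10

# loge(NO2): increase per 10% rise in NO2, so 1.10 ** beta = 1.0040
beta_log_no2 = math.log(relative_risk) / math.log(1.10)

print(round(beta_ozone, 7), round(beta_log_no2, 7))
```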

Simulating observed monitor data

Pollution concentrations obtained from monitors will include measurement error due to instrument imprecision and monitor location. Given the small size of grids (i.e. 5 km × 5 km) and that instrument error for an unbiased monitor is generally considered to be classical [16], for each grid i we simulate a 3-year time-series of monitor data, X i , by adding classical measurement error to our “true” time-series X i * as follows:

$$X_i = X_i^* + E_i$$

where for each element $\epsilon_{i,t}$ of the error vector $E_i$

$$\epsilon_{i,t} \sim N(0, \sigma_{err}^2)$$

such that,

$$E(X_i) = E(X_i^*) = \mu_i$$
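A minimal sketch of this step is given below. The "true" series here is an independent Normal stand-in (an assumption made to keep the example short) rather than the spatially correlated series described earlier; σ_err = 6.77 and the other values are those quoted in this paper for urban background ozone.

```python
import numpy as np

rng = np.random.default_rng(3)             # fixed seed, illustrative only
n_days, n_grids = 1095, 100
sigma_err = 6.77                           # instrument and location error SD

# Stand-in for the "true" series X* (independent draws, for brevity only)
x_true = rng.normal(61.73, 24.35, size=(n_days, n_grids))

# Classical error: independent of X*, mean zero, common variance
errors = rng.normal(0.0, sigma_err, size=x_true.shape)
x_monitor = x_true + errors

# Means are preserved while the variance is inflated by sigma_err^2
print(x_monitor.mean() - x_true.mean(), x_monitor.var(), x_true.var())
```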

Simulating model data

For each grid i we simulate a 3-year time-series of model data, $Z_i$, from $X_i^*$. However, in contrast to the above we allow for a grid-specific bias (i.e. $E(X_i^*) = \mu_i$, $E(Z_i) = \mu_i + c_i$, where $\mu_i$ and $c_i$ are grid-specific constants) and for the presence of Berkson-like error as well as classical-like error (i.e. we allow for the possibility that $\mathrm{cov}(X_i^*, Z_i) \neq \mathrm{var}(X_i^*)$). We do this by using the approach of Reeves et al. [3]. This approach exploits the fact that if we express $Z_i$ as a linear function of $X_i^*$ then, using standard theory as outlined in Cox and Hinkley [17]:

$$Z_i = c_i + \mu_i + \frac{\mathrm{cov}(X_i^*, Z_i)}{\mathrm{var}(X_i^*)}\,(X_i^* - \mu_i) + \Delta_i \qquad (1.2)$$

where

$$\Delta_i = (\delta_{i,1}, \ldots, \delta_{i,1095})^T$$
$$\delta_{i,t} \sim N(0, \sigma^2_{i,z.x^*})$$

$\mathrm{cov}(\Delta_i, X_i^*) = 0$ and $\sigma^2_{i,z.x^*} = \mathrm{var}(Z_i) - \left(\frac{\mathrm{cov}(X_i^*, Z_i)}{\mathrm{var}(X_i^*)}\right)^2 \mathrm{var}(X_i^*)$.

If there is no Berkson-like error (i.e. $\mathrm{cov}(X_i^*, Z_i) = \mathrm{var}(X_i^*)$) then, with the exception of the grid-specific bias term ($c_i$), formula 1.2 reduces to a classical error model.
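Formula 1.2 can be sketched for a single grid as follows, using the urban background ozone values quoted in this paper. The series is made much longer than 1095 days, and the "true" series is drawn as independent Normal values, purely so that the sample moments can be compared with their targets; both choices are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)             # fixed seed, illustrative only
n_days = 200_000                           # long series, for checking moments
mu_i, c_i = 61.73, 10.25                   # grid mean and model bias
var_x, var_z, cov_xz = 24.35**2, 23.41**2, 455.78

slope = cov_xz / var_x                     # cov(X*, Z) / var(X*)
resid_var = var_z - slope**2 * var_x       # variance of the delta terms

x_true = rng.normal(mu_i, np.sqrt(var_x), n_days)
delta = rng.normal(0.0, np.sqrt(resid_var), n_days)   # independent of X*

# Formula 1.2
z_model = c_i + mu_i + slope * (x_true - mu_i) + delta

print(z_model.mean(), z_model.var(ddof=1), np.cov(x_true, z_model)[0, 1])
```

The simulated model series should reproduce the target mean μ_i + c_i, the target variance var(Z_i) and the target covariance with the "true" series.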

In populating 1.2, we assume that model data are uncorrelated with instrument and location error (i.e. $\mathrm{cov}(E_i, Z_i) = 0$). From this it follows that $\mathrm{cov}(X_i, Z_i) = \mathrm{cov}(X_i^* + E_i, Z_i) = \mathrm{cov}(X_i^*, Z_i) + \mathrm{cov}(E_i, Z_i) = \mathrm{cov}(X_i^*, Z_i)$. In addition, provided our focus is on the effects of additive measurement error (this is not the case for proportional measurement error) and our time-series analysis adjusts for grid, we can simplify calculations by setting the grid-specific constant terms $c_i = c$ for all i = 1, …, 100.

For the purposes of our simulations involving proportional error we ignore any dependence between $E(Z_i) - E(X_i^*)$ and $E(X_i^*)$ and assume that:

$$c_i = c + \epsilon_c, \quad \epsilon_c \sim N(0, \sigma_{diff}^2) \qquad (1.2a)$$

Simulating regional averages

We simulate the use of regional averages in situations where pollution monitor coverage is less than 1 monitor per 5 km × 5 km grid by first sampling a sub-set of l grid-squares ($R_{jl}$, j = 1, …, 4) from each of the 4 regional sets of 25 grid-squares ($R_j$, j = 1, …, 4) such that $R_{jl} \subseteq R_j$. Next we replace each 3-year time-series $X_i$ ($i \in R_j$) with a 3-year time-series of averages $W_j$ based on the formula:

$$W_j = \frac{1}{l} \sum_{i \in R_{jl}} X_i \quad (j = 1, \ldots, 4)$$

Simulated regional average time-series are produced in this way for l = 5, l = 10, l = 15, l = 20, l = 25.

We also consider the single monitor scenario i.e. l = 1.
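The construction of regional averages might be sketched as below. The function name `regional_averages`, the ordering of columns so that each region occupies 25 consecutive columns, and the independent Normal stand-in for the monitor series are all assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)             # fixed seed, illustrative only
n_days, n_regions, per_region = 1095, 4, 25

# Stand-in monitor series X_i (independent draws, for brevity only)
x_monitor = rng.normal(61.73, 25.28, size=(n_days, n_regions * per_region))


def regional_averages(x, l, rng):
    """Replace every grid series by the daily mean over l monitored grids
    sampled without replacement from the same region."""
    w = np.empty_like(x)
    for j in range(n_regions):
        region = np.arange(j * per_region, (j + 1) * per_region)   # R_j
        monitored = rng.choice(region, size=l, replace=False)      # R_jl
        w[:, region] = x[:, monitored].mean(axis=1, keepdims=True)
    return w


w1 = regional_averages(x_monitor, 1, rng)    # single monitor per region
w5 = regional_averages(x_monitor, 5, rng)
w25 = regional_averages(x_monitor, 25, rng)  # every grid monitored
print(w1.shape, w5.shape, w25.shape)
```

Every grid in a region then carries the same daily value, which is the regional average over the sampled monitored grids.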

Comparison of observed monitor and CTM data

Realistic estimates for the above as yet unset parameters (e.g. $\sigma_b^2$, $\mathrm{var}(Z_i)$) were obtained by reference to observed monitor and chemistry-transport model (CTM) data. The monitor data came from the UK’s Automatic Urban and Rural Network (AURN) and were obtained via the UK national air information resource [18].

The modelled data used were daily outputs from the EMEP-WRF v3.7 grid-based (Eulerian) 3-D CTM which provides a detailed simulation of the evolving physical and chemical state of the atmosphere over the UK. The underlying CTM is the EMEP Unified Model [19] which has been modified to enable application at 5 km horizontal spatial resolution over the British Isles [20]. A nested approach is used whereby EMEP simulations of atmospheric composition across a coarser European domain are used to drive fine-scale EMEP-WRF simulations of air quality at 5 km horizontal resolution across the UK. The EMEP and EMEP-WRF models have been extensively validated and used for numerous policy applications [21, 22].

Daily concentrations of monitored ozone (μg/m3) and their corresponding EMEP-WRF CTM estimates, covering a total of at least 364 days over the period 2003–2006, were obtained for 35 urban background and 21 rural monitoring sites across England, Wales, Scotland and Northern Ireland. Similarly paired daily concentrations of NO2 (μg/m3), again covering at least 364 days over the period 2003–2006, were obtained for 43 urban background and 14 rural monitoring sites across England, Wales, Scotland and Northern Ireland. Ozone concentrations were daily maximum running 8-hour mean and NO2 concentrations were loge-transformed (daily 1-hour maximum). Summary statistics comparing monitor and CTM data for rural and urban sites are presented in Table 1.

Table 1 Comparison of observed monitor and chemistry-transport model (CTM) data

The distance between each pair of monitoring sites of the same type was calculated. Then having first standardised monitored pollution concentrations within site by subtracting the site mean and dividing by the site standard deviation, Pearson correlations across time between site pairs were calculated for rural ozone, urban background ozone, rural loge(NO2) and urban background loge(NO2) and plotted against distance (Figure 1). Correlations based on <364 paired observations were set to missing. The relationships between Pearson correlation and distance were then investigated using simple linear regression.

Figure 1

Simple linear regression analysis of Pearson correlation by distance. The figure presents results for (a) urban background ozone, (b) rural ozone, (c) urban background loge(NO2) and (d) rural loge(NO2). Each point on graphs represents the Pearson correlation (P) between daily standardised pollution concentrations measured at two distinct monitoring sites, plotted against the distance in km (D) between those sites. R-sq: estimate of the proportion of variance in Pearson correlation (P) explained by the fitted linear relationship with distance in km (D).

Parameter estimates

To simulate “true” urban background ozone concentrations for our theoretical study area we set $\mu = 61.73$ and $\sigma_b^2 = 7.38^2$ (Table 1), and constructed a correlation matrix ρ(100, 100) using the regression equation based on Pearson correlation as a function of distance between monitors (Figure 1(a)):

$$E(P) = 0.93031 - 0.00080 \times D$$

Each off-diagonal element of ρ was calculated by setting D equal to an estimate of the average distance in km between any two points, one in each of the two 5 km × 5 km grid-squares being compared (using simulation: $D \approx d + 2.13 \times \frac{1}{d}$, where d is the straight line distance between the centre points of the two grid-squares). The diagonal elements were calculated by setting D equal to an estimate of the average distance between any two points within a 5 km × 5 km grid-square (using simulation: $D \approx 2.6$). The variance/covariance matrix Ω(100, 100) was then obtained by multiplying each element of ρ by the average observed within-site variance (i.e. $25.28^2$ in Table 1). This produced a symmetrical matrix with diagonal elements equal to $24.35^2$, the estimated average “true” within-site variance having removed any variation due to instrument error and monitor-site location error (i.e. $\sigma_{err}^2$).
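The construction of Ω for urban background ozone can be sketched as follows. The 10 × 10 layout of grid-squares and the row-major ordering of their centres are assumptions of this sketch; the regression coefficients and distance approximations are those quoted above. A short Monte Carlo check of the within-square average distance is included. Before drawing samples from such a matrix it would be prudent to confirm that it is positive semi-definite (for example with `np.linalg.eigvalsh`); that check is not carried out here.

```python
import numpy as np

n_side, cell = 10, 5.0                     # 10 x 10 layout of 5 km squares
coords = np.arange(n_side) * cell + cell / 2
cx, cy = np.meshgrid(coords, coords)
centres = np.column_stack([cx.ravel(), cy.ravel()])        # shape (100, 2)

# Straight-line distances d between centre points
diff = centres[:, None, :] - centres[None, :, :]
d = np.sqrt((diff**2).sum(axis=2))

# Average point-to-point distance D: 2.6 within a square, d + 2.13/d between
D = np.full_like(d, 2.6)
off = d > 0
D[off] = d[off] + 2.13 / d[off]

rho = 0.93031 - 0.00080 * D                # fitted correlation by distance
omega = rho * 25.28**2                     # scale by observed within-site variance

# Monte Carlo check of the mean distance between two random points
# in the same 5 km x 5 km square
rng = np.random.default_rng(6)
p = rng.uniform(0.0, cell, (200_000, 2))
q = rng.uniform(0.0, cell, (200_000, 2))
mean_within = np.linalg.norm(p - q, axis=1).mean()

print(np.sqrt(omega[0, 0]), mean_within)
```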

For simulating observed monitor data we set $\sigma_{err} = 6.77$ (see Additional file 1) and for simulating model data within each grid i we set: $\mathrm{cov}(X_i^*, Z_i) = 455.78$, $\mathrm{var}(X_i^*) = 24.35^2$, $\mathrm{var}(Z_i) = 23.41^2$, and c = 10.25 (see Table 1). Parameter estimates for rural ozone, urban background loge(NO2) and rural loge(NO2) were obtained in the same fashion.

Proportional measurement error

For NO2 we have assumed that measurement error is additive on a log scale and that the relationship of interest is with loge(NO2). If, however, the relationship of interest is with NO2 (untransformed) then measurement error in the explanatory variable is proportional rather than additive. In order to simulate monitor NO2 data with proportional error, we first simulate loge(NO2) as before but then back-transform (i.e. NO2 = exp(loge(NO2))) prior to calculating regional averages. For model data, we first simulate loge(NO2) as in Equation (1.2) but instead of setting $c_i = c$, we use Equation (1.2a) and set $\sigma_{diff} = 0.268$ for urban background loge(NO2) and $\sigma_{diff} = 0.210$ for rural loge(NO2) (see Table 1). The data are then back-transformed. With NO2 rather than loge(NO2) as the explanatory variable in our epidemiological time-series analysis we set: α = 0.32 and β = 0.0003992 (i.e. $e^{\beta \times 10} = 1.0040$, indicating a 0.4% increase in mortality per 10 μg/m3 increase in NO2).

Statistical analysis of simulated time series

For each of the 7 time-series scenarios considered in each of Tables 2, 3 and 4, 1000 simulated data sets were produced and each analysed separately using Poisson regression with grid as a fixed effect. As a result, 1000 separate estimates of both the health effect ($\hat{\beta}$) and its standard error, $SE(\hat{\beta})$, were obtained. Statistics presented in Tables 2, 3 and 4 include estimate averages and estimates of the coverage probability and power. An estimate of coverage probability records the percentage of simulations where the 95% confidence interval contains the “true” value of β and an estimate of power records the percentage of simulations that would have detected the health effect estimate as statistically significant at the 5% significance level.
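The estimation of coverage probability and power can be illustrated with a simplified sketch. It departs from the analysis described above in several respects, all of which are assumptions made to keep the example short and fast: a single series without grid fixed effects, 200 rather than 1000 simulated data sets, a hypothetical β of 0.01 (much larger than the values used in this study), an error-free explanatory variable, and a hand-written Newton-Raphson fit (`fit_poisson`, a name of our own) rather than a statistical package.

```python
import numpy as np


def fit_poisson(x, y, n_iter=25):
    """Newton-Raphson fit of log E[y] = a + b*x; returns (b_hat, se_b)."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.array([np.log(y.mean()), 0.0])
    for _ in range(n_iter):
        mu = np.exp(X @ coef)
        info = X.T @ (X * mu[:, None])             # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        coef = coef + step
        if np.max(np.abs(step)) < 1e-10:
            break
    mu = np.exp(X @ coef)
    cov = np.linalg.inv(X.T @ (X * mu[:, None]))
    return coef[1], np.sqrt(cov[1, 1])


rng = np.random.default_rng(7)                     # fixed seed, illustrative
alpha, beta = 0.32, 0.01                           # beta is hypothetical
n_sims, n_obs = 200, 2000
estimates, covered, significant = [], 0, 0

for _ in range(n_sims):
    x = rng.normal(60.0, 25.0, n_obs)
    y = rng.poisson(alpha * np.exp(beta * x))
    b, se = fit_poisson(x, y)
    estimates.append(b)
    covered += (b - 1.96 * se <= beta <= b + 1.96 * se)   # CI contains beta
    significant += (abs(b / se) > 1.96)                   # 5% level test

coverage = covered / n_sims
power = significant / n_sims
mean_estimate = np.mean(estimates)
print(mean_estimate, coverage, power)
```

Replacing `x` in the call to `fit_poisson` with an error-prone surrogate, while still generating `y` from the "true" series, would give the kind of comparison summarised in Tables 2, 3 and 4.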

Table 2 Summarising the analysis of 1000 simulated data sets: urban background pollution concentrations with additive error
Table 3 Summarising the analysis of 1000 simulated data sets: rural pollution concentrations with additive error
Table 4 Summarising the analysis of 1000 simulated data sets: nitrogen dioxide concentrations with proportional error

Finally using established theory (See Additional file 2) we obtained predictions of the attenuation in β that we might expect from using CTM data or data from a single monitor per region. These predictions were then compared to the corresponding results obtained from our simulations.

Error decomposition

In order to aid interpretation of our simulation results for the CTM data, we decomposed the grid-specific error variance $\mathrm{var}(Z_i - X_i^*)$ into two components, a classical-like component (CC), and a Berkson-like component (BC) as follows:

$$\mathrm{var}(Z_i - X_i^*) = \mathrm{cov}(Z_i - X_i^*, Z_i) - \mathrm{cov}(Z_i - X_i^*, X_i^*) = CC + BC$$

$$CC = \mathrm{cov}(Z_i - X_i^*, Z_i) = \mathrm{var}(Z_i) - \mathrm{cov}(Z_i, X_i^*)$$

$$BC = -\mathrm{cov}(Z_i - X_i^*, X_i^*) = \mathrm{var}(X_i^*) - \mathrm{cov}(Z_i, X_i^*)$$

Estimates of CC and BC were then obtained using the observed data (See Additional file 3 for further details and calculations).
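As an arithmetic illustration of the decomposition, the urban background ozone values quoted under Parameter estimates can be substituted into the expressions for CC and BC. The calculations in Additional file 3 are not reproduced here and may differ in detail, so the figures printed by this sketch should not be read as the estimates reported by the study.

```python
var_x_true = 24.35 ** 2        # var(X*), urban background ozone
var_z = 23.41 ** 2             # var(Z)
cov_zx = 455.78                # cov(Z, X*)

cc = var_z - cov_zx            # classical-like component
bc = var_x_true - cov_zx       # Berkson-like component

# var(Z - X*) computed directly, for comparison with CC + BC
total = var_z + var_x_true - 2 * cov_zx

print(round(cc, 2), round(bc, 2), round(total, 2))
```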


Comparing “true” values of the regression coefficient, β, (e.g.  β × 10 = 0.00399 for urban background ozone) with those based on simulated data, β ^ , Tables 2 and 3 suggest that the use of regional average monitor data as a surrogate for grid-specific “true” ambient concentrations has limited impact on health effect estimates unless the number of monitors per 25 km × 25 km grid-square falls below 3 (or possibly 5 in the case of rural loge(NO2)). The monitoring scenario which produced the largest bias in the health effect for all four pollutants was that of a single monitor per 25 km × 25 km grid-square. The regression coefficient was attenuated by an estimated 6% for urban ozone, 13% for rural ozone, 29% for urban loge(NO2) and 38% for rural loge(NO2). By contrast when we used grid-specific model data, the regression coefficient was attenuated by an estimated 19% for urban ozone, 22% for rural ozone, 54% for urban loge(NO2) and 44% for rural loge(NO2). Thus, although for rural loge(NO2) results were similar to those of the 1 monitor per region scenario, for urban and rural ozone, urban loge(NO2) and for less sparse monitoring networks the use of model rather than monitor data appeared to produce a more marked level of bias in the health effect estimate. Comparison of the “true” values of the regression coefficient with those based on simulated “true” data (Tables 2 and 3) suggests that our findings are not simply due to an inadequate number of simulations.

Of particular note are the small coverage probabilities for loge(NO2), especially when using the grid-specific model data, but also evident when using measured rural data from a single monitor within each 25 km × 25 km grid. These suggest that not only is there marked attenuation in the health effect estimate but that bias extends to the standard errors, such that few simulations produced a 95% confidence interval containing the “true” value of β (only 15% for urban background modelled loge(NO2) and 11% for rural modelled loge(NO2); Tables 2 and 3). As expected, statistical power for loge(NO2) is consistently higher than for ozone as the magnitude of the “true” effect to be detected is larger (i.e. a 0.4% increase in mortality per 10% increase in NO2 versus a 0.4% increase in mortality per 10 μg/m3 in ozone). Nevertheless, the use of grid-specific model data for urban and rural ozone and the use of either model or 1 monitor per region data for urban loge(NO2) appears to have a slightly adverse effect on power.

Table 4 presents results for NO2 assuming proportional measurement error (i.e. additive on a log scale) but where the relationship of interest is with the untransformed variable. Overall, compared to loge(NO2), power-loss due to measurement error was similar but coverage probabilities, particularly for model data, improved. Model data and the single monitor scenario registered the largest attenuation in the regression coefficient, but there was noticeable attenuation even when using regional averages based on 5 monitors per 25 km × 25 km region.

Predictions from theory

For model data and for the 1 monitor scenario, established theory (see Additional file 2) allows us to predict the effects of additive measurement error on the health effect estimate. Table 5 illustrates that estimates of attenuation in β obtained by simulation are not that dissimilar from those obtained using standard theory in this simple case.
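The theory in Additional file 2 is not reproduced here. As a rough indication of the kind of prediction involved, the sketch below uses the standard simple linear regression result that, when a surrogate Z replaces X*, the expected gradient is multiplied by cov(X*, Z)/var(Z). Applying this linear approximation to the Poisson setting is an assumption of the sketch, and the inputs are the urban background ozone values quoted under Parameter estimates.

```python
var_z = 23.41 ** 2             # var(Z), CTM urban background ozone
cov_xz = 455.78                # cov(X*, Z)

# Expected multiplicative bias in the gradient when Z replaces X*
attenuation_factor = cov_xz / var_z
attenuation_pct = 100 * (1 - attenuation_factor)

print(f"predicted attenuation: {attenuation_pct:.1f}%")
```

Under this approximation the predicted attenuation for urban background ozone model data is roughly 17%, which may be compared with the 19% estimated by simulation.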

Table 5 Estimated attenuation in the health effect estimate: comparing simulation and theory


In the context of a time-series analysis of the association between daily concentration of air pollution and mortality, our study used simulation as a technique to contrast the effects on the estimation of that association of using grid-specific pollution data derived from a 3-D chemistry-transport model as opposed to regional average air pollution concentrations derived from monitors. Pollution concentrations were simulated both with (i.e. monitor data), and without (i.e. “true” data) classical “instrument and monitor-location” error. The “true” data were then used in the statistical simulation of model data with the inclusion of both classical and Berkson-like error. The parameter estimates driving our simulations were based on both monitor and CTM daily maximum 8-hour mean ozone data for 35 urban background and 21 rural monitoring sites across the UK and on both monitor and CTM loge(daily maximum 1-hour NO2) data for 43 urban background and 14 rural monitoring sites across the UK. Within-grid correlations between observed monitor and CTM data were relatively strong with average correlation coefficients of 0.73 for rural ozone, 0.76 for urban ozone, 0.67 for rural loge(NO2) and 0.61 for urban loge(NO2). The lower correlations for loge(NO2) were likely a consequence of the shorter averaging time of the NO2 metric (i.e. 1-hour rather than 8-hour for ozone).

For both pollutants (i.e. ozone and loge(NO2)), the use of a single monitor to provide estimated pollution concentrations for every 5 km × 5 km grid within a 25 km × 25 km region produced attenuated health effect estimates. This attenuation was less marked for the more spatially homogeneous long-lived pollutant ozone, for which the short distance correlations in Figure 1 were strong, than for the short-lived pollutant loge(NO2) for which the short distance correlations were considerably weaker. However, for other scenarios, particularly those based on 5 or 10 monitors, the use of regional averages with additive rather than proportional error had little effect on health effect estimates. This concurs with the simulation findings of Sheppard et al. [12] who reported a “small but noticeable” attenuation in the health effect estimate when ambient area exposure to PM2.5 was based on a single pollution monitor, but little if any attenuation when area exposure was based on the average across 3 or 10 monitors.

Goldman et al. [16] recognized that a large proportion of the measurement error introduced by the use of average monitor concentrations is due to spatial variation and suggested that such error is predominantly Berkson, which, while reducing statistical power, will not on its own lead to bias in health effect estimates. However, as classical error is introduced, occurring as we introduce instrument error and monitor-site location error into our simulations and reduce the number of monitors on which averages are based, attenuation in the health effect estimate is observed. This is more pronounced for loge(NO2), particularly rural loge(NO2), than for ozone. This suggests, in line with the findings of others, that attenuation of the relative risk depends not only on instrument error but on the number and placement of monitors [6, 16, 23] and on the level of spatial variation [6, 23, 24]. As suggested by Goldman et al. [16], it may be the combination of these sources which determines the ultimate effect on relative risk estimates.

The combined effects of different error sources may also help to explain why contrary to expectation we found no evidence in Tables 2 and 3 (i.e. additive measurement error) of any reduction in statistical power from the use of regional average monitor data based on 2, 3, 5 or 10 monitors per region, with any loss of power most noticeable for the 1 monitor scenario in particular in relation to urban loge(NO2).

The use of simulated model data produced attenuation in the health effect estimate, which for rural loge(NO2) was similar to that associated with the scenario of a single regional monitor. However for urban and rural ozone and particularly urban loge(NO2) regression coefficients were more biased towards the null than for the single monitor case. According to Sheppard et al. [25] classical error can result not only in an attenuated health effect estimate but also lead to a downward bias in the estimation of standard errors and thus to inaccuracy in the coverage of 95% confidence intervals. The appreciable bias in health effect estimates and coverage intervals based on simulated model data for loge(NO2) therefore implies the presence of predominantly classical rather than Berkson-like error in EMEP-WRF CTM estimates of this pollution metric. In order to investigate this further we attempted using our comparison dataset to decompose random measurement error into its classical-like and Berkson-like components (Additional file 3). Our results suggested that indeed classical error predominates overwhelmingly in the loge(NO2) CTM data.

The use of NO2 rather than loge(NO2) (i.e. proportional rather than additive measurement error) appeared to lead to a marked improvement in the previously poor coverage probabilities of the model data but further attenuation in health effect estimates based on regional averages. However these regional averages still tended to outperform model data with the possible exception of the 1 monitor per 25 km × 25 km grid square scenario for rural NO2 where monitor and model findings were comparable. Unlike additive measurement error whose biasing effect on grid means is effectively adjusted for by including grid as a fixed effect in our time-series analyses, this is not the case when measurement error is proportional. For model data with proportional error therefore it is important to note that our findings may depend to some extent on grid-specific mean pollution levels and the validity of the assumptions we make in simulating them (see Equation 1.2a).

One of the strengths of our simulation approach is that it allows the correlation between time-series in different grids to vary according to the distance between those grids. However, in so doing we assume that spatial dependence is characterised by a single linear function. In our regression analysis of the association between correlation and distance (Figure 1), the addition of a quadratic term was statistically significant for urban and rural ozone and for urban loge(NO2), although in all three cases incorporating this non-linearity had a relatively small impact on the percentage of variance explained (an additional 0.2, 1.6 and 1.6 percentage points respectively). We also assume that spatial dependence is independent of direction (i.e. isotropic) and of geography (other than a distinction between urban and rural), and that it does not vary over time. This may not hold if the study area contains point sources, the outflow from which may vary in direction over time with changing weather conditions. Nevertheless, this assumption is employed by other authors in this field [5, 23], possibly because data sufficient to incorporate such features into simulation studies are not readily available or generalizable.
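
The comparison of linear and quadratic fits described above can be sketched as follows. The Python code below uses synthetic correlation-distance pairs, not the data behind Figure 1; the coefficients, noise level, distance units (hundreds of kilometres) and helper names are all hypothetical. It fits both polynomials by ordinary least squares and reports the gain in variance explained, in percentage points, from adding the quadratic term.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(dist, corr, degree):
    """R^2 of a least-squares polynomial fit of corr on dist."""
    # Design-matrix columns: 1, d, d^2, ... up to the requested degree.
    cols = [[d ** p for d in dist] for p in range(degree + 1)]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * y for u, y in zip(ci, corr)) for ci in cols]
    coef = solve(xtx, xty)
    fitted = [sum(c * d ** p for p, c in enumerate(coef)) for d in dist]
    mean = sum(corr) / len(corr)
    ss_res = sum((y - f) ** 2 for y, f in zip(corr, fitted))
    ss_tot = sum((y - mean) ** 2 for y in corr)
    return 1.0 - ss_res / ss_tot


# Synthetic site pairs: correlation declines with distance, with slight curvature.
rng = random.Random(0)
dist = [rng.uniform(0, 5) for _ in range(500)]
corr = [0.9 - 0.10 * d + 0.005 * d * d + rng.gauss(0, 0.05) for d in dist]

r2_linear = r_squared(dist, corr, 1)
r2_quadratic = r_squared(dist, corr, 2)
gain_pp = 100 * (r2_quadratic - r2_linear)  # gain in percentage points
```

In this synthetic example the quadratic term adds only a small amount to the variance explained, in the spirit of the small gains reported above; the exact figure depends on the assumed coefficients and seed.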

Our simulations allow mean pollution concentrations to vary between grids, although we assume that they vary at random and do not take account of the fact that mean pollution concentrations in nearest-neighbour grids may be more similar than those at a distance. This could have implications for our results involving proportional measurement error. However, when we plotted the absolute difference in site means against distance for each pair of monitoring stations in our observed monitor data set, there was no evidence of a linear relationship, whether for loge(NO2) or ozone, urban or rural. Though in some ways reassuring, these findings may nevertheless be insensitive to differences in grid-mean pollution concentrations within urban areas, where for example background levels of NO2 tend to increase as one approaches the urban core [26], whilst background levels of O3 tend to decrease.

A further limitation is that we use the same variance to generate each within-grid time-series and that time-series, both modelled and monitored, are simulated without seasonal pattern or trend. Hence we do not consider the influence of time-dependent confounding variables, or of other confounders or pollutants. However, the effects of measurement error in multi-pollutant models [4, 27] and in the presence of confounders [25, 28] have been considered by others.

Although quantitatively the simulation parameters we used (and hence our results) only apply to the EMEP-WRF model v3.7 for the British Isles, the simulation approach is generalizable and may be used in the evaluation of other chemistry-transport models in other areas.

Eulerian CTMs similar to the EMEP model discretize the real world using a fixed horizontal and vertical grid, with no explicit information on within-grid variability of emissions. Emissions from line sources such as roads and from point sources are averaged to the CTM horizontal resolution. This approximation may limit the model's ability to resolve the near-source chemistry and transport that is likely to occur near urban monitor sites. Moreover, the EMEP model was not designed to replicate the complex urban environment. Local dispersion models that can represent the fine-scale complexity of an urban environment are currently available (ADMS, ERG models); however, they are computationally very expensive, are limited to specific areas, and rely on CTMs for boundary conditions in order to capture the regional import and export of pollutants.

The benefit of full temporal and UK coverage and the self-consistency of predicted chemical parameters should not be underestimated, and this benefit may outweigh the shortcoming of not properly representing surface urban chemistry.

Our present findings suggest that there may be an appreciable penalty in using CTM data in spatially-resolved epidemiological time-series studies, which for some pollutants in part weighs against the substantial benefits of such modelled data. These benefits include the opportunity to investigate pollutants (e.g. different particle measures) with sparse or zero monitor coverage, pollutants from specific sources with direct relevance to policy formulation and evaluation, and the potential consequences of alternative future scenarios. For the simulations incorporating additive measurement error (Tables 2 and 3) and the input data used in this work, we found that monitor data out-performed model data in urban areas and in areas with at least 2 monitors per 25 km × 25 km grid-square, but that the performance of monitor and model data for loge(NO2), at least in terms of power and attenuation in the regression coefficient, was similar in rural areas with only 1 monitor per 25 km × 25 km grid-square. However, it is important to be clear that the impact of ‘measurement’ error as assessed in this paper is only one aspect of data performance relevant to the use of modelled versus monitored data in epidemiological studies. Monitored data, typically characterised by sparse data from preferential (similar type) locations, with measurement errors and often missing values, also have limitations, which are often ignored. High-resolution CTMs are continually being developed, and our study suggests that further assessment of model error impact, including statistical simulation, as well as improved understanding of the performance of monitored data, would be useful.


Even if correlations between model and monitor data appear reasonably strong, additive classical measurement error in model data may lead to appreciable bias in health effect estimates. As process-based air pollution models become more widely used in epidemiological time-series analysis because of their advantages in terms of geographical coverage and their potential to provide complete time-series for all pollutant species of interest, assessments of error impact which include statistical simulation may be useful.



CTM: Chemistry-transport model.


  1. Liu K, Stamler J, Dyer A, McKeever J, McKeever P: Statistical methods to assess and minimise the role of intra-individual variability in obscuring the relationship between dietary lipids and serum cholesterol. J Chron Dis. 1978, 31: 399-418. 10.1016/0021-9681(78)90004-8.

  2. Armstrong B: Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med. 1998, 55: 651-656. 10.1136/oem.55.10.651.

  3. Reeves GK, Cox DR, Darby SC, Whitley E: Some aspects of measurement error in explanatory variables for continuous and binary regression models. Statist Med. 1998, 17: 2157-2177. 10.1002/(SICI)1097-0258(19981015)17:19<2157::AID-SIM916>3.0.CO;2-F.

  4. Zeger SL, Thomas D, Dominici F, Samet JM, Schwartz J, Dockery D, Cohen A: Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ Health Perspect. 2000, 108: 419-426. 10.1289/ehp.00108419.

  5. Goldman GT, Mulholland JA, Russell AG, Strickland MJ, Klein M, Waller LA, Tolbert PE: Impact of exposure measurement error in air pollution epidemiology: effect of error type in time-series studies. Environ Health. 2011, 10: 61-71. 10.1186/1476-069X-10-61.

  6. Lee D, Shaddick G: Spatial modelling of air pollution in studies of its short-term health effects. Biometrics. 2010, 66: 1238-1246. 10.1111/j.1541-0420.2009.01376.x.

  7. Steenland K, Deddens JA, Zhao S: Biases in estimating the effect of cumulative exposure in log-linear models when estimated exposure levels are assigned. Scand J Work Environ Health. 2000, 26: 37-43. 10.5271/sjweh.508.

  8. Fung KY, Krewski D: On measurement error adjustment methods in Poisson regression. Environmetrics. 1999, 10: 213-224. 10.1002/(SICI)1099-095X(199903/04)10:2<213::AID-ENV349>3.0.CO;2-B.

  9. Anderson HR, Atkinson RW, Bremner SA, Carrington J, Peacock J: Quantitative systematic review of short term associations between ambient air pollution (particulate matter, ozone, nitrogen dioxide, sulphur dioxide and carbon monoxide), and mortality and morbidity. 2007, Report to Department of Health revised following first review.

  10. World Health Organisation: Air quality guidelines: global update 2005. Particulate matter, ozone, nitrogen dioxide and sulphur dioxide. 2006, Copenhagen: WHO Regional Office for Europe.

  11. Dominici F, Zeger SL, Samet JM: A measurement error model for time-series studies of air pollution and mortality. Biostatistics. 2000, 1: 157-175. 10.1093/biostatistics/1.2.157.

  12. Sheppard L, Slaughter JC, Schildcrout J, Liu L-JS, Lumley T: Exposure and measurement contributions to estimates of acute air pollution effects. J Expos Anal Environ Epidemiol. 2005, 15: 366-376. 10.1038/sj.jea.7500413.

  13. Szpiro AA, Paciorek CJ, Sheppard L: Does more accurate exposure prediction necessarily improve health effect estimates?. Epidemiology. 2011, 22: 680-685. 10.1097/EDE.0b013e3182254cc6.

  14. Szpiro AA, Sheppard L, Lumley T: Efficient measurement error correction with spatially misaligned data. Biostatistics. 2011, 12: 610-623. 10.1093/biostatistics/kxq083.

  15. StataCorp: Stata Statistical Software: Release 10. 2007, College Station, TX: StataCorp LP

  16. Goldman GT, Mulholland JA, Russell AG, Gass K, Strickland MJ, Tolbert PE: Characterisation of ambient air pollution measurement error in a time-series health study using a geostatistical simulation approach. Atmos Environ. 2012, 57: 101-108.

  17. Cox DR, Hinkley DV: Appendix 3 Second-order regression for arbitrary random variables. Theoretical Statistics. 1974, London: Chapman and Hall, 475-477.

  18. Automatic Urban and Rural Monitoring Network (AURN) Data Archive.

  19. Simpson D, Benedictow A, Berge H, Bergström R, Emberson LD, Fagerli H, Flechard CR, Hayman GD, Gauss M, Jonson JE, Jenkin ME, Nyíri A, Richter C, Semeena VS, Tsyro S, Tuovinen J-P, Valdebenito Á, Wind P: The EMEP MSC-W chemical transport model - technical description. Atmos Chem Phys. 2012, 12: 7825-7865. 10.5194/acp-12-7825-2012.

  20. Vieno M, Dore AJ, Stevenson DS, Doherty R, Heal MR, Reis S, Hallsworth S, Tarrason L, Wind P, Fowler D, Simpson D, Sutton MA: Modelling surface ozone during the 2003 heat-wave in the UK. Atmos Chem Phys. 2010, 10: 7963-7978. 10.5194/acp-10-7963-2010.

  21. Carslaw D: Defra regional and transboundary model evaluation analysis - phase 1, a report for Defra and the Devolved Administrations. 2011.

  22. Fagerli H, Gauss M, Benedictow A, Griesfeller J, Jonson JE, Nyíri Á, Schulz M, Simpson D, Steensen BM, Tsyro S, Valdebenito Á, Wind P, Aas W, Hjellbrekke A-G, Mareckova K, Wankmüller R, Iversen T, Kirkevåg A, Seland Ø, Vieno M: Transboundary acidification, eutrophication and ground level ozone in Europe in 2009. EMEP Status Report 1/2011. 2011, Oslo: Norwegian Meteorological Institute

  23. Peng RD, Bell ML: Spatial misalignment in time series studies of air pollution and health data. Biostatistics. 2010, 11: 720-740. 10.1093/biostatistics/kxq017.

  24. Kim S-Y, Sheppard L, Kim H: Health effects of long-term air pollution: influence of exposure prediction methods. Epidemiology. 2009, 20: 442-450. 10.1097/EDE.0b013e31819e4331.

  25. Sheppard L, Burnett RT, Szpiro AA, Kim S-Y, Jerrett M, Pope CA, Brunekreef B: Confounding and exposure measurement error in air pollution epidemiology. Air Qual Atmos Health. 2012, 5: 203-216. 10.1007/s11869-011-0140-9.

  26. Strickland MJ, Darrow LA, Mulholland JA, Klein M, Flanders WD, Winquist A, Tolbert PE: Implications of different approaches for characterizing ambient air pollutant concentrations within the urban airshed for time-series studies and health benefit analyses. Environ Health. 2011, 10: 36-44. 10.1186/1476-069X-10-36.

  27. Carrothers TJ, Evans JS: Assessing the impact of differential measurement error on estimates of fine particle mortality. J Air Waste Manage Assoc. 2000, 50: 65-74. 10.1080/10473289.2000.10463988.

  28. Carroll RJ, Gallo PP, Glesser LJ: Comparison of least squares and errors-in-variables regression, with special reference to randomised analysis of covariance. J Am Stat Assoc. 1985, 80: 929-932. 10.1080/01621459.1985.10478206.


The authors would like to thank Zaid Chalabi (London School of Hygiene and Tropical Medicine) for reading the article in draft, picking up errors and making helpful suggestions which contributed to the intellectual content. The article was produced as part of the AWESOME project, which is funded by a grant from the Natural Environment Research Council (NERC, NE/I007938/1). The NERC grant includes full funding for BKB’s current post at St George’s, University of London. We would also like to acknowledge use of monitor data from the UK Department for Environment, Food and Rural Affairs Automatic Urban and Rural Network (AURN), which is public sector information licensed under the Open Government Licence v1.0.

Author information

Corresponding author

Correspondence to Ben Armstrong.

Additional information

Competing interests

MRH, RMD and MV have an academic interest in the EMEP-WRF CTM and its development. There are no other conflicts of interest.

Authors’ contributions

BKB contributed to the design of the study, analysed the data, carried out the simulations and took the lead in drafting the paper. BA provided theoretical statistical expertise and contributed to the design and concept of the study. RWA and PW contributed to the design and concept of the study. MRH and RMD assembled the model data and the model-monitor comparison data sets. MV is the main developer of the EMEP-WRF regional chemistry-transport model and produced the model output. All authors contributed to the drafting of the paper, the interpretation of results and read and approved the final manuscript.

Rights and permissions

Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Butland, B.K., Armstrong, B., Atkinson, R.W. et al. Measurement error in time-series analysis: a simulation study comparing modelled and monitored data. BMC Med Res Methodol 13, 136 (2013).
