Relative performance of different exposure modeling approaches for sulfur dioxide concentrations in the air in rural western Canada

Background The main objective of this paper is to compare different methods for predicting the levels of SO2 air pollution in oil and gas producing area of rural western Canada. Month-long average air quality measurements were collected over a two-year period (2001–2002) at multiple locations, with some side-by-side measurements, and repeated time-series at selected locations. Methods We explored how accurately location-specific mean concentrations of SO2 can be predicted for 2002 at 666 locations with multiple measurements. Means of repeated measurements on the 666 locations in 2002 were used as the alloyed gold standard (AGS). First, we considered two approaches: one that uses one measurement from each location of interest; and the other that uses context data on proximity of monitoring sites to putative sources of emission in 2002. Second, we imagined that all of the previous year's (2001's) data were also available to exposure assessors: 9,464 measurements and their context (month, proximity to sources). Exposure prediction approaches we explored with the 2001 data included regression modeling using either mixed or fixed effects models. Third, we used Bayesian methods to combine single measurements from locations in 2002 (not used to calculate AGS) with different priors. Results The regression method that included both fixed and random effects for prediction (Best Linear Unbiased Predictor) had the best agreement with the AGS (Pearson correlation 0.77) and the smallest mean squared error (MSE: 0.03). The second best method in terms of correlation with AGS (0.74) and MSE (0.09) was the Bayesian method that uses normal mixture prior derived from predictions of the 2001 mixed effects applied in the 2002 context. Conclusion It is likely that either collecting some measurements from the desired locations and time periods or predictions of a reasonable empirical mixed effects model perhaps is sufficient in most epidemiological applications. The method to be used in any specific investigation will depend on how much uncertainty can be tolerated in exposure assessment and how closely available data matches circumstances for which estimates/predictions are required.


Background
It is well established that errors in exposure estimation can bias the results of epidemiological investigations. This takes most commonly the form of attenuation of the exposure-response association such that there is a danger of a false negative conclusion [1,2]. In addition, non-differential exposure misclassification can lead to reduced widths of confidence intervals of risk estimates, potentially leading to false positive results [1]. In some circumstances, differential misclassification of exposure can also produce positive bias in exposure-response relations, leading to false positive findings [3]. The implications of both false negative and false positive results of epidemiological studies can be profound. Specifically, in the first case, important causes of disease could be missed and, as a consequence, preventable disease may remain unchecked. In the second case, harm could be caused by implementation of inappropriate prevention measures and policies, and by creating unnecessary anxiety in the community.
In statistical literature, exposure misclassification and miss-measurement are known as a measurement error problem and a plethora of approaches exist to correct for biases that arise from it under certain assumptions [4,5]. One obvious approach to the problem is to obtain more precise exposure estimates instead of correcting for a known or suspected extent of exposure miss-measurement. In this regard, advances in monitoring technology have been helpful, such as passive monitoring that reduces the cost of measuring exposures, thereby obtaining larger volumes of relevant data that yield more accurate exposure estimates [6][7][8][9]. In the current project, passive monitoring technology was used to collect large quantities of air quality measurements over a vast geographical area.
In parallel, developments in exposure modeling/prediction methodologies are also valuable, such as groupbased [10,11] and (statistical) model-based based exposure assessment [12], even though they are only recently starting to 'connect' with the mainstream literature on measurement error. Although the ecological fallacy may arise in epidemiological studies that utilize this approach, this does not diminish the utility of group-based exposure assessment in which all members of a group are assigned the same exposure status that reflects average exposure in the area/group. The ecological fallacy can be avoided by collecting information on confounders at the individual level. This approach to exposure misclassification is still under active development and there are ongoing arguments as to whether it is possible to infer individual exposure from either micro-(e.g. in persons' living room) or macro-environment (e.g. central air monitoring station for a town) measurements [13].
One of the exposure modeling approaches that, at least conceptually, holds great promise incorporates knowledge from empirical (statistical) and theoretical (physical) exposure assessment approaches in the Bayesian framework [14]. It has been suggested that, in occupational exposure assessment, a more accurate estimate of exposure can be obtained by combining pre-existing information about exposure status (e.g. schematics of workplaces, knowledge of chemicals used and transformed in a workplace, historical measurements from related operations, opinions of occupational hygienists) with exposure measurements [14]. This idea was critiqued [15] emphasizing that informative priors cannot be obtained in most occupational studies due to the lack of validated physical exposure models. However, the suggested approach may hold more promise in applications where informative priors can be obtained, as in modeling of air quality in relation to industrial emissions into the general environment or from routinely collected data on air quality, to provide some notion of the shapes of probability distributions of exposure in a given location.
Area measurement of air pollutants is often used as a proxy of exposure in epidemiological studies and for the purpose of this paper the two terms will be used interchangeably. The main objective of this work was to determine how we can best use currently available information on air concentrations of SO 2 in rural western Canada to predict location-specific average exposure in a manner that is both cost-effective and accurate. We explore a prediction problem in a different time period at the fixed monitoring sites where some relevant data on sources and past air quality data may be available.

Data
Air monitoring data were collected for the study of health of cattle, as indicators of possible health effects on humans [16]. Month-long average air samples of SO 2 (i.e. measurements integrating concentrations over a calendar month) were collected over a two-year period (April 2001 to December 2002) at various locations ( Figure 1) across rural areas of western Canada that are associated with both cattle ranching and primary oil/gas exploration. In any given month, there were between 115 and 928 SO 2 monitoring sites: the numbers of monitoring sites peaked in summer and declined in winter, primarily because monitoring sites tracked the movement of cattle herds, which were dispersed in summer and concentrated in winter. The proportion of sites with repeated measurements within a month (side-by-side measurements) was 90% till August 2001, but then declined to ~10%. Air quality (SO 2 ) measurements were described reasonably well by lognormal distributions. Air concentrations of For the purpose of this methodological investigation, we imagine that measurements were available only from 2001 and that our objective was to predict location-specific average exposures in 2002 (as was indeed the goal of the animal health study from which the data arose). Furthermore, we assume that for 2002 we had an option (though not necessarily exercised, depending in the hypothetical scenario outlined below) to collect one measurement from a randomly selected relevant (i.e. when cattle was housed at the site) month at each location. We will use nomenclature described in Figure 2 to refer to differ-Maps of spatial distribution of SO 2 air quality monitors Figure 1 Maps of spatial distribution of SO 2 air quality monitors.

Alloyed gold standard (AGS)
In order to evaluate the performance of different exposure modeling approaches, we need to know the true value of the location-specific mean exposure at each location in 2002. However, we only have observed time-series with repeated observations at each location and therefore can only estimate these values. Consequently, we were only able to assess the performance of different exposure modeling approaches in relation to our best estimate of the true value. This approach that does not adjust for measurement error and yet is free from any model assumptions is a location-specific arithmetic average, a direct measure of latent quantity of interest. We computed this at locations where there were repeated measurements in 2002 and designated it as M0*. Measurements that were imagined to have been collected in 2002 (d2) were the locationstratified random subset of all 2002 measurements; they were not used in calculation of the alloyed gold standard.

Overview of prediction methods considered
One measurement from each location that was not used to calculate AGS was assumed to have been observed in 2002. We considered approaches that uses one monthlong average measurement from each location of interest in 2002 (M1); and the other -context data on proximity of monitoring sites to putative sources of emission in 2002 (M2). In addition, we imagined that all of the previous year's (2001's) data were also available to exposure assessors. Exposure prediction approaches we explored with the 2001 measurement data included regression modeling using either mixed (M3) or fixed effects models (M3f). Lastly, we used Bayesian methods to combine single measurements from locations in 2002 (M1) with different priors (M4-M6). These approaches described within two separate scenarios below: without any measurements from 2002 (M2, M3, M3f) and with one measurement per location of interest in 2002 (M1, M4-M6).

The first scenario: no measurements in 2002
If we choose not to collect any measurements in 2002 and rely on the 2001 data to make 2002 exposure predictions, we may consider two options. First, we could construct a model of the determinants of exposure using only 2001 data (d1 and c1). We will assume that it will have the same functional form as a model built previously [16].
where Δ 2 = distance in km from the monitoring location to a specified oil and gas infrastructure (oil wells or gas plants in this case) within the 2 km radius of the monitoring station (industrial infrastructure outside of this radius was ignored in the calculation of Δ 2 ); Δ 2-50 = distance in km from the monitoring location to a specified oil and gas infrastructure within the 2-50 km torus; and random effects with the estimate of between-location variance (s 2 L1 ) 0.23, the estimate of month-to-month variance (s 2 T1 ) 0.09, and the estimate of between-repeat (within month and location) variance (s 2 R1 ) 0.21. This model is very similar in terms of the magnitude of fixed and random effects to the model that was previously derived in the basis of the entire data available to us [16]. The rationale for formulating distance to sources as in equation (1) is described in greater detail below.
Alternatively, we could be skeptical about the value of 2001 data and models that they yield, and rely exclusively on the description of measurement sites in 2002 in terms of their proximity to oil and gas infrastructure (i.e. c2) to rank locations in terms of expected SO 2 concentrations (M2). Several such rankings are possible, because we do not know a priori which context (i.e. proximity to what type(s) of facilities) is best to use. Concentrations near point sources of emission in flat terrain without strong prevailing winds can be described as being directly proportional to the emission rate and inversely proportional to the separation distance taken to the power of 2/3, a distance decay model [17]. This informed the parameterization of predictive models we developed in M3, and appears to be a reasonable starting point for ranking different monitoring sites with respect to anticipated air quality. However, there is uncertainty about which distance to which oil and gas facilities is the most sensible to use in predicting SO 2 concentrations. On one hand, strong sources of SO 2 emissions, such as gas plants, seem obvious candidates, but they are less numerous and farther away from monitoring locations than wells and batteries. Thus, all these facilities can potentially impact SO 2 concentration and the context of 2002 measurement sites (c2) was described in terms of proximity to all wells, all batteries and all gas plants. The proximity measure was described in detail previously [16]: it is a sum of (distance in km) -2/3 for each facility type within 2 km or 50 km radius around each monitoring site. The coordinates of different active oil and gas facilities in 2002 were supplied to us by the regulatory agencies from the Canadian provinces of Alberta, Saskatchewan and British Columbia, enabling us to estimate the distances. Proximity to the following facilities was estimated: wells with 2 km, wells within 50 km, batteries within 2 km, batteries within 50 km, gas plants within 2 km, and gas plants within 50 km.
All regression models and their predictions in the manuscript were made in SAS (version 9.1, SAS Institute, Cary, NC) PROC MIXED using the REML algorithm.

The second scenario: some measurements in 2002
If one measurement was collected in 2002 from each location of interested on a randomly chosen month (d2), we can consider the following exposure estimation options. A simple approach is to use a single measurement from each location in 2002 to estimate mean location-specific exposures in 2002 (M1).
We could also dismiss the 2001 data except for estimating measurement error variance using repeated measurements and then 'correct' 2002 measurements for this measurement error under the assumption of log-normal distribution of true exposure levels (M4). We can also use estimates from M3 as a basis for an empirical normal mixture prior with an unknown number of components for observed data d 2 to obtain method M5. Alternatively, we could mistrust 2001 measurements and rely only on the context of 2002 measurements (c2) for the prior information, leading us to method M6, which also utilizes normal mixture prior with an unknown number of components.
Bayesian approaches have been adopted for adjusting bias arising from measurement error [5]. Parameters of a Bayesian model are not assumed to be fixed, but vary at random in accordance with some probability distributions. For each parameter (or a set of parameters), a probability distribution that reflects its prior knowledge/belief is specified and combined with the likelihood function of the data to obtain a posterior distribution of the parameter(s) (e.g., location-specific means of SO 2 concentration in our case). This posterior distribution includes all knowledge/ belief related to the parameters from the prior and the likelihood involving covariates (i.e. data and assumed models). It is usually obtained by means of the Monte Carlo integration using Markov Chain (MCMC) unless it is analytically tractable. The variables observed with error are also considered to be random, so that they are incorporated into the process of sampling from the posterior.
Bayesian analysis has been developed to adjust for measurement error by specifying two sub-models: i) a measurement error model relating the observed exposure with error and the true exposure; and ii) the prior distribution of the true exposure. The true exposure is assumed to have either a lognormal distribution for a specific known prior (M4), or a mixture of normal distributions with unknown number of components, a flexible approach aimed to overcome potential misspecification of the prior distribution (M5 and M6). The reversible jump algorithm [18] is used for the normal mixture prior with unknown number of components, together with the standard Gibbs or Metropolis algorithm. The details of the Bayesian models and their implementation are given in the Appendix.

Measures of relative performance
Comparing estimated exposures to M0* (the arithmetic mean used as the AGS) will enable us to evaluate relative performance of different exposure assessment methods. In environmental epidemiology, the association of interest may be that between the concentrations of a contaminant (ppb SO 2 in our case) and risk of a disease. The most commonly-used exposure-disease model is the logistic regression model. Because the relationship between true (ϕ T ) and observed (ϕ O ) risk gradients in logistic regression is determined by Pearson correlation between true and observed exposure (ρ TO ) as in ϕ T = ϕ O /ρ 2 TO [1], and a correlation between two random variables can be estimated without fully specifying their distributions, we use the Pearson correlation between the SO 2 levels predicted by the different exposure estimation procedures and the alloyed gold standard (M0*) as a measure of relative performance of the different procedures. We also computed mean squared error (MSE): mean of (estimate -AGS) 2 .

Results
The alloyed gold standard could only be calculated for the 666 sites that had repeated air quality measurements (out of total of 903 sites) in 2002. The average number of repeated measurements per location was six, ranging from two to 24.
A summary of the relative performance of the different exposure estimation methods is presented in Table 1.
Overall, M3 appears to be superior in terms of the strongest correlation with the alloyed gold standards and the smallest MSE (Figure 3). Recall that in M3, we used both fixed and random terms of the model based on 2001 data to predict 2002 measurements (by plugging-in functions of distance to sources, c2, into equation (1) and computing mean SO 2 concentration (ppb) for each location). If only the fixed effects from equation (1) were used (as is the case if a fixed-effects ordinary least square model was used to identify determinants of exposure), a poor agreement between predicted (M3f) and the alloyed gold standard was observed (r = 0.33). The application of a distance-decay model [17] in the exposure estimation method M2 produced the worst predictions: only the correlation with proximity to all batteries within 50 km was positive and statistically different from zero: r = 0.21 (MSE = 0.28). Proximity to batteries was selected as a prior for M6, because its correlation is the only consistent positive predictor of measured SO 2 levels (M2), and because earlier work relied on the assumption that batteries are a good proxy of exposure to SO 2 [19], making it a natural choice for the Bayesian prior derived from c2. The distribution of values used as a prior based on proximity to batteries (also equivalent to M2) is shown in Figure 4; it implies a distribution that does not easily fit any common parametric form. The use of this prior with measurements collected in 2002 in the Bayesian normal mixture method (M6) produced estimated SO 2 concentrations that did not agree very well with the alloyed gold standard: r = 0.28 (MSE = 0.30). The normal mixture approach with a prior derived from air quality predictions obtained in M3 (M5) yielded the second best predictions.

Discussion
Strictly speaking, our observations only apply to the particular data set from which they were derived and a specific sample of 2002 observations. However, the results suggest some general conclusions about the estimation of environmental concentrations of pollutants derived from industrial sources. When no measurements of air quality are available, we can expect predictions by a simple distance-decay model to have poor agreement with true air  Histogram of measures of proximity to all (oil and gas) batteries within 50 km radii (bat50): prior for method 6 (N = 666); xaxis: proximity to batteries with 50 km of the monitor (km 2/3 ).
true exposure being predicted, the Bayesian normal mixture method is expected to falter relative to the simple collection of relevant data. This echoes a previous suggestion that, in many situations, the effort involved in modeling exposures may exceed that required to collect measurements [20]. The main limitation of our study is the lack of a gold standard to evaluate the performance of different exposure assessment procedures. We are inclined to believe that our choice of gold standard that is free from model assumptions is indicative of true performance of the compared methodologies. In this way, comparison is not biased in favor of a method that may be employed to produce an alloyed gold standard adjusted for measurement error. Thus, although our chosen alloyed gold standard is contaminated by measurement error, it was obtained without resorting to the assumptions that are used in the competing exposure assessment methods.
We had the luxury of a large 2001 dataset that enabled us to create an empirical prior that probably closely reflects the distribution of true values and the extent of measurement error. It may not be possible to rely on such preexisting data in many studies. Given the sensitivity of the Bayesian methods to 'quality' of the prior, careful judgment is required in deciding whether it is better to invest resources into extensive data collection or complex modeling. It must be noted that our 2001 data did not cover every month (data collection began in April) whereas 2002 measurements were spread across all months in 2002. This presented a realistic challenge to our exposure assessment models of estimating exposures for temporally misaligned data in presence of temporal trends in exposures within a year (see Figure 4 in [16]).
Our data was not very variable and contained only a modest measurement error. Thus, our conclusions may not hold for more variable and more error-prone situations that may arise in environmental exposure assessment, as reported for volatile organic compounds [21,22].
Another limitation of our work presented here is that we were not able to explore all possible modeling techniques that may be potentially available for predicting air pollution levels. It is for this reason that we focused on methods that appear to be sensible "first choices" in the given setting plus some more exotic Bayesian model that we wished to evaluate. Specifically, an autoregressive integrated moving average (ARIMA) approach may be suitable for part of our data where spatially aligned time-series can be identified as may be a more flexible methodology of Calder et al [23,24]. In addition, it may be possible to obtain better predictions through the empirical regression models by relaxing assumptions based on the model of Strosher [17], by either modeling the power transformation, employing generalized additive models, or using neural networks that relax parametric assumptions about the shape of distance-concentration association (see the Schlink et al [25] for overview of various other modeling options). We are exploring the utility of some of these modeling approaches in the current dataset in our parallel ongoing research.

Conclusion
Initial large measurement efforts are unavoidable when characterizing air quality and evaluating various exposure assessment options. However, once a considerable amount of information has been obtained about a defined area and a particular contaminant, subsequent air quality surveys can be less costly and extensive if they utilize either regression BLUP (M3) or generate an empirical prior in regression BLUP to be followed by Bayesian exposure assessment that integrates prior knowledge with a limited series of new measurements (M5). On theoretical grounds, we prefer Bayesian approach M5 because it forces investigators to make weaker assumption about the distribution of true exposure and shows good performance in our situation. However, it places extra demands on both data collection and modeling efforts and, despite its theoretical advantage, failed to outperform the more straightforward BLUP method in our study. Whether the priors based on dispersion or distance-decay models prove to be useful remains to be determined, but our findings are not encouraging. It is likely that either collecting some measurements from the desired locations and time periods (M1) or predictions of a reasonable empirical mixed effects model perhaps (M3) is sufficient in most applications. Furthermore, the simplicity of M3 relative to M5, without obvious gains in accuracy, would probably make M3 the pragmatic choice in many settings. The method to be used in any specific investigation will depend on how much uncertainty can be tolerated in exposure assessment and how closely available data matches circumstances for which estimates/predictions are required.
σ j 2 ω | rest ~D(δ + n 1 , …, δ + n k ) We make use of 'moves' to update parameters: 1. updating X using (z, θ z , U) for corresponding to the individuals 2. updating the weight (ω, z, θ) conditional on k 3. updating the parameter k and consequently the relevant mixture parameters The moves for updating the mixture parameters and changing k, the number of components by using reversible jump split/merge proposals, have been described in detail in Richardson and Green [26].