Relative performance of different exposure modeling approaches for sulfur dioxide concentrations in the air in rural western Canada
- Igor Burstyn^{1},
- Nicola M Cherry^{1},
- Yutaka Yasui^{2} and
- Hyang-Mi Kim^{1, 3}
https://doi.org/10.1186/1471-2288-8-43
© Burstyn et al; licensee BioMed Central Ltd. 2008
Received: 10 March 2008
Accepted: 04 July 2008
Published: 04 July 2008
Abstract
Background
The main objective of this paper is to compare different methods for predicting the levels of SO_{2} air pollution in an oil- and gas-producing area of rural western Canada. Month-long average air quality measurements were collected over a two-year period (2001–2002) at multiple locations, with some side-by-side measurements and repeated time series at selected locations.
Methods
We explored how accurately location-specific mean concentrations of SO_{2} can be predicted for 2002 at 666 locations with multiple measurements. Means of repeated measurements at these 666 locations in 2002 were used as the alloyed gold standard (AGS). First, we considered two approaches: one that uses a single measurement from each location of interest, and another that uses context data on the proximity of monitoring sites to putative sources of emission in 2002. Second, we imagined that all of the previous year's (2001's) data were also available to exposure assessors: 9,464 measurements and their context (month, proximity to sources). Exposure prediction approaches we explored with the 2001 data included regression modeling using either mixed or fixed effects models. Third, we used Bayesian methods to combine single measurements from locations in 2002 (not used to calculate the AGS) with different priors.
Results
The regression method that included both fixed and random effects for prediction (Best Linear Unbiased Predictor) had the best agreement with the AGS (Pearson correlation 0.77) and the smallest mean squared error (MSE: 0.03). The second-best method in terms of correlation with the AGS (0.74) and MSE (0.09) was the Bayesian method that uses a normal mixture prior derived from predictions of the 2001 mixed effects model applied in the 2002 context.
Conclusion
It is likely that either collecting some measurements from the desired locations and time periods or using predictions of a reasonable empirical mixed effects model is sufficient in most epidemiological applications. The method to be used in any specific investigation will depend on how much uncertainty can be tolerated in exposure assessment and how closely the available data match the circumstances for which estimates or predictions are required.
Background
It is well established that errors in exposure estimation can bias the results of epidemiological investigations. This most commonly takes the form of attenuation of the exposure-response association, creating a danger of a false negative conclusion [1, 2]. In addition, non-differential exposure misclassification can lead to reduced widths of confidence intervals of risk estimates, potentially leading to false positive results [1]. In some circumstances, differential misclassification of exposure can also produce positive bias in exposure-response relations, leading to false positive findings [3]. The implications of both false negative and false positive results of epidemiological studies can be profound. In the first case, important causes of disease could be missed and, as a consequence, preventable disease may remain unchecked. In the second case, harm could be caused by implementing inappropriate prevention measures and policies, and by creating unnecessary anxiety in the community.
In the statistical literature, exposure misclassification and mismeasurement are known as the measurement error problem, and a plethora of approaches exist to correct, under certain assumptions, for the biases that arise from it [4, 5]. One obvious approach to the problem is to obtain more precise exposure estimates instead of correcting for a known or suspected extent of exposure mismeasurement. In this regard, advances in monitoring technology have been helpful: passive monitoring, for example, reduces the cost of measuring exposures, allowing larger volumes of relevant data to be collected and thereby yielding more accurate exposure estimates [6–9]. In the current project, passive monitoring technology was used to collect large quantities of air quality measurements over a vast geographical area.
In parallel, developments in exposure modeling and prediction methodologies are also valuable, such as group-based [10, 11] and (statistical) model-based exposure assessment [12], even though these are only recently starting to 'connect' with the mainstream literature on measurement error. Although the ecological fallacy may arise in epidemiological studies that use group-based exposure assessment, in which all members of a group are assigned the same exposure status reflecting the average exposure in the area or group, this does not diminish the utility of the approach. The ecological fallacy can be avoided by collecting information on confounders at the individual level. This approach to dealing with exposure misclassification is still under active development, and there are ongoing arguments as to whether it is possible to infer individual exposure from either micro-environment (e.g. a person's living room) or macro-environment (e.g. a central air monitoring station for a town) measurements [13].
One exposure modeling approach that, at least conceptually, holds great promise incorporates knowledge from empirical (statistical) and theoretical (physical) exposure assessment approaches in a Bayesian framework [14]. It has been suggested that, in occupational exposure assessment, a more accurate estimate of exposure can be obtained by combining pre-existing information about exposure status (e.g. schematics of workplaces, knowledge of chemicals used and transformed in a workplace, historical measurements from related operations, opinions of occupational hygienists) with exposure measurements [14]. This idea was critiqued [15] on the grounds that informative priors cannot be obtained in most occupational studies owing to the lack of validated physical exposure models. However, the suggested approach may hold more promise in applications where informative priors can be obtained, such as modeling air quality in relation to industrial emissions into the general environment, or where routinely collected air quality data provide some notion of the shapes of the probability distributions of exposure at a given location.
Area measurement of air pollutants is often used as a proxy for exposure in epidemiological studies, and for the purpose of this paper the two terms are used interchangeably. The main objective of this work was to determine how best to use currently available information on air concentrations of SO_{2} in rural western Canada to predict location-specific average exposure in a manner that is both cost-effective and accurate. Specifically, we explore the problem of prediction for a different time period at fixed monitoring sites for which some relevant data on sources and past air quality may be available.
Methods
Data
Alloyed gold standard (AGS)
In order to evaluate the performance of different exposure modeling approaches, we need to know the true value of the location-specific mean exposure at each location in 2002. However, we only have observed time series with repeated observations at each location and therefore can only estimate these values. Consequently, we were only able to assess the performance of different exposure modeling approaches in relation to our best estimate of the true value. The estimator we used, which does not adjust for measurement error but is free from any model assumptions, is the location-specific arithmetic average, a direct measure of the latent quantity of interest. We computed this at locations with repeated measurements in 2002 and designated it M0*. The measurements imagined to have been collected in 2002 (d2) were a location-stratified random subset of all 2002 measurements; they were not used in calculating the alloyed gold standard.
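A minimal sketch of how the single 2002 measurement (d2) and M0* can be separated at each location, assuming measurements are held in a dictionary keyed by location. The function name and the `min_remaining` rule (how many non-held-out measurements a location needs to receive an AGS value) are hypothetical, and values are averaged on whatever scale they are supplied.

```python
import numpy as np

def split_ags_and_single(measurements, min_remaining=2, seed=0):
    """Hold out one randomly chosen measurement per location (the d2 draw) and
    average the rest to form the alloyed gold standard M0*.

    measurements: dict of location id -> sequence of month-long averages.
    min_remaining: hypothetical minimum number of non-held-out measurements a
    location needs in order to receive an AGS value.
    """
    rng = np.random.default_rng(seed)
    ags, single = {}, {}
    for loc, values in measurements.items():
        values = np.asarray(values, dtype=float)
        if values.size < min_remaining + 1:
            continue  # too few repeats to hold one out and still average the rest
        held = rng.integers(values.size)  # index of the d2 measurement
        single[loc] = float(values[held])
        ags[loc] = float(np.delete(values, held).mean())  # plain arithmetic mean
    return ags, single
```

Because the held-out value never enters the average, any method evaluated with it (M1, M4 to M6) is compared with an AGS built from independent measurements.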
Overview of prediction methods considered
One measurement from each location, not used to calculate the AGS, was assumed to have been observed in 2002. We considered two approaches: one that uses a single month-long average measurement from each location of interest in 2002 (M1), and another that uses context data on the proximity of monitoring sites to putative sources of emission in 2002 (M2). In addition, we imagined that all of the previous year's (2001's) data were also available to exposure assessors. Exposure prediction approaches we explored with the 2001 measurement data included regression modeling using either mixed (M3) or fixed effects models (M3f). Lastly, we used Bayesian methods to combine single measurements from locations in 2002 (M1) with different priors (M4–M6). These approaches are described below within two separate scenarios: without any measurements from 2002 (M2, M3, M3f) and with one measurement per location of interest in 2002 (M1, M4–M6).
The first scenario: no measurements in 2002
If we choose not to collect any measurements in 2002 and rely on the 2001 data to make 2002 exposure predictions, we may consider two options. First, we could construct a model of the determinants of exposure using only 2001 data (d1 and c1). We assume that it has the same functional form as a model built previously [16]. We can then use the fixed effect estimates of that model to estimate exposures in 2002 using the 2002 context c2 (method M3f), or use both the fixed and random terms of the model, together with the 2002 distances to sources (context c2), to obtain Best Linear Unbiased Predictors (BLUPs) of 2002 exposures (M3).
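The difference between M3f and M3 can be illustrated with a deliberately simplified one-way random-intercept model, in which a location's BLUP shrinks its mean 2001 residual toward zero. This is a sketch under stated assumptions, not the PROC MIXED fit used in the paper: it pools the month-to-month (0.09) and between-repeat (0.21) variances reported for equation (1) into a single within-location term, whereas the actual model keeps them as separate components; the function names are hypothetical.

```python
def blup_location_effect(residuals, var_between, var_within):
    """Empirical BLUP of one location's random intercept in a one-way
    random-effects model: the mean residual (observed ln SO2 minus the
    fixed-effect prediction) shrunk toward zero."""
    n = len(residuals)
    if n == 0:
        return 0.0  # no data at this location: BLUP reduces to the fixed part
    shrink = var_between / (var_between + var_within / n)
    return shrink * sum(residuals) / n

def predict_ln_so2(fixed_part, residuals_2001, var_between=0.23, var_within=0.09 + 0.21):
    """Return (M3f-style, M3-style) predictions of ln(SO2) for one location.
    Pooling month-to-month and between-repeat variance into one
    within-location term is a simplifying assumption of this sketch."""
    m3f = fixed_part
    m3 = fixed_part + blup_location_effect(residuals_2001, var_between, var_within)
    return m3f, m3
```

With, say, six measurements at a location, the shrinkage factor is 0.23 / (0.23 + 0.30/6) ≈ 0.82, so most of a location's persistent departure from the fixed-effect prediction carries over into the M3-style prediction but is discarded by the M3f-style one.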
The following model of the determinants of exposure could be constructed using the 2001 data (d1 and c1 only):
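$$\ln(\mathrm{SO_2},\ \text{ppb}) = -0.97 + 0.26\ln\Big[\sum_{\text{all }\Delta_2}\big(\Delta_2^{\text{oil wells}}\big)^{-2/3}\Big] + 0.24\ln\Big[\sum_{\text{all }\Delta_{2\text{-}50}}\big(\Delta_{2\text{-}50}^{\text{oil wells}}\big)^{-2/3}\Big] + 12.33\ln\Big[\sum_{\text{all }\Delta_2}\big(\Delta_2^{\text{gas plants}}\big)^{-2/3}\Big] + 4.15\ln\Big[\sum_{\text{all }\Delta_{2\text{-}50}}\big(\Delta_{2\text{-}50}^{\text{gas plants}}\big)^{-2/3}\Big] + \text{random effects} \qquad (1)$$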
where $\Delta_2$ is the distance in km from the monitoring location to a specified type of oil and gas infrastructure (oil wells or gas plants in this case) within a 2 km radius of the monitoring station (infrastructure outside this radius was ignored in calculating $\Delta_2$); $\Delta_{2\text{-}50}$ is the distance in km from the monitoring location to a specified type of infrastructure within the 2–50 km annulus (ring) around it; and the random effects have an estimated between-location variance $s^2_{L1} = 0.23$, month-to-month variance $s^2_{T1} = 0.09$, and between-repeat (within month and location) variance $s^2_{R1} = 0.21$. In the magnitude of its fixed and random effects, this model is very similar to the model previously derived on the basis of all the data available to us [16]. The rationale for formulating distance to sources as in equation (1) is described in greater detail below.
Alternatively, we could be skeptical about the value of the 2001 data and the models they yield, and rely exclusively on the description of 2002 measurement sites in terms of their proximity to oil and gas infrastructure (i.e. c2) to rank locations by expected SO_{2} concentration (M2). Several such rankings are possible, because we do not know a priori which context (i.e. proximity to which type(s) of facilities) is best to use. Concentrations near point sources of emission in flat terrain without strong prevailing winds can be described as directly proportional to the emission rate and inversely proportional to the separation distance raised to the power 2/3, a distance-decay model [17]. This informed the parameterization of the predictive models we developed in M3, and appears to be a reasonable starting point for ranking monitoring sites with respect to anticipated air quality. However, it is uncertain which distance, to which oil and gas facilities, is most sensible for predicting SO_{2} concentrations. Strong sources of SO_{2} emissions, such as gas plants, are obvious candidates, but they are less numerous and farther from monitoring locations than wells and batteries, so all of these facilities could potentially affect SO_{2} concentrations. We therefore described the context of the 2002 measurement sites (c2) in terms of proximity to all wells, all batteries and all gas plants. The proximity measure was described in detail previously [16]: it is the sum of (distance in km)^{-2/3} over all facilities of a given type within a 2 km or 50 km radius of each monitoring site. The coordinates of active oil and gas facilities in 2002 were supplied to us by the regulatory agencies of the Canadian provinces of Alberta, Saskatchewan and British Columbia, enabling us to estimate the distances.
Proximity to the following facilities was estimated: wells within 2 km, wells within 50 km, batteries within 2 km, batteries within 50 km, gas plants within 2 km, and gas plants within 50 km.
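A minimal sketch of the proximity measure for one facility type (wells, batteries or gas plants). It assumes facility coordinates have already been projected onto a planar grid in km; the function name and the inclusive treatment of the 2 km and 50 km boundaries are assumptions. It returns the within-2 km sum, the 2–50 km ring sum used in equation (1), and the within-50 km sum used for c2.

```python
import math

def proximity_sums(site, facilities, inner_km=2.0, outer_km=50.0):
    """Proximity of one monitoring site to facilities of a single type.

    Returns (within 2 km, 2-50 km ring, within 50 km), each a sum of
    distance**(-2/3) with distances in km. site and facilities are (x, y)
    coordinates in km on a flat projection: planar distance is an assumption.
    """
    near = ring = 0.0
    for fx, fy in facilities:
        d = math.hypot(fx - site[0], fy - site[1])
        if d == 0.0:
            raise ValueError("facility at the site itself: d**(-2/3) is undefined")
        if d <= inner_km:
            near += d ** (-2.0 / 3.0)
        elif d <= outer_km:
            ring += d ** (-2.0 / 3.0)
    return near, ring, near + ring
```

For example, a well 1 km away contributes 1.0, a well 8 km away contributes 8^{-2/3} = 0.25 to the ring and the 50 km sums, and a well 100 km away contributes nothing.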
All regression models in the manuscript were fitted, and their predictions made, with SAS (version 9.1, SAS Institute, Cary, NC) PROC MIXED using restricted maximum likelihood (REML) estimation.
The second scenario: some measurements in 2002
If one measurement was collected in 2002 from each location of interest in a randomly chosen month (d2), we can consider the following exposure estimation options. The simplest approach is to use the single measurement from each location in 2002 to estimate mean location-specific exposure in 2002 (M1).
We could also dismiss the 2001 data except for estimating measurement error variance from repeated measurements, and then 'correct' the 2002 measurements for this measurement error under the assumption that true exposure levels are lognormally distributed (M4). We can also use estimates from M3 as the basis of an empirical normal mixture prior, with an unknown number of components, for the observed data d_{2} to obtain method M5. Alternatively, we could mistrust the 2001 measurements and rely only on the context of the 2002 measurements (c2) for prior information, leading to method M6, which also uses a normal mixture prior with an unknown number of components.
Bayesian approaches have been adopted for adjusting for bias arising from measurement error [5]. The parameters of a Bayesian model are not assumed to be fixed, but to vary at random according to probability distributions. For each parameter (or set of parameters), a probability distribution reflecting prior knowledge or belief is specified and combined with the likelihood function of the data to obtain a posterior distribution of the parameter(s) (e.g., location-specific mean SO_{2} concentrations in our case). This posterior distribution combines all knowledge or belief about the parameters from the prior and from the likelihood involving covariates (i.e. data and assumed models). Unless it is analytically tractable, it is usually obtained by Markov chain Monte Carlo (MCMC) sampling. Variables observed with error are also treated as random, so that they are incorporated into the process of sampling from the posterior.
Bayesian analysis has been developed to adjust for measurement error by specifying two sub-models: i) a measurement error model relating the exposure observed with error to the true exposure; and ii) the prior distribution of the true exposure. The true exposure is assumed to have either a lognormal prior distribution (M4) or a mixture of normal distributions with an unknown number of components, a flexible approach intended to overcome potential misspecification of the prior distribution (M5 and M6). The reversible jump algorithm [18] is used for the normal mixture prior with an unknown number of components, together with standard Gibbs or Metropolis updates. The details of the Bayesian models and their implementation are given in the Appendix.
In implementing M4 (in R version 2.1.1 (2005-06-20), The R Foundation for Statistical Computing, 2005, ISBN 3-900051-07-0), we ran an MCMC chain of 45,000 iterations and discarded the first 15,000 as 'burn-in'. In implementing M5 and M6 (in FORTRAN), we used 100,000 'burn-in' iterations and used the subsequent 100,000 iterations to obtain posterior estimates for each location.
Measures of relative performance
Comparing estimated exposures with M0* (the arithmetic mean used as the AGS) enables us to evaluate the relative performance of different exposure assessment methods. In environmental epidemiology, the association of interest may be that between the concentration of a contaminant (ppb SO_{2} in our case) and the risk of a disease. The most commonly used exposure-disease model is the logistic regression model. The relationship between the true ($\phi_T$) and observed ($\phi_O$) risk gradients in logistic regression is determined by the Pearson correlation between true and observed exposure ($\rho_{TO}$), as in $\phi_T = \phi_O/\rho_{TO}^2$ [1], and a correlation between two random variables can be estimated without fully specifying their distributions. We therefore use the Pearson correlation between the SO_{2} levels predicted by the different exposure estimation procedures and the alloyed gold standard (M0*) as a measure of the relative performance of the different procedures. We also computed the mean squared error (MSE): the mean of (estimate – AGS)^{2}.
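The two performance measures, and the relation between the risk gradients, can be sketched as follows. The helper names are hypothetical, and the gradient relation is only as valid as the classical-error setting in which $\phi_T = \phi_O/\rho_{TO}^2$ holds.

```python
import numpy as np

def relative_performance(predicted, ags):
    """Pearson correlation (rho_TO) and mean squared error against the AGS."""
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(ags, dtype=float)
    rho = float(np.corrcoef(p, a)[0, 1])
    mse = float(np.mean((p - a) ** 2))
    return rho, mse

def true_gradient(observed_gradient, rho):
    """Recover phi_T from phi_O via phi_T = phi_O / rho**2 [1]."""
    return observed_gradient / rho ** 2
```

For example, $\rho_{TO} = 0.77$ gives $\rho_{TO}^2 \approx 0.59$, so an observed gradient would be roughly 59% of the true one, whereas $\rho_{TO} = 0.21$ gives $\rho_{TO}^2 \approx 0.044$. Because the reported correlations are with the AGS rather than with true exposure, such figures are indicative only.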
Results
The alloyed gold standard could only be calculated for the 666 sites (out of a total of 903 sites) that had repeated air quality measurements in 2002. The average number of repeated measurements per location was six, ranging from two to 24.
Comparison of exposure assessment methods for the 2002 annual mean with the alloyed gold standard (AGS), constructed as the mean of observed measurements from a given location in 2002 when there were at least two measurements (2 to 24; average = 6; N = 666).

| Type of method/model | Model description (nomenclature) | Measurements used | ρ_{TO} (Pearson correlation with AGS) | MSE |
|---|---|---|---|---|
| No model | One measurement per location (M1) | 2002 | 0.67 | 0.15 |
| Distance-decay | Contextual data only (M2) | None | 0.21 | 0.28 |
| Regression | Effects used in prediction: fixed & random, BLUP (M3) | 2001 | 0.77 | 0.03 |
| Regression | Effects used in prediction: fixed effects only (M3f) | 2001 | 0.33 | 0.12 |
| Bayesian | Prior: lognormal (M4) | 2001 & 2002 | 0.68 | 0.15 |
| Bayesian | Prior: normal mixture from regression model M3 (M5) | 2001 & 2002 | 0.74 | 0.09 |
| Bayesian | Prior: normal mixture from context, M2 (M6) | 2002 | 0.28 | 0.30 |
Discussion
Strictly speaking, our observations only apply to the particular data set from which they were derived and a specific sample of 2002 observations. However, the results suggest some general conclusions about the estimation of environmental concentrations of pollutants derived from industrial sources. When no measurements of air quality are available, we can expect predictions by a simple distance-decay model to have poor agreement with true air quality (M2). When only a relatively small measurement effort is possible in the time-period of interest and the magnitude of measurement error is known from some validation studies, the empirical Bayesian methodology that relies only on 2002 data (d2) and some estimate of measurement error (M4) produced results that were not markedly different from just using one measurement per location to estimate true location-specific concentrations (M1). However, if a very poor prior (M2) is combined with a limited set of exposure measurements (M1), even if these measurements are close to 'true' values, the Bayesian methodology leads to inferior estimates of true values (M6). The poor prior appears to degrade advantages present in the data.
When the only exposure measurements available were collected at times and places adjacent to those of interest, we can expect to obtain reasonable estimates if we rely on the empirical BLUP of the mixed effects model (M3), but not on predictions based only on estimates of fixed effects (M3f). The Bayesian normal mixture method with a flexible prior also seems to perform reasonably (M5), especially if one considers the pitfalls inherent in the alternative approaches. Namely, M3 will perform poorly if there is a large change in air quality between 2001 and 2002, whereas M5 would utilize the 2002 data preferentially and be less affected by such a change. However, as suggested by the results with a 'poor' prior (M6), when there is a large difference between the exposure in the data used to build the model and the true exposure being predicted, the Bayesian normal mixture method is expected to falter relative to the simple collection of relevant data. This echoes a previous suggestion that, in many situations, the effort involved in modeling exposures may exceed that required to collect measurements [20].
Methods M1 and M4 had virtually identical agreement with the alloyed gold standard, which was inferior to that of methods M3 and M5. We can ascribe the poor performance of M1 to its failure to account for measurement error, since it uses only one observation per location in 2002, and to its neglect of the context of 2002 exposures. For the Bayesian method with a lognormal prior that uses 2001 data only to define the measurement error variance (M4), the inferior predictions can be ascribed to improper prior specification (an extreme case of the poor prior also illustrated by M6), as well as to ignoring the context of 2002 exposures. This suggests that methods that fail to correct for measurement error and/or are based on poor priors can be expected to yield predictions of inferior accuracy.
The main limitation of our study is the lack of a true gold standard against which to evaluate the performance of the different exposure assessment procedures. We believe that our choice of an alloyed gold standard that is free from model assumptions is indicative of the true performance of the compared methodologies, because the comparison is not biased in favor of any method that might be employed to produce an alloyed gold standard adjusted for measurement error. Thus, although our chosen alloyed gold standard is contaminated by measurement error, it was obtained without resorting to the assumptions used in the competing exposure assessment methods.
We had the luxury of a large 2001 dataset that enabled us to create an empirical prior that probably closely reflects the distribution of true values and the extent of measurement error. It may not be possible to rely on such pre-existing data in many studies. Given the sensitivity of the Bayesian methods to the 'quality' of the prior, careful judgment is required in deciding whether it is better to invest resources in extensive data collection or in complex modeling. It must be noted that our 2001 data did not cover every month (data collection began in April), whereas the 2002 measurements were spread across all months of 2002. This presented a realistic challenge to our exposure assessment models: estimating exposures for temporally misaligned data in the presence of temporal trends in exposure within a year (see Figure 4 in [16]).
Our data were not very variable and contained only modest measurement error. Thus, our conclusions may not hold for the more variable and more error-prone situations that may arise in environmental exposure assessment, as reported for volatile organic compounds [21, 22].
Another limitation of the work presented here is that we were not able to explore all the modeling techniques potentially available for predicting air pollution levels. For this reason, we focused on methods that appear to be sensible "first choices" in the given setting, plus a more exotic Bayesian model that we wished to evaluate. Specifically, an autoregressive integrated moving average (ARIMA) approach may be suitable for the part of our data where spatially aligned time series can be identified, as may the more flexible methodology of Calder et al. [23, 24]. In addition, it may be possible to obtain better predictions from the empirical regression models by relaxing assumptions based on the model of Strosher [17], either by modeling the power transformation, by employing generalized additive models, or by using neural networks that relax parametric assumptions about the shape of the distance-concentration association (see Schlink et al. [25] for an overview of various other modeling options). We are exploring the utility of some of these modeling approaches for the current dataset in parallel ongoing research.
Conclusion
Initial large measurement efforts are unavoidable when characterizing air quality and evaluating various exposure assessment options. However, once a considerable amount of information has been obtained about a defined area and a particular contaminant, subsequent air quality surveys can be less costly and less extensive if they use either regression BLUP (M3), or regression BLUP to generate an empirical prior for a Bayesian exposure assessment that integrates prior knowledge with a limited series of new measurements (M5). On theoretical grounds, we prefer Bayesian approach M5 because it requires investigators to make weaker assumptions about the distribution of true exposure and showed good performance in our situation. However, it places extra demands on both data collection and modeling and, despite its theoretical advantage, failed to outperform the more straightforward BLUP method in our study. Whether priors based on dispersion or distance-decay models prove useful remains to be determined, but our findings are not encouraging. It is likely that either collecting some measurements from the desired locations and time periods (M1) or using predictions of a reasonable empirical mixed effects model (M3) is sufficient in most applications. Furthermore, the simplicity of M3 relative to M5, without obvious gains in accuracy from M5, would probably make M3 the pragmatic choice in many settings. The method to be used in any specific investigation will depend on how much uncertainty can be tolerated in exposure assessment and how closely the available data match the circumstances for which estimates or predictions are required.
Appendix: Details of Bayesian methods M4, M5 and M6
True exposure X is observed with error as U. The goal of the methods presented below is to estimate X on the basis of U using information and assumptions about the nature of the measurement error.
In applying method M4, we specify two sub-models:
$p(U_i \mid X_i, \lambda)$: the measurement error model;
$p(X_i \mid \pi)$: the prior (true exposure) model for $X_i$;
and the joint distribution of $X_i$ and $U_i$ is $p(\lambda)\,p(\pi)\prod_i p(X_i \mid \pi)\prod_i p(U_i \mid X_i, \lambda)$, where $p(\lambda)$ and $p(\pi)$ are the prior distributions for the parameters of the two sub-models, and $p(\cdot \mid \cdot)$ denotes generic conditional distributions consistent with the joint specification.
The measurement error model for $U_i$ conditional on $X_i$ is $\log(U_i) \sim N(\log(X_i), \tau^2)$, where $\lambda = \tau^2$ is known, and the prior for true exposure is lognormal, $X_i \sim \log N(\mu, \sigma^2)$, where $\pi = (\mu, \sigma^2)$. The hyperpriors are a normal distribution with mean 0 and variance $s^2$ (the sample variance) for $\mu$, and a highly dispersed inverse gamma distribution with parameters 1 and 0.005 for $\sigma^2$. Full conditionals for the parameters are derived from this specification.
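Writing $Y_i = \log X_i$ and $V_i = \log U_i$ for the $n$ locations, and assuming that the inverse gamma distribution is parameterized by shape 1 and scale 0.005, the specification above implies the following standard conjugate full conditionals (a sketch derived from the model as stated):

$$Y_i \mid \text{rest} \sim N\!\left(\frac{\sigma^2 V_i + \tau^2 \mu}{\sigma^2 + \tau^2},\ \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}\right)$$

$$\mu \mid \text{rest} \sim N\!\left(\frac{\sum_i Y_i/\sigma^2}{n/\sigma^2 + 1/s^2},\ \frac{1}{n/\sigma^2 + 1/s^2}\right)$$

$$\sigma^2 \mid \text{rest} \sim \mathrm{IG}\!\left(1 + \frac{n}{2},\ 0.005 + \frac{1}{2}\sum_i (Y_i - \mu)^2\right)$$

The first line shows the measurement error correction directly: each location's log measurement is shrunk toward $\mu$ with weight $\tau^2/(\sigma^2 + \tau^2)$.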
We use a Metropolis-Hastings algorithm with a random-walk proposal to update first $X_i$ and then $\mu$ and $\sigma^2$ in each step. The initial value of $\sigma^2$ is the variance of the log-transformed 2002 measurements (d2), and $\tau^2$ is the between-repeat variance of the 2001 data, $s^2_{R1}$ (see above).
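A minimal sketch of an M4-style sampler, not the authors' R implementation. Two choices are our assumptions: the random walk acts on $Y_i = \log X_i$ rather than on $X_i$ itself, and $\mu$ and $\sigma^2$ are drawn from their conjugate full conditionals (Gibbs steps, i.e. Metropolis-Hastings proposals accepted with probability one). The inverse gamma is taken as shape 1, scale 0.005, and the function name and proposal step size are hypothetical.

```python
import numpy as np

def m4_sampler(u, tau2, n_iter=45_000, burn_in=15_000, step=0.3, seed=0):
    """Posterior means of Y_i = log X_i under the M4 model (sketch).

    log U_i ~ N(Y_i, tau2), tau2 known; Y_i ~ N(mu, sigma2);
    mu ~ N(0, s2), s2 = sample variance of log U; sigma2 ~ IG(shape=1, scale=0.005).
    """
    if n_iter <= burn_in:
        raise ValueError("n_iter must exceed burn_in")
    rng = np.random.default_rng(seed)
    v = np.log(np.asarray(u, dtype=float))  # observed log concentrations
    n, s2 = v.size, v.var(ddof=1)
    y, mu, sigma2 = v.copy(), v.mean(), s2  # sigma2 starts at the variance of log d2
    total, kept = np.zeros(n), 0
    for it in range(n_iter):
        # Random-walk Metropolis-Hastings for every Y_i at once: given (mu, sigma2)
        # the Y_i are conditionally independent, so each has its own accept/reject.
        prop = y + step * rng.standard_normal(n)
        log_ratio = (-0.5 * (v - prop) ** 2 / tau2 - 0.5 * (prop - mu) ** 2 / sigma2
                     + 0.5 * (v - y) ** 2 / tau2 + 0.5 * (y - mu) ** 2 / sigma2)
        y = np.where(np.log(rng.random(n)) < log_ratio, prop, y)
        # Gibbs step for mu: normal prior N(0, s2) times normal likelihood.
        prec = n / sigma2 + 1.0 / s2
        mu = rng.normal(y.sum() / sigma2 / prec, np.sqrt(1.0 / prec))
        # Gibbs step for sigma2: IG(a, b) is the reciprocal of Gamma(a, scale=1/b).
        b = 0.005 + 0.5 * np.sum((y - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(1.0 + n / 2.0, 1.0 / b)
        if it >= burn_in:
            total += y
            kept += 1
    return total / kept
```

Called with the 2002 single measurements and `tau2` set to $s^2_{R1} = 0.21$, the defaults mirror the 45,000 iterations and 15,000 burn-in reported for M4; the returned values are on the log scale.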
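For methods M5 and M6, the true exposure is instead given a normal mixture prior, $X_i \sim \sum_{j=1}^{k} \omega_j f(\cdot \mid \theta_j)$, where $f(\cdot \mid \theta)$ is a normal density. The number of components $k$, the component parameters $\theta_j = (\mu_j, \sigma^2_j)$ and the component weights $\omega_j$, which sum to 1, are all unknown.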
The hierarchical formulation of this mixture model introduces a latent allocation variable $z_i$ that indicates the mixture component to which $X_i$ belongs. The model can then be written as:
$p(z_i = j) = \omega_j$ for $j = 1, 2, \dots, k$, independently for $i = 1, 2, \dots, n$, and, given $z$, $X_i \mid z \sim f(\cdot \mid \theta_{z_i})$ independently for $i = 1, 2, \dots, n$.
We use the same notation for the conditional distributions, and $\omega ={\left({\omega}_{j}\right)}_{j=1}^{k}$, $z={\left({z}_{i}\right)}_{i=1}^{n}$, $\theta ={\left({\theta}_{j}\right)}_{j=1}^{k}$, $X={\left({X}_{i}\right)}_{i=1}^{n}$ and $U={\left({U}_{i}\right)}_{i=1}^{n}$. The joint distribution is given by
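$p(k, \omega, z, \theta, \tau^2, X, U) = p(k)\,p(\omega \mid k)\,p(\theta \mid z, \omega, k)\,p(z \mid \omega, k)\,p(X \mid \theta, z, \omega, k)\,p(U \mid X, \tau^2)$, which, under the independence assumptions $p(\theta \mid z, \omega, k) = p(\theta \mid k)$ and $p(X \mid \theta, z, \omega, k) = p(X \mid \theta, z)$, simplifies to $p(k, \omega, z, \theta, \tau^2, X, U) = p(k)\,p(\omega \mid k)\,p(z \mid \omega, k)\,p(\theta \mid k)\,p(X \mid \theta, z)\,p(U \mid X, \tau^2)$.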
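With a symmetric Dirichlet prior $D(\delta, \dots, \delta)$ on the weights, the full conditional for the weights is $\omega \mid \text{rest} \sim D(\delta + n_1, \dots, \delta + n_k)$, where $n_j$ is the number of observations allocated to component $j$. Each sweep of the MCMC sampler consists of the following moves: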
1. updating $X_i$ for each individual $i$ using $(z_i, \theta_{z_i}, U_i)$;
2. updating the weights, allocations and component parameters $(\omega, z, \theta)$ conditional on $k$;
3. updating the number of components $k$ and, consequently, the relevant mixture parameters.
The moves for updating the mixture parameters and for changing $k$, the number of components, by reversible jump split/merge proposals have been described in detail by Richardson and Green [26].
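Move 2 above (updating $\omega$, $z$ and $\theta$ with $k$ held fixed) can be sketched as one Gibbs sweep over a normal mixture. This is an illustration, not the authors' FORTRAN code: the reversible jump move that changes $k$ and the measurement error update of $X$ are omitted, `x` stands for the current draws of true exposure, and the priors ($\omega \sim D(\delta, \dots, \delta)$, $\mu_j \sim N(\xi, 1/\kappa)$, $1/\sigma^2_j \sim \mathrm{Gamma}(\alpha, \text{rate } \beta)$) use hypothetical fixed hyperparameter values.

```python
import numpy as np

def gibbs_sweep_fixed_k(x, omega, mu, sigma2, rng, delta=1.0, xi=0.0,
                        kappa=0.01, alpha=2.0, beta=1.0):
    """One Gibbs sweep over (z, omega, mu, sigma2) for a k-component normal
    mixture with k fixed. Hyperparameter defaults are hypothetical."""
    k = omega.size
    # Allocation: p(z_i = j | rest) proportional to omega_j * N(x_i; mu_j, sigma2_j).
    logp = (np.log(omega) - 0.5 * np.log(2.0 * np.pi * sigma2)
            - 0.5 * (x[:, None] - mu) ** 2 / sigma2)
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(k, p=row) for row in p])
    counts = np.bincount(z, minlength=k)
    # Weights: omega | rest ~ Dirichlet(delta + n_1, ..., delta + n_k).
    omega = rng.dirichlet(delta + counts)
    # Component parameters from their conjugate normal / inverse-gamma conditionals.
    mu, sigma2 = mu.copy(), sigma2.copy()
    for j in range(k):
        xj = x[z == j]
        prec = kappa + counts[j] / sigma2[j]
        mean = (kappa * xi + xj.sum() / sigma2[j]) / prec
        mu[j] = rng.normal(mean, np.sqrt(1.0 / prec))
        rate = beta + 0.5 * np.sum((xj - mu[j]) ** 2)
        sigma2[j] = 1.0 / rng.gamma(alpha + counts[j] / 2.0, 1.0 / rate)
    return z, omega, mu, sigma2
```

Repeating this sweep, interleaved with the $X$ update (move 1) and the split/merge moves (move 3), gives the overall structure of a Richardson and Green style sampler.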
Declarations
Acknowledgements
This research was supported by an Establishment Grant to Dr. Igor Burstyn from the Alberta Heritage Foundation for Medical Research. Drs. Igor Burstyn and Yutaka Yasui are supported by salary awards from the Canadian Institutes of Health Research and the Canada Research Chair program, respectively, and both are also supported by the Alberta Heritage Foundation for Medical Research. Data used in the study arose from a research contract from the Western Inter-Provincial Scientific Studies Association [27], which oversaw the sampling design, the collection of measurements and their laboratory analysis. Without their involvement, this study would not have been possible.
References
- Armstrong BG: Effect of measurement error on epidemiological studies of environmental and occupational exposures. Occup Environ Med. 1998, 55 (10): 651-656.
- Jurek AM, Greenland S, Maldonado G, Church TR: Proper interpretation of non-differential misclassification effects: expectations vs observations. Int J Epidemiol. 2005, 34: 680-687. 10.1093/ije/dyi060.
- Brenner H: Inferences on the potential effects of presumed nondifferential exposure misclassification. Ann Epidemiol. 1993, 3: 289-294.
- Carroll RJ, Ruppert D, Stefanski LA: Measurement error in nonlinear models. 1995, London, England, Chapman and Hall Ltd.
- Gustafson P: Measurement Error and Misclassification in Statistics and Epidemiology. 2003, Chapman & Hall/CRC Press.
- Tang H, Brassard B, Brassard R, Peake E: A new passive sampling system for monitoring SO2 in the atmosphere. Field Analytical Chemistry and Technology. 1997, 1: 5-307.
- Tang H, Sandeluk J, Lin L, Lown JW: A new all-season passive sampling system for monitoring H2S in air. ScientificWorldJournal. 2002, 2: 155-168. 10.1100/tsw.2002.87.
- Liljelind IE, Rappaport SM, Levin JO, Stromback AE, Sunesson AL, Jarvholm BG: Comparison of self-assessment and expert assessment of occupational exposure to chemicals. Scand J Work Environ Health. 2001, 27: 311-317.
- Kromhout H, Loomis D, Mihlan GJ, Peipins LA, Kleckner RC, Iriye R, Savitz D: Assessment and grouping of occupational magnetic field exposure in five electric utility companies. Scand J Work Environ Health. 1995, 21 (1): 43-50.
- Tielemans E, Kupper LL, Kromhout H, Heederik D, Houba R: Individual-based and group-based occupational exposure assessment: Some equations to evaluate different strategies. Ann Occup Hyg. 1998, 42 (2): 115-119.
- Kim HM, Yasui Y, Burstyn I: Attenuation in risk estimates in logistic and Cox proportional-hazards models due to group-based exposure assessment strategy. Ann Occup Hyg. 2006, 50: 623-635. 10.1093/annhyg/mel021.
- Wameling A, Schaper M, Kunert J, Blaszkewicz M, van Thriel C, Zupanic M, Seeber A: Individual toluene exposure in rotary printing: Increasing accuracy of estimation by linear models based on protocols of daily activity and other measures. Biometrics. 2000, 56: 1218-1221. 10.1111/j.0006-341X.2000.01218.x.
- Kromhout H, van Tongeren M: How important is personal exposure assessment in the epidemiology of air pollutants?. Occup Environ Med. 2003, 60: 143-144. 10.1136/oem.60.2.143-a.
- Ramachandran G, Vincent JH: A Bayesian approach to retrospective exposure assessment. Appl Occup Environ Hyg. 1999, 14 (8): 547-557. 10.1080/104732299302549.
- Burstyn I, Kromhout H: A critique of Bayesian methods for retrospective exposure assessment. Letter to the editor (and reply). Ann Occup Hyg. 2002, 46 (4): 429-432. 10.1093/annhyg/mef058.
- Burstyn I, Senthilselvan A, Kim HM, Pietroniro E, Waldner CL, Cherry NM: Industrial sources influence air concentrations of hydrogen sulfide and sulfur dioxide in rural areas of western Canada. J Air Waste Manag Assoc. 2007, 57: 1241-1250.
- Strosher MT: Investigations of flare gas emissions in Alberta. Final Report to: Environment Canada Conservation and Protection, the Alberta Energy and Utilities Board and the Canadian Association of Petroleum Producers. 1996, Calgary, AB, Alberta Research Council.
- Green PJ: Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995, 82: 711-732. 10.1093/biomet/82.4.711.
- Scott HM, Soskolne CL, Wayne MS, Ellehoj EA, Coppock RW, Guidotti TL, Lissemore KD: Comparison of two atmospheric-dispersion models to assess farm-site exposure to sour-gas processing-plant emissions. Prev Vet Med. 2003, 57: 15-34. 10.1016/S0167-5877(02)00207-6.
- Burstyn I, Heederik D, Bartlett K, Doekes G, Houba R, Teschke K, Kennedy S: Wheat antigen content of inhalable dust in bakeries: Modeling and inter-study comparison. Appl Occup Environ Hyg. 1999, 14 (11): 791-798. 10.1080/104732299302224.
- Burstyn I, You XI, Cherry NM, Senthilselvan A: Determinants of airborne benzene concentrations in rural areas of western Canada. Atmospheric Environment. 2007, 41: 7778-7787. 10.1016/j.atmosenv.2007.06.011.
- Rappaport SM, Kupper LL: Variability of environmental exposures to volatile organic compounds. J Expo Anal Environ Epidemiol. 2004, 14: 92-107. 10.1038/sj.jea.7500309.
- Calder CA, Holloman C, Higdon D: Exploring space-time structure in ozone concentration using a dynamic process convolution model. Case Studies in Bayesian Statistics, Volume 6. 2002, New York, Springer-Verlag, 165-176.
- Calder CA: A dynamic process convolution approach to modeling ambient particulate matter concentrations. Environmetrics. 2008, 19: 39-48. 10.1002/env.852.
- Schlink U, Dorling S, Pelikan E, Nunnari G, Cawley G, Junninen H, Greig A, Foxall R, Eben K, Chatterton T, Vondracek J, Richter M, Dostal M, Bertucco L, Kolehmainen M, Doyle M: A rigorous inter-comparison of ground-level ozone predictions. Atmospheric Environment. 2003, 37: 3237-3253. 10.1016/S1352-2310(03)00330-3.
- Richardson S, Green PJ: On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society B. 1997, 59: 731-792. 10.1111/1467-9868.00095.
- WISSA: Western Inter-Provincial Scientific Studies Association. 2008, [http://www.wissa.info]
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/8/43/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.