Estimation methods with ordered exposure subject to measurement error and missingness in semi-ecological design

Background
In epidemiological studies, it is often not possible to measure participants' exposures accurately, even when the response variable can be measured without error. When there are several groups of subjects, occupational epidemiologists employ a group-based strategy (GBS) for exposure assessment to reduce bias due to measurement error: each individual in a group/job within the study sample is assigned the sample mean of the exposure measurements from that group when evaluating the effect of exposure on the response. Exposure is therefore estimated at an ecological level, while health outcomes are ascertained for each subject. Such a study design leads to negligible bias in risk estimates when group means are estimated from ‘large’ samples. In many cases, however, only a small number of observations are available to estimate the group means, and this biases the observed exposure-disease association. In addition, the analysis in a semi-ecological design may involve exposure data of which the majority are missing and the rest are observed with measurement error, together with complete response data collected with ascertainment.

Methods
In workplaces, groups/jobs are naturally ordered, and this ordering can be incorporated into the estimation procedure through constrained estimation methods combined with expectation-maximization (EM) algorithms for regression models with measurement error and missing values. Four methods were compared in a simulation study: naive complete-case analysis, GBS, constrained GBS (CGBS), and constrained expectation-maximization (CEM). We illustrated the methods in an analysis of the decline in lung function due to exposure to carbon black.

Results
The naive and GBS approaches were shown to be inadequate when the number of exposure measurements is too small to estimate group means accurately. The CEM method appears to be the best of the four when, within each exposure group, at least a ‘moderate’ number of individuals have their exposures observed with error. However, compared with CEM, CGBS is easier to implement and has more desirable bias-reducing properties in the presence of substantial proportions of missing exposure data.

Conclusion
The CGBS approach could be useful for estimating the exposure-disease association in semi-ecological studies when the true group means are ordered and the number of measured exposures in each group is small. These findings have important implications for the cost-effective design of semi-ecological studies because they enable investigators to estimate exposure-disease associations more reliably, and with smaller exposure measurement campaigns, than with the analytical methods that were historically employed.


Supplementary Material
EM with Measurement Errors Only:
Then the solution to this maximization problem can be found and updated as follows:

$\mu^{(t+1)}$ = isotonic regression of $(m_1, m_2, \cdots, m_G)$ with weight vector $(n_1, n_2, \cdots, n_G)$.

If we keep updating the estimates by this EM algorithm, then $\theta^{(t)}$ will converge to the MLE of $\theta$. Note that no Monte Carlo method is necessary for the simple linear case.
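The isotonic regression update above can be sketched with the pool-adjacent-violators algorithm. This is a minimal illustration, not the authors' implementation: it assumes the group means are constrained to be nondecreasing in the group index, and the function name and the numerical inputs are hypothetical.

```python
def weighted_isotonic_regression(values, weights):
    """Pool-adjacent-violators: nondecreasing fit minimizing weighted squared error."""
    # Each block holds [weighted mean, total weight, number of pooled groups].
    blocks = []
    for value, weight in zip(values, weights):
        blocks.append([float(value), float(weight), 1])
        # Pool backwards while adjacent blocks violate the assumed ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, count2 = blocks.pop()
            mean1, w1, count1 = blocks.pop()
            total = w1 + w2
            pooled = (w1 * mean1 + w2 * mean2) / total
            blocks.append([pooled, total, count1 + count2])
    # Expand the pooled blocks back to one fitted value per group.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical inputs standing in for (m_1, ..., m_G) and (n_1, ..., n_G).
group_means = [1.0, 3.0, 2.0, 4.0]
group_sizes = [5, 2, 6, 3]
mu_next = weighted_isotonic_regression(group_means, group_sizes)
print(mu_next)  # [1.0, 2.25, 2.25, 4.0]
```

Groups 2 and 3 violate the ordering, so they are pooled into their weighted average (2 × 3 + 6 × 2) / 8 = 2.25, while groups 1 and 4 keep their own values.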

2) Logistic regression
Parameters in the logistic regression model are $\theta_1 = (\beta_0, \beta_1)$, $\theta_2 = \sigma_\eta^2$ and $\theta_3 = (\mu, \sigma_b^2)$. As in the simple linear case, $\sigma_\eta^2$ is assumed to be known. The E step for this model gives $Q(\theta \mid \theta^{(t)})$, whose third term is in fact constant because $\sigma_\eta^2$ is known. The conditional density of $X_{gi}$ given $Y_{gi} = y_{gi}$ and $W_{gi} = w_{gi}$ does not have a closed-form expression, so a Monte Carlo EM method is used, as is generally the case in similar situations. The outline of the M step in the $(t+1)$st iteration of the EM algorithm can be described as follows:

Step 1: Set $\mu^{(t+1)}$ equal to the isotonic regression of $(m_1, \cdots, m_G)$ with weight vector $(n_1, \cdots, n_G)$.

Step 2: Keeping $\theta^{(t)}$ in the conditional distribution, apply the usual Newton method to maximize $Q(\theta \mid \theta^{(t)})$ with respect to $\beta$ until a convergence criterion is satisfied, and set $\beta^{(t+1)}$ equal to the solution.
It should be noted that the Newton method in Step 2 can be applied to the second term in $Q(\theta \mid \theta^{(t)})$ alone, because none of the other conditional expectations involves $\beta$.
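As a rough sketch of Step 2, suppose the second term of $Q(\theta \mid \theta^{(t)})$ has been approximated by a weighted sum of logistic log-likelihood contributions over simulated exposure values. That approximation, the function names, and the toy data below are assumptions made for illustration. Maximizing over $\beta = (\beta_0, \beta_1)$ is then a weighted logistic regression, which Newton's method solves with a 2 × 2 linear system per iteration.

```python
import math


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def newton_logistic(xs, ys, weights, beta=(0.0, 0.0), tol=1e-10, max_iter=50):
    """Maximize sum_k w_k * [y_k * eta_k - log(1 + exp(eta_k))], eta_k = b0 + b1 * x_k."""
    b0, b1 = beta
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in zip(xs, ys, weights):
            p = expit(b0 + b1 * x)
            r = w * (y - p)          # weighted residual (gradient contribution)
            v = w * p * (1.0 - p)    # weighted variance (information contribution)
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Toy data (hypothetical): weighted outcomes at two exposure values.
xs = [0.0, 0.0, 1.0, 1.0]
ys = [0, 1, 0, 1]
weights = [3.0, 1.0, 1.0, 3.0]
beta_next = newton_logistic(xs, ys, weights)
print(beta_next)
```

In this toy example the weighted proportion of events is 1/4 at x = 0 and 3/4 at x = 1, so the maximizer is $\beta_0 = -\log 3$ and $\beta_1 = 2 \log 3$, which gives a simple check on the iteration.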
As mentioned earlier, the conditional expectations here do not have closed-form expressions, and thus we rely on a Monte Carlo method to evaluate them.
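One way such a conditional expectation might be evaluated is self-normalized importance sampling. The sketch below is illustrative only and rests on assumptions that are not stated in this section: a classical measurement error model $W_{gi} = X_{gi} + \eta_{gi}$ with $\eta_{gi} \sim N(0, \sigma_\eta^2)$, true exposures $X_{gi} \sim N(\mu_g, \sigma_b^2)$, and a logistic model for $Y_{gi}$ given $X_{gi}$. Under these assumptions, $X_{gi}$ given $W_{gi}$ alone is normal, so draws from that distribution can be weighted by the logistic likelihood of $y_{gi}$. The function name and the numerical inputs are hypothetical.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conditional_mean_x(y, w, mu_g, beta, sigma2_b, sigma2_eta,
                       n_draws=5000, rng=None):
    """Monte Carlo estimate of E[X | Y = y, W = w] at the current parameter values."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    # Normal distribution of X given W = w under the assumed error model.
    post_var = 1.0 / (1.0 / sigma2_b + 1.0 / sigma2_eta)
    post_mean = post_var * (mu_g / sigma2_b + w / sigma2_eta)
    post_sd = math.sqrt(post_var)
    numerator = denominator = 0.0
    for _ in range(n_draws):
        x = rng.gauss(post_mean, post_sd)
        p = expit(beta[0] + beta[1] * x)
        weight = p if y == 1 else 1.0 - p  # likelihood of the observed outcome
        numerator += weight * x
        denominator += weight
    return numerator / denominator


# Hypothetical values: one subject with w = 2.0 in a group with mean 1.0.
estimate_case = conditional_mean_x(1, 2.0, 1.0, (-1.0, 1.0), 1.0, 1.0)
estimate_control = conditional_mean_x(0, 2.0, 1.0, (-1.0, 1.0), 1.0, 1.0)
print(estimate_case, estimate_control)
```

With a positive slope $\beta_1$, the estimate for a subject with $y = 1$ is pulled above the estimate for a subject with $y = 0$ given the same $w$, reflecting the information that the outcome carries about the unobserved exposure.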