Appendix to: Using Interviewer Random Effects to Remove Selection Bias from HIV Prevalence Estimates

∗Corresponding author. Address: 9 Bow Street, Cambridge, MA 02138, USA. †Harvard Center for Population and Development Studies and Department of Global Health and Population, Harvard T.H. Chan School of Public Health. Email: mcgovern@hsph.harvard.edu. ‡Department of Global Health and Population, Harvard T.H. Chan School of Public Health, and Wellcome Trust Africa Centre for Health and Population Studies, South Africa. Email: tbaernig@hsph.harvard.edu. §Department of Global Health and Population, Harvard T.H. Chan School of Public Health. Email: jsalomon@hsph.harvard.edu. ¶Harvard Center for Population and Development Studies and Department of Global Health and Population, Harvard T.H. Chan School of Public Health. Email: dcanning@hsph.harvard.edu.


1
The equation for the HIV status of individual i with interviewer j is: where h * ij is again a latent variable, which can be thought of as reflecting propensity to be infected, and ij is an error term. We assume (u ij , ij ) are jointly distributed as bivariate normal, each with mean zero, variance 1, and correlation parameter ρ = corr(u ij , ij ).
The inclusion of the interviewer effects, z j , which are assumed to affect consent to testing, but not HIV status, is crucial to the model. Without variables in the selection equation that are excluded from the HIV status equation, the model is only identified by non-linearities, and does not provide robust estimates [4].
Even with a suitable exclusion restriction, there are a number of technical problems associated with estimating the model. The first concerns the interviewer effects in z j , which typically take the form of a series of binary indicator variables, essentially an interviewer fixed effect [3]. Estimation of these interviewer fixed effects may be difficult because, for interviewers who have only successes or failures in obtaining consent, the parameter is not identified. For example, if an interviewer always obtains consent, we know that the coefficient on the indicator variable for that interviewer in the consent equation is large and positive. However, the coefficient is not identified, as there is no variation in consent for that interviewer. The coefficient for that interviewer could be arbitrarily large, as any very large positive interviewer effect above some threshold will perfectly predict success, and therefore we cannot distinguish between different effect sizes for interviewers who obtain 100% consent. In addition, we control for the region and language of the interviewee. However, in the DHS interviewers usually work in a small number of regions, and an effort is often made to match them with interviewees by language. This may create collinearity between the interviewer effect and the region and language indicator variables in the equation, making it difficult or even impossible to estimate the interviewer effects. The approach adopted by Bärnighausen and colleagues was to limit estimation of individual interviewer effects to those interviewers with more than 50 interviews, and not to use the interviewer effects of those interviewers who had conducted more than 50 interviews when including them lead to identification problems [3]. 1 Instead, interviewers with less than 50 interviewers, or interviewers whose inclusion caused identification problems, were grouped together as a baseline group with an assumed common interviewer effect.
A conceptual difficulty is that grouping all of the interviewers whose individual interviewer effects are not identified into a baseline group may not be a convincing strategy, because we could be pooling interviewers who always have consent with those who never achieve consent, and assuming that they have the same average effectiveness in obtaining consent. Thus, we do not exploit the fact that the HIV tests made by interviewers who always obtain consent are the most informative since they do not suffer from selection bias. As we discuss in the main text, two other limitations with the fixed effect approach are potential small sample bias, and failure of the bootstrap method for calculating standard errors.

A1.2 Selection Model Using Random Effects
Our approach to these problems is to assume that the unobserved interviewer effects reflect some underlying parametric distribution which describes interviewer effectiveness. If interviewers were randomly assigned to survey households we could simply assume that each interviewer was a random draw from the pool of interviewers. However, systematic matching of interviewers to subjects, particularly by region and language, can mean that there may be a correlation between interviewer success rates and who they are matched with. Therefore, we write the interviewer effect as [5,6]: Wherex j represents the average characteristics of the n j people that interviewer j interviews. Excluding these controls could potentially invalidate the exclusion restriction of the model. In order to be asymptotically unbiased and consistent, a requirement of random effects models is that there is no correlation between the random effects, v j , and the explanatory variables at the individual level, x ij [5,6]. In order for this to hold, we must assume that, controlling for these observable averages, there is no remaining correlation between the error term in equation (A3) and the individual level variables in the model. In particular, we assume that interviewers are not systematically assigned to groups of survey participants based on unobservable characteristics, but only on the observable characteristics measured by the survey. As thesex j likely depend only on survey design, they should not enter the HIV status equation. 2 This assumption of interviewer random effects gives us the selection equation: In principle, we could estimate the system given by equations (A2) and (A4) by maximum likelihood. However, this is difficult as we have a selection equation which has a random effect, requiring numerical integration, inside a bivariate probit model.
There is, however, a simple consistent estimator. Dubin and Rivers show that the bivariate probit model with selection can be estimated by first finding a consistent estimate of the parameters of the selection model, ignoring the covariance of the error terms, and then estimating the parameters of the full model by maximum likelihood, holding the selection equation parameters at their first stage estimates [2]. The procedure is as follows. We first estimate the interviewer effects from the selection equation only (stage one), then include these constructed parameters as our exclusion restriction in a Heckman-type selection model (stage two). This two-stage approach is consistent, though not fully efficient, and the second stage does not produce the correct analytic standard errors since the interviewer effects are estimated in the first stage, but treated as exogenous variables in the second stage. Murphy and Topel discuss the consequences for confidence intervals in models which use data that are estimated from a first stage [7].
We can implement this approach by first estimating the interviewer effect as shown in equation (A3). Equation (A4), the selection equation, is then run using the predicted interviewer effect (z j = x j δ+v j ) as the exclusion restriction. Assuming random effects avoids the estimation problems of the fixed effects approach; because we are able to estimate an interviewer effect for all interviewers we can implement a bootstrap procedure to adjust our standard errors for the fact that the relationship between consent and HIV status is uncertain and needs to be estimated. The assumption that the interviewer effects are normally distributed random effects aroundx j δ also means we have a smaller set of parameters to estimate than in the fixed effects model.
Conceptually, we could estimate the interviewer random effects using a multilevel regression for consent, with level one being the individual and level two being the interviewers. However, a simpler method of constructing consistent estimates of the interviewer random effects is to implement a probit model and compute the predicted random effect,v j , as the average of the error term for each interviewer. 3 We use this probit model to compute the estimated interviewer effect,ẑ j =x jδ +v j , whereδ is the vector of estimated coefficients on the interviewer averages from equation (A3), andv j is the predicted random effect from equation (A3). 4 Having obtained the estimatesẑ j , we can then estimate the full bivariate probit model in equations (A1) and (A2) using this estimated interviewer effect as the selection variable that affects consent but does not appear in the HIV status equation.
Since the first stage is a consistent estimator of the interviewer effect, the two-stage procedure will be consistent. We can address the problem of incorrect standard errors by bootstrapping over the whole two-stage procedure.

A1.3 Bias Correction
While the random effects approach solves the identification problems associated with the interviewer parameters and allows bootstrapping, a second problem remains. In small samples, particularly when consent to test is very rare (or very common), or the HIV rate is very low (or very high), it is difficult to estimate the correlation between consent to test and HIV status [8], and the maximum likelihood estimates in the context of bivariate probit models can be biased [9]. Also, the model may fail to converge, or may produce a result of ρ = ±1, on the boundary of the possible parameter space, in which case the assumptions required for asymptotic normality of the maximum likelihood 3 Estimating this first stage as a random effects model is computationally intensive and produces almost identical results to the simple probit. The simple probit produces consistent estimates and this is all this is required for the second stage. 4 Our results are robust to just usingvj alone as the interviewer effect rather than the full estimatex jδ +vj estimator are violated, and standard inference, including bootstrapping, is invalid [10,11]. Such a result also has the implication that, in terms of predicted probabilities, everyone who fails to test is either HIV positive with certainty (ρ = +1), or HIV negative with certainty (ρ = −1), which seems implausible. In general, maximum likelihood often does not have desirable finite sample properties [12,13], as the estimate is the most likely value (in terms of posterior probability), which gives zero weight to values with lower posterior probabilities, even when those probabilities are positive.
We wish to construct an estimate of ρ that is consistent and also corrects for this small sample bias. Take a data set x * and a parameter vector θ * . The likelihood of θ * is simply the probability of the data given these parameters: In the usual maximum likelihood framework, we are not generally concerned with the likelihood of the observed data given the parameters, we are more interested in the probability of the parameters given the observed data. However, by Bayes rule: Where P (θ * ) is our prior probability distribution on the parameters. If we have a flat prior probability distribution over the interval [-1, 1], so that P (θ * ) is the same for every θ * , we have P (θ * |x * ) ∝ P (x * |θ * ) = L(θ * ), and the maximum likelihood estimate of θ * is also the estimate that has the highest posterior probability given the data. While this estimate is the most likely parameter value given the data, we can construct an alternative estimator as: If the prior probability distribution is flat, we have P (θ * |x * ) ∝ L(θ * ), the posterior probability is proportional to the likelihood, and we can write our estimator as: Where k is a normalization factor so that the integral of the likelihood over the parameter space is one, and θ * kL(θ * ) can be interpreted as an approximation to a probability density function.
The standard maximum likelihood approach chooses the most likely value of θ * , while our approach gives us an average value of θ * , where we average over different models weighted by the probability of the model being correct. This gives consistent estimates (the likelihood function asymptotically puts zero weight on incorrect parameters) and is an unbiased estimator by construction under the assumption that the prior probability distribution is correct. This approach is implemented by calculating the likelihood for each value of ρ, and then taking the weighted average of ρ, where the weights are the likelihood values (transformed so that these values integrate to 1).
In principle, we could construct these estimates for all the parameters in the model. However, in practice the maximum likelihood estimates for most of the parameters in the model are well determined, 5 and it is only the correlation parameter ρ that poses problems. We therefore use a profile (or concentrated) likelihood for ρ (see Appendix 3 for details). The profile likelihood can be used to substitute for the full likelihood when there are other parameters in the model which are not of direct interest [14]. We have verified that estimates for these other parameters are very stable across all models. Therefore, once ρ is estimated we find the maximum likelihood values of the other parameters given this value of ρ.
We report the maximum likelihood estimates and our new bias correction estimates together with bootstrapped standard errors and confidence intervals. Our cluster bootstrapping takes account of the stratification and cluster sampling procedure of the survey design [15]. All our HIV prevalence estimates are weighted.

A2 Additional Results for Model Parameters
We begin by estimating the interviewer random effects as the average error term (for each interviewer) from a probit model for consent to test for HIV, where we include a standard set of covariates along with the mean of these variables for each interviewer to capture the effects of nonrandom allocation of interviewers to participants as shown in equation (A4). Descriptive statistics for interviewers in Zambia and Ghana are shown in Table A1. Table A2 lists the variables used as predictors in the bivariate probit model, and their source in the data. The Stata code for preparing the data is publicly available from http://hdl.handle.net/1902.1/17657 [3,16]. The distribution of the number of interviewees by interviewer and their consent rates is shown in figures A1-A4. Table A3 presents results from the probit model for consent to test for HIV in Zambia. This table is a single regression for HIV consent where the first column shows the marginal effects for the individual level variables (x ij ), while the corresponding marginal effects for the interviewer averages (x j ) are shown in the second column. 6 At the individual level, the following variables are associated with consent to test: education, location, prior sexually transmitted disease (STDs), age at first intercourse, number of partners, willingness to care for a HIV positive relative, knowing an AIDS victim, and being a smoker. For example, an extra year of education is associated with an increase in the probability of consenting to an HIV test by roughly 0.5 percentage points. Apart from years of education, which is the mean years of education for that interviewer's interviewees, all interviewer averages are measured as the proportion of interviewees in that category. Many of the interviewer average measures are significant in predicting consent. For example, in table A3 if the interviewee speaks Lozi the probability of consent to testing increases, but the association is not statistically significant. However, interviewers who conducted more interviews with Lozi speakers had higher consent rates (indicated by the positive and statistically significant coefficient for the proportion of interviews with Lozi speakers in table A3). These interviewers are likely to be Lozi speakers themselves, and these results therefore suggest that the Lozi speaking interviewers may have been better than average at obtaining consent. Table A4 presents marginal effects for the Heckman-type selection model for consent and HIV status for Zambia. The first two columns give results for the maximum likelihood fixed effects approach. The middle two columns give results for our maximum likelihood random effects approach, and the final two columns give the results for the random effects bias correction model. For this third model, we estimate the likelihood on a grid of values of ρ (we use values between −1 and +1 at intervals of 0.01). We than calculate the likelihood of the model and posterior probability of each value of ρ, and then find the expected value of ρ based on this probability. 7 The coefficients reported in table A4 are calculated with this value of ρ imposed. Coefficient estimates across the three models are very similar.
Chiburis and colleagues recommend the use of the bootstrap for inference in the context of recursive bivariate probit models to correct for poor coverage of analytic standard errors [9]. We find that the bootstrap confidence interval for the random effects model is almost 10 times as wide as the analytic standard errors for the fixed effects model. We use 1,000 iterations to calculate these standard errors, and the corresponding 95% bootstrap confidence interval is calculated using the appropriate centiles from the empirical distribution of the bootstrap estimates. This approach is more appropriate than normal-based approximations when the distribution of the parameter of interest is skewed.
In table A5, we present the estimated correlation coefficient for Zambia from the different models.
The negative values estimated indicate that those who refuse to test are more likely to be HIVpositive. The maximum likelihood random effects approach yields a ρ of around −0.50, which is somewhat lower than the −0.75 obtained from the fixed effects estimator. The random effects bias corrected estimate is slightly smaller again at −0.44. Table A6 presents estimates of HIV prevalence among those who refused to consent to an HIV test, again comparing the fixed effect, random effect and random effects bias corrected methods. As with the correlation coefficient, the estimates from the random effects model are lower than from the fixed effects model (32% v 52%). The corresponding estimate from the imputation model is 12%. Table A7 presents marginal effects for the consent model in Ghana, and tables A8-A10 give the parameter results of the three Heckman-type selection models.

A3 The concentrated (profile) likelihood function
The likelihood of the parameters (β s , β h , ρ), given the full data set (y, x, z) is: For a given ρ, we can concentrate the likelihood function by setting the other parameters at their maximum likelihood values given ρ: In large samples, the approximation to P (y, x, z|ρ) will become exact as the maximum likelihood estimates of the other parameters are consistent. Using the concentrated maximum likelihood, the problem is reduced to a one parameter model and we can carry out our small sample correction with a prior probability distribution over ρ alone.     A3). The average error term for each interviewer is added to the predicted value of the interviewer means, and used as the exclusion restriction in the HIV regression. The final model is the random effects bias correction procedure using the same exclusion restriction. Coefficients are obtained by restricting the value of ρ to its random effects bias correction estimate and implementing the bivariate probit at that value. Marginal effects (absolute change in the probability of a positive outcome) are shown. Source: DHS Zambia 2007 (men). Standard errors account for clustering at the PSU level.     A3). The average error term for each interviewer is added to the predicted value of the interviewer means, and used as the exclusion restriction in the HIV regression. The final model is the random bias correction procedure using the same exclusion restriction. Coefficients are obtained by restricting the value of ρ to its random effects bias correction estimate and implementing the bivariate probit at that value. Marginal effects (absolute change in the probability of a positive outcome) are shown. Source: DHS Ghana 2003 (men). Standard errors account for clustering at the PSU level.      Figure A2: Graph is at the interviewer level (one observation per interviewer). For each interviewer, their consent rate is calculated as the number of respondents from whom consent to test for HIV was obtained by the interviewer, divided by the number of respondents from whom consent to test for HIV was sought by the interviewer.  Figure A4: Graph is at the interviewer level (one observation per interviewer). For each interviewer, their consent rate is calculated as the number of respondents from whom consent to test for HIV was obtained by the interviewer, divided by the number of respondents from whom consent to test for HIV was sought by the interviewer.