Setting and population
The study was conducted within the borders of the AHRI DSS in rural KwaZulu-Natal. The DSS covers an open cohort of over 100,000 people in a 438 km² region near Mtubatuba, South Africa. The primary ethnicity/language of the region is Zulu. The region is mostly rural and semi-urban and is among the poorest in South Africa, with an estimated HIV prevalence of 29% [18, 19]. Participants were recruited from those taking part in the annual DSS individual surveillance survey data collection round, which covers all residents aged 15 and over living within the borders of the DSS. In addition to demographic, socioeconomic, and health-related questionnaires, the AHRI DSS performs annual surveillance HIV tests, the results of which were not disclosed to participants in 2016. Individuals in the AHRI DSS are also linked to the Department of Health HIV treatment clinics in the area. Individuals who have records related to HIV treatment through this system have voluntarily entered the HIV-related clinical setting and have received HIV-related services. We therefore assume that these individuals know that they are HIV positive.
Sample generation protocol
The study sample generation and recruitment procedure was designed to test the validity of list randomization against known truth in the ACDIS surveillance population, rather than to create generalizable population estimates. Participants were selected into this study through three levels. The first level consisted of the subset of 30,828 adult (18+) individuals who had participated in the ongoing 2016 AHRI DSS surveillance round between January 19, 2016 and September 1, 2016. The second level selected 8000 random individuals from that dataset, oversampling individuals with known HIV status and testing behavior from the HIV testing module in the 2016 round of ACDIS, resulting in our target sample. The final selection was based on a geographically diverse selection of 500 individuals across the ACDIS geographic area.
The sampling procedure was designed in conjunction with AHRI fieldwork teams to balance maximizing the sample size of individuals for whom truth is recorded and linkable, the diversity of the population from which the sample is drawn, and efficient recruitment of individuals into our study. The sizes of both the target sample (8000) and the study sample (500) were determined through a combination of simulation of expected responses and the experience of the AHRI fieldwork teams, so as to most efficiently allocate fieldwork resources with respect to study goals. While it would have been theoretically feasible to start with a stratified list of 500 randomly selected individuals, field experience suggested that this would be a resource-intensive procedure, requiring contacting individuals, scheduling visits, and inefficient travel for field workers. Instead, the protocol outlined below allowed field workers to maximize the number of visits per day, ensure geographic diversity, and maintain internal validity through arm randomization, at the cost of generalizability.
Target sample generation
We used a stratified random selection scheme to generate the 8000-person target sample from the 30,828 individuals meeting our inclusion criteria. The individuals in the 2016 ACDIS dataset were divided into four strata, as follows:
1) Those who tested HIV positive in surveillance and were linked to the HIV treatment system (i.e. those who are positive and know their HIV status)
2) Those who tested HIV positive in surveillance and were NOT linked to the HIV treatment system (i.e. those who are positive and may or may not know their HIV status)
3) Those who tested HIV negative in surveillance
4) Those who did not test (i.e. those who refused the surveillance HIV test)
From each of these four strata, 2000 individuals were randomly selected, yielding a target sample of 8000 individuals (25% from each stratum).
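As a concrete illustration, the stratified draw can be sketched in R as follows. The data frame `acdis` and its `stratum` column are hypothetical stand-ins for the 2016 ACDIS surveillance dataset, and the seed is arbitrary.

```r
# Illustrative sketch only: `acdis` is a hypothetical data frame with one row
# per eligible individual and a `stratum` column coded 1-4 as defined above.
set.seed(2016)  # arbitrary seed for reproducibility
target_sample <- do.call(rbind, lapply(split(acdis, acdis$stratum), function(s) {
  s[sample(nrow(s), 2000), ]  # draw 2000 individuals per stratum
}))
nrow(target_sample)  # 8000
```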
Study sample generation
The fieldwork team was given a list of the 8000 individuals in the target sample with their names, field identifiers, sex, and approximate locations, but no other information. A fieldwork coordinator was instructed and trained to manage the collection of 500 surveys from across the ACDIS region. The coordinator assigned each fieldworker a daily list of individuals to approach, generally by geographic sub-region, with fieldworkers approaching a different set of individuals each day. Daily assignments were designed and adjusted to maximize the geographic diversity of the sample within the ACDIS borders by the time the pre-determined stopping rule of 500 collected surveys was reached. Fieldworkers were instructed to attempt to visit a given individual only once, skipping individuals who were unavailable or who refused. Individuals were entered into the study sample if they were available, consent was given at the time of the household visit, and their survey data were successfully recorded and transferred to the secure data server.
Randomization to experiment arms
The 8000 people in the target sample were randomized to one of two arms before being approached for recruitment into the study: Arm A (60%) and Arm B (40%), as described below. Randomization was stratified by the four strata used for target sample generation. The arm to which a participant had been randomized was not known to field workers until consent was obtained and electronic data capture had begun.
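A minimal sketch of this stratified 60/40 assignment, assuming the hypothetical `target_sample` data frame from above:

```r
# Assign arms within each stratum: 60% Arm A, 40% Arm B (1200/800 per stratum).
set.seed(2016)  # arbitrary seed
target_sample$arm <- unsplit(
  lapply(split(seq_len(nrow(target_sample)), target_sample$stratum),
         function(idx) sample(rep(c("A", "B"), round(length(idx) * c(0.6, 0.4))))),
  target_sample$stratum
)
table(target_sample$stratum, target_sample$arm)  # check balance by stratum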
Arm A
In this arm, participants were given five blocks, one for each sensitive question, with each block containing five true/false subquestions. One subquestion in each block was the sensitive item, and the remaining four were non-sensitive questions. The first block, corresponding to the marginally sensitive question "Did you brush your teeth today?", was used as a tutorial question. As suggested in T Tsuchiya, et al. [20] and T Nepusz, et al. [21], participants were asked to count on their fingers behind their backs as the individual items within each block were asked. When the list of five subquestions was finished, participants were asked to reveal their fingers to the surveyor. Figure 1 shows a sample list block question, including the training question and instructions.
The non-sensitive questions were selected and designed in conjunction with community representatives to be culturally relevant, easy to understand and answer, and unlikely to be correlated with the sensitive question of interest. Independence of the non-sensitive and sensitive questions both permits fewer statistical modelling assumptions when estimating LR-based regressions and is a required condition for the design effects estimator, as discussed below. Using non-sensitive items which are topically irrelevant to the sensitive question improves the plausibility of this independence assumption [7].
It is plausible that sensitive questions which are topically different from the non-sensitive questions may induce additional cognitive effects by calling attention to the sensitive question. To test this hypothesis, we randomized the position within each block at which the sensitive question appears (i.e. first through fifth item within each block). If the degree to which sensitive questions stand out changes responses, and the ordering also affects how much they stand out, then we would expect the ordering of the questions to impact responses.
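The position randomization can be sketched as below; the item texts are placeholders, not the actual survey wording.

```r
# Assemble one Arm A block with the sensitive item at a uniformly random
# position (1-5) among the four non-sensitive items.
make_block <- function(non_sensitive, sensitive) {
  stopifnot(length(non_sensitive) == 4)
  pos <- sample(5, 1)  # randomized position of the sensitive item
  append(non_sensitive, sensitive, after = pos - 1)
}
make_block(paste("non-sensitive item", 1:4), "sensitive item")
```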
One particularly challenging aspect in the design of LR surveys is ceiling/floor effects, which occur when a participant's count in a given block approaches the extremes (in this case 0 or 5 true/affirmative). In cases where subjects give all affirmative or all negative answers to the non-sensitive questions, their actual answer to the sensitive question can be easily inferred. Given this, non-sensitive items should be chosen such that most subjects have one to three (out of four) affirmative answers across the non-sensitive items. Using assumed probabilities of affirmative answers for each question, generated through discussion with community representatives, simulations were performed with random assignments of non-sensitive questions to blocks, and the assignment with the lowest simulated probability of producing extreme responses (0 or 5) was selected. The estimated probabilities used for this simulation are shown in Additional file 1, alongside the full list of questions and associated blocks.
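A simplified sketch of this assignment search is shown below. The probabilities in `p_items` are placeholders for the community-derived values in Additional file 1, and for brevity the sketch computes each block's probability of an all-agree response analytically under independence rather than by Monte Carlo simulation; minimizing the worst block per assignment is one reasonable selection criterion.

```r
# Search random assignments of 20 non-sensitive items into 5 blocks of 4,
# keeping the assignment whose worst block is least likely to produce an
# extreme (all-affirmative or all-negative) non-sensitive count.
set.seed(1)
p_items <- runif(20, 0.2, 0.8)  # placeholder affirmative-answer probabilities
best <- list(worst = Inf)
for (i in 1:1000) {
  assignment <- matrix(sample(p_items), nrow = 5)  # rows = blocks, cols = items
  # P(all four affirmative) + P(all four negative) for each block:
  p_extreme <- apply(assignment, 1, function(p) prod(p) + prod(1 - p))
  if (max(p_extreme) < best$worst)
    best <- list(assignment = assignment, worst = max(p_extreme))
}
best$worst  # worst-case extreme probability under the chosen assignment
```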
Arm B
Arm B serves two main purposes: estimating the true/false counts for the non-sensitive questions and estimating the true/false counts for the sensitive questions. Arm B asks all questions, both sensitive and non-sensitive, directly. Each non-sensitive question in Arm B is a component of one of the blocks from Arm A, allowing counts to be generated for the non-sensitive questions. Asking the sensitive questions directly allows comparison of the LR-estimated percentages with those from standard, directly asked questionnaires. These questions are asked after the non-sensitive questions to ensure that they do not influence the non-sensitive answers.
This format differs from many other list randomization studies, which typically have a secondary arm identical to the first but without the sensitive question. The standard design helps ensure that any cognitive biases due to the counting procedure would be roughly equal in both arms. However, unlike other list experiments, this study is interested in the exact percentages of the non-sensitive items in this population, both to inform the construction of future list experiments in this population and to design questions which avoid ceiling and floor effects. Further, this design allows for the potential use of alternative estimators, such as that proposed in D Corstange [22], which take advantage of individually asked questions to potentially improve the efficiency of multivariate regression models in list experiments.
Sensitive items of interest
The five sensitive questions of interest are listed below. The first question below is used as a training question, and as such does not contain truly sensitive information.
- I brushed my teeth today. (+)
- I used a condom during my last sexual encounter. (+)
- I am HIV negative. (+)
- I have had anal sex within the last 12 months. (−)
- I refused the AHRI DSS HIV test this year. (−)
We expected a positive social desirability bias for the first three questions and a negative social desirability bias for the latter two, as indicated by the +/− signs above. We define an improvement in inference as occurring when, compared with the direct questionnaire estimate, at least one of the following holds: the LR estimate is lower for items with an expected positive social desirability bias, the LR estimate is higher for items with an expected negative social desirability bias, or the LR estimate is closer to the actual value when the truth is known.
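This definition can be written as a simple predicate; the function and its arguments are illustrative, not part of the analysis code.

```r
# TRUE if the LR estimate improves on the direct estimate under the paper's
# definition. `bias` is "+" or "-" (expected social desirability direction);
# `truth` is the known value, or NA when unknown.
improved <- function(lr, direct, bias, truth = NA) {
  (bias == "+" && lr < direct) ||   # (+) items: LR should be lower
  (bias == "-" && lr > direct) ||   # (-) items: LR should be higher
  (!is.na(truth) && abs(lr - truth) < abs(direct - truth))  # closer to truth
}
improved(lr = 0.25, direct = 0.15, bias = "-")  # TRUE
```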
Survey procedures
Fieldworkers approached target individuals at their homes, proceeding with the survey only if signed consent was obtained. Participants signed consent via electronic tablet and were given a physical copy of the consent form and survey information. The survey and the electronic signature consent were administered on electronic tablets using REDCap™ software for data capture. All instructions, questions, and consent materials were given in Zulu, as translated by local native Zulu speakers in the community engagement team at AHRI. Fieldwork was considered complete when field workers reported 500 surveys completed. Both versions of the survey recorded time to response in order to assess the cost of implementing future surveys.
Human subjects and IRB approval
The protocol for this survey was approved by the University of KwaZulu-Natal BioMedical Research Ethics Committee (BF291/16) and by the Harvard University Institutional Review Board (IRB16–0864).
Statistical analysis
The main outcome of interest is the estimated prevalence of key HIV-related outcomes. We use two estimators of the prevalence of the sensitive items in our surveys: list randomization and the direct questionnaire. To estimate the prevalence of a sensitive item using list randomization, we use a variation of the difference-in-means approach [7], which utilizes information from both Arm A and Arm B. For Arm A, where each block contains the sensitive question, we take the mean of the counts of affirmative answers for the block. For Arm B, we take each person's sum of affirmative responses to the four non-sensitive questions in the block, and take the mean of these sums. The difference between the two means estimates the prevalence of the sensitive item. The direct questionnaire estimate uses only the direct questions about the sensitive item from Arm B.
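A minimal sketch of this estimator for a single block, assuming hypothetical vectors `count_A` (Arm A finger counts, 0-5) and `count_B` (each Arm B respondent's sum of affirmative answers to the block's four non-sensitive items, 0-4):

```r
# Difference-in-means prevalence estimate with a normal-approximation CI.
lr_prevalence <- function(count_A, count_B) {
  est <- mean(count_A) - mean(count_B)
  se  <- sqrt(var(count_A) / length(count_A) + var(count_B) / length(count_B))
  c(estimate = est, se = se,
    ci_lower = est - 1.96 * se, ci_upper = est + 1.96 * se)
}
```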
The main multivariate regression is performed using the linear regression methodology of G Blair, et al. [23] to estimate correlations between sensitive question answers as elicited by the LR method and both known truth and demographic information. Including actual HIV status/refusal as a covariate helps assess the degree to which answers are at least correlated with known truth. In the case of HIV status, the actual status variable cannot be included for the subpopulation whose HIV status is known to the participant, as all of these individuals are HIV positive. The linear model was chosen for computational robustness. While alternative models, such as the K Imai (2011) [8] MLE estimator, may provide improved statistical efficiency, they can introduce computational difficulties, and are treated as tertiary, exploratory methods in this analysis.
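In the "list" package this corresponds to ictreg() with method = "lm". The sketch below uses hypothetical column names (`count`, `treat`, `age`, `sex`), with the Arm B count constructed as the sum of the individually asked non-sensitive answers.

```r
library(list)
# Linear LR regression of Blair et al. [23]; `survey_data` is a hypothetical
# analysis data frame with one row per respondent per block.
fit <- ictreg(count ~ age + sex,
              data   = survey_data,
              treat  = "treat",  # 1 = Arm A (count includes the sensitive item)
              J      = 4,        # four non-sensitive items per block
              method = "lm")     # linear estimator
summary(fit)
```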
Finally, we test for the possible presence of design effects using the design effects estimator proposed by G Blair, et al. [7], which attempts to detect individuals giving different answers to the sensitive question due to the design and/or results of the non-sensitive questions. Ceiling and floor effects, for example, are scenarios in which extreme counts of responses (i.e. all true/affirmative or all false/negative) could reveal the respondent's answer to the sensitive question. The test attempts to detect such effects by exploiting differences in the expected probability of positive/negative responses at each count level of the non-sensitive questions; estimated proportions that are negative at particular count levels may indicate design effects, especially those related to non-sensitive question counts. However, this test would not necessarily detect other biases caused by the design of the survey, and it has limited statistical power given our sample size.
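The "list" package implements this test as ict.test(); a sketch with the same hypothetical columns:

```r
library(list)
# Blair-Imai design effects test on the item counts from both arms.
ict.test(survey_data$count,  # observed counts
         survey_data$treat,  # 1 = Arm A
         J = 4)              # number of non-sensitive items
```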
Actual known-status percentages for HIV status and test refusal are estimated for all subpopulations. For comparability with the LR estimates, these percentages are treated as estimates of percentages from the underlying population and data generating process, with associated standard errors/confidence intervals. We estimate the full-sample percentage of HIV positive individuals by assuming that those in our sample who refused the HIV surveillance test had the same percentage HIV positive/negative as the ACDIS general population in 2016. Applying this percentage to our sampled "refuse" population yields an estimated percentage positive/negative for our full sample. We use the width of the confidence interval from the non-refusing sample population as a conservative (i.e. too wide) estimate of the confidence interval around this full-sample estimate, under the conservative assumption that applying the assumed general-population percentage adds no precision to our estimate.
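The full-sample calculation reduces to a weighted average; the sketch below uses placeholder numbers, not study results.

```r
# Hypothetical inputs: tested subsample vs. refusers, with refusers imputed
# at the assumed 2016 ACDIS general-population prevalence.
n_tested  <- 400; p_tested  <- 0.48
n_refused <- 100; p_general <- 0.29
p_full <- (n_tested * p_tested + n_refused * p_general) / (n_tested + n_refused)
# Conservative CI: reuse the half-width from the non-refusing subsample only.
half_width <- 1.96 * sqrt(p_tested * (1 - p_tested) / n_tested)
c(estimate = p_full, lower = p_full - half_width, upper = p_full + half_width)
```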
Point estimates and estimated standard errors for all statistics are interpreted only within the context of this study and its sampling structure; estimates were therefore not adjusted for sampling weights or stratification. Unless otherwise noted, all results are from the study sample.
All methods presented here, unless otherwise noted, are implemented in R, with all LR-specific calculations using the “list” package [23].