Between April 2005 and June 2006, we conducted a case-control study of risk factors for Campylobacter enteritis among individuals aged 18 years and above in five Health Protection Units (HPUs) in England. The details of the study are extensively described elsewhere [9]. Laboratory-confirmed cases of Campylobacter enteritis reported within each HPU were sent a letter from the local Consultant in Communicable Disease Control (CCDC) inviting them to participate in the study, together with a consent form, a 12-page, self-administered risk factor questionnaire, and a pre-paid, addressed return envelope. The questionnaire enquired about health details (presence of diabetes and chronic gastrointestinal illness, and use of acid-suppressing medications), exposure to animals in the home, workplace or elsewhere, recreational exposure to water sources, and a detailed history of normal dietary habits as well as consumption of chicken, and untreated dairy and water in the five days prior to illness onset. No reminders were sent to cases, as a pilot study indicated that there was little benefit in doing so.
Controls were randomly sampled from lists of individuals registered with general practice clinics in the five HPUs. Based on previous years' distribution of reported cases, five times as many controls as expected cases were sampled in each HPU, frequency matched on age group, sex and month of report. Potential controls were approached with an initial mailing pack similar to that used for cases. Individuals who had not responded within two weeks were sent a reminder letter. A second reminder and another copy of the questionnaire were sent to those who had still not responded after three weeks. Controls were asked for the same risk factor information as cases, but for recent risk factors we sought information about exposure in the five days prior to questionnaire completion.
The study received a favorable ethical opinion from the North West Multicentre Research Ethics Committee. Approval was obtained from Local Research Management and Governance departments serving each study site.
Overall participation was 46.5% (n = 2381) among cases and 37.3% (n = 5256) among controls. In the original study, we excluded individuals reporting irritable bowel syndrome (cases = 221, 9.3%; controls = 324, 6.2%), because of difficulties ascertaining date of onset and because risk factors in this group may differ. We additionally excluded controls reporting gastrointestinal symptoms in the previous 14 days (n = 431, 8.2%), and cases and controls reporting foreign travel in the 14 days prior to illness onset or questionnaire completion, respectively (cases = 560, 23.5%; controls = 511, 9.7%). Finally, we excluded two cases and seven controls because we could not determine whether they were aged 18 years or above, and a further six cases that occurred in the same household as a previously identified case. After exclusions, 1592 cases and 3983 controls were available for analysis. In the final multivariable model, self-reported past Campylobacter enteritis, use of acid-suppressing medications, recent acquisition of a pet dog, and consumption of chicken prepared outside the home were identified as risk factors for Campylobacter enteritis.
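The exclusion counts above can be checked with a short calculation. The following Python sketch is illustrative only (the analysis itself was performed in Stata and Excel); the dictionary keys and function names are hypothetical, and it assumes the reported exclusion counts do not overlap, which is consistent with the totals reported.

```python
# Counts taken from the text; names are illustrative, not from the study's code.
participants = {"cases": 2381, "controls": 5256}

exclusions = {
    "cases": {
        "irritable_bowel_syndrome": 221,
        "foreign_travel": 560,
        "age_undetermined": 2,
        "same_household_as_earlier_case": 6,
    },
    "controls": {
        "irritable_bowel_syndrome": 324,
        "recent_gi_symptoms": 431,
        "foreign_travel": 511,
        "age_undetermined": 7,
    },
}


def remaining(group):
    """Participants left after subtracting every exclusion for the group.

    Assumes the exclusion counts are non-overlapping.
    """
    return participants[group] - sum(exclusions[group].values())


def percent_excluded(group, reason):
    """Exclusion count as a percentage of the group's participants (1 d.p.)."""
    return round(100 * exclusions[group][reason] / participants[group], 1)


for group in participants:
    print(group, remaining(group))
```

Under the non-overlap assumption, the remainders reproduce the 1592 cases and 3983 controls stated in the text.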
In our previous analysis [9], the potential for bias due to non-participation was assessed using inverse probability weighting, where the weights were inversely proportional to the probability of participation and derived from a two-level logistic model regressing participation against study site, a three-way interaction between age group, sex and case/control status, and area of residence as a latent, random intercept variable capturing area-level deprivation. The analysis indicated that weighting made little difference to the effect estimates for risk factors identified in the final multivariable model.
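As a minimal illustration of the weighting step, the Python sketch below converts participation probabilities into inverse probability weights and applies them to a weighted exposure prevalence. It does not reproduce the two-level logistic model; the probabilities, stratum labels and function names are hypothetical, and in practice the probabilities would be the model's fitted values.

```python
def inverse_probability_weights(participation_prob):
    """Map each stratum to a weight of 1 / P(participation)."""
    weights = {}
    for stratum, p in participation_prob.items():
        if not 0 < p <= 1:
            raise ValueError(f"invalid participation probability for {stratum}: {p}")
        weights[stratum] = 1.0 / p
    return weights


def weighted_prevalence(respondents, weights):
    """Weighted proportion of exposed respondents.

    respondents: iterable of (stratum, exposed) pairs, exposed being a bool.
    """
    respondents = list(respondents)
    total = sum(weights[stratum] for stratum, _ in respondents)
    exposed = sum(weights[stratum] for stratum, is_exposed in respondents if is_exposed)
    return exposed / total


# Hypothetical fitted participation probabilities for two strata.
probabilities = {"younger": 0.25, "older": 0.50}
weights = inverse_probability_weights(probabilities)

# Hypothetical respondents: (stratum, exposed to the risk factor).
sample = [("younger", True), ("younger", False), ("older", True), ("older", True)]
print(weighted_prevalence(sample, weights))
```

Respondents from strata with low participation receive larger weights, so they stand in for the non-participants who resemble them on the modelled characteristics.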
For the present analysis, we categorized controls as follows: (1) individuals who returned a completed questionnaire and were included in the analysis (included controls); (2) individuals who returned a completed questionnaire and were subsequently excluded from analysis for the reasons given above (excluded controls); (3) individuals who declined to participate (active refusers); (4) individuals sent a questionnaire but whose address details were subsequently found to be incorrect or invalid (incorrect addresses); and (5) individuals from whom no response was obtained after three reminders (passive refusers). Included controls (group 1) were further categorized as (A) controls who completed or returned a questionnaire within two weeks of the initial contact; (B) controls who completed or returned a questionnaire after being sent the first reminder, but before a second reminder was sent out; and (C) controls who returned a questionnaire after being sent a second reminder.
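The classification above could be expressed as a simple decision rule. The Python sketch below is an assumption-laden illustration: the record fields (`address_invalid`, `declined`, `returned_questionnaire`, `excluded`) are hypothetical, and the order in which the conditions are checked is our own choice rather than something specified in the study.

```python
def classify_control(record):
    """Assign a sampled control to one of the five groups (1 to 5).

    `record` is a dict of hypothetical boolean fields; missing fields
    are treated as False.
    """
    if record.get("address_invalid"):
        return 4  # incorrect address
    if record.get("declined"):
        return 3  # active refuser
    if not record.get("returned_questionnaire"):
        return 5  # passive refuser
    if record.get("excluded"):
        return 2  # excluded control
    return 1  # included control


def response_wave(reminders_sent_before_return):
    """Sub-group of an included control: A (no reminder), B (one), C (two)."""
    waves = {0: "A", 1: "B", 2: "C"}
    if reminders_sent_before_return not in waves:
        raise ValueError("expected 0, 1 or 2 reminders before return")
    return waves[reminders_sent_before_return]


print(classify_control({"returned_questionnaire": True}), response_wave(1))
```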
We compared controls in groups 1 to 5 with respect to the distribution of age group, sex and area-level deprivation. We obtained the latter by linking individuals' postcodes of residence to Super Output Areas (SOAs), geographical boundaries comprising approximately 1000 residents for which aggregated census data are available. SOAs are ranked according to a standard Index of Multiple Deprivation (IMD) [10], which captures geographic variation in deprivation, using a range of education, employment, health, crime, housing and environment indicators. Individuals were assigned to a quintile of deprivation based on their SOA of residence. The distributions of these variables between the five groups were tabulated. For a small fraction of individuals, HPUs were unable to provide information on age (n = 344, 2.3%), sex (n = 66, 0.4%), or postcode (n = 133, 0.9%). These individuals were excluded from analysis of the relevant variable, but included in other comparisons for which data were available.
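One way to derive a deprivation quintile from an SOA's IMD rank is sketched below in Python. This is an illustration under stated assumptions rather than the procedure used in the study: it assumes that rank 1 denotes the most deprived area, that quintiles split the ranked areas into five equal-sized bands, and that the total number of ranked areas (`n_areas`, a hypothetical parameter) is supplied by the caller.

```python
def deprivation_quintile(rank, n_areas):
    """Return the quintile (1 to 5) for an area with the given IMD rank.

    Assumes rank 1 is the most deprived area, so quintile 1 is the most
    deprived fifth. Uses integer arithmetic to compute ceil(5 * rank / n_areas).
    """
    if not 1 <= rank <= n_areas:
        raise ValueError("rank must lie between 1 and n_areas")
    return (5 * rank + n_areas - 1) // n_areas


# With 100 hypothetical areas, ranks 1 to 20 fall in quintile 1, and so on.
print([deprivation_quintile(r, 100) for r in (1, 20, 21, 60, 100)])
```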
For included controls (group 1), we investigated the effect of each wave of reminders on mitigating participation bias by estimating the effect of individual risk factors on case status for those returning a questionnaire before the first reminder (group 1A), those returning a questionnaire before the second reminder (1A+1B) and all included controls (1A+1B+1C). We used unconditional logistic regression adjusting for the stratifying variables of age group, sex, study site and month. For each risk factor, we calculated the absolute difference in the effect estimate, δ, as the difference in the regression coefficient between group 1A and all controls, and groups 1A+1B and all controls:
\[ \delta_{i,j} = \left| \beta_{i,j} - \beta_{i,\mathrm{all}} \right| \]

where $\beta_{i,\mathrm{all}}$ represents the logarithm of the odds ratio (OR) for risk factor $i$ using all controls, and $\beta_{i,j}$ is the logarithm of the OR for risk factor $i$ using controls $j$ ($j$ = 1A, 1A+1B). For each mailing wave, we determined the proportion of variables yielding Wald test p-values < 0.2, according to the conventional practice of selecting such variables for further analysis in a stepwise regression.
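The two summary measures described here, the absolute difference in coefficients and the proportion of variables with Wald p-values below 0.2, can be sketched in Python as follows. The coefficients, standard errors and risk factor names are hypothetical placeholders; in the study they would come from the adjusted logistic regression models, which this sketch does not fit.

```python
import math


def wald_p_value(beta, se):
    """Two-sided p-value for H0: beta = 0, using the normal approximation."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2))


def absolute_differences(betas_by_group, reference="all"):
    """Absolute difference between each group's coefficients and the reference's."""
    ref = betas_by_group[reference]
    return {
        group: {factor: abs(betas[factor] - ref[factor]) for factor in ref}
        for group, betas in betas_by_group.items()
        if group != reference
    }


def proportion_selected(estimates, threshold=0.2):
    """Proportion of risk factors whose Wald p-value falls below the threshold.

    estimates: dict mapping risk factor -> (beta, standard error).
    """
    p_values = [wald_p_value(beta, se) for beta, se in estimates.values()]
    return sum(p < threshold for p in p_values) / len(p_values)


# Hypothetical log odds ratios by control group.
betas = {
    "1A": {"chicken_outside_home": 0.50, "new_pet_dog": 0.90},
    "1A+1B": {"chicken_outside_home": 0.44, "new_pet_dog": 0.80},
    "all": {"chicken_outside_home": 0.40, "new_pet_dog": 0.75},
}
print(absolute_differences(betas))
```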
Even in the absence of systematic error, differences in the coefficients occur due to random error. The extent of this error is dependent on the prevalence of the risk factor, as for a given sample size random error increases with decreasing prevalence. To assess whether bias might have occurred that exceeded that expected from random error, we plotted absolute bias against prevalence for each risk factor, by mailing wave.
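The dependence of random error on prevalence can be illustrated with the standard large-sample (Woolf) standard error of a crude log odds ratio, the square root of the sum of the reciprocals of the four cell counts in a 2 x 2 table. The Python sketch below is a simplification: it uses expected cell counts under no association, ignores the adjustment variables used in the study's models, and takes the numbers of cases and controls available for analysis as default sample sizes.

```python
import math


def log_or_standard_error(prevalence, n_cases=1592, n_controls=3983):
    """Approximate SE of a crude log odds ratio for a given exposure prevalence.

    Assumes the same prevalence in cases and controls (no association) and
    uses expected rather than observed cell counts.
    """
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    a = n_cases * prevalence            # exposed cases
    b = n_cases * (1 - prevalence)      # unexposed cases
    c = n_controls * prevalence         # exposed controls
    d = n_controls * (1 - prevalence)   # unexposed controls
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


for prevalence in (0.5, 0.2, 0.05, 0.01):
    print(f"{prevalence:.2f}  {log_or_standard_error(prevalence):.3f}")
```

For a fixed sample size, the standard error grows as the prevalence falls below 50%, which is why larger coefficient differences are expected by chance alone for rare exposures.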
In addition, we investigated the effect on the final multivariable model of using only initial respondents, and participants responding before a second reminder, as compared with the analysis using all controls.
Analysis was performed using Stata 10 (Stata Corporation, Texas) and Microsoft Excel 2007 (Microsoft Corporation, Washington) software.