Differential response effects of data collection mode in a cancer screening study of unmarried women ages 40–75 years: A randomized trial

Background: Little is known about the impact of data collection method on self-reported cancer screening behaviours, particularly among hard-to-reach populations. The purpose of this study is to examine the effects of data collection mode on response to indicators of cancer screenings by unmarried middle-aged and older women.
Methods: Three survey methods were evaluated for collecting data about mammography and Papanicolaou (hereafter, Pap) testing among heterosexual and sexual minority (e.g., lesbian and bisexual) women. Women ages 40–75 were recruited from June 2003 – June 2005 in Rhode Island. They were randomly assigned to receive: Self-Administered Mailed Questionnaire [SAMQ; N = 202], Computer-Assisted Telephone Interview [CATI; N = 200], or Computer-Assisted Self-Interview [CASI; N = 197]. Logistic regression models were computed to assess survey mode differences for 13 self-reported items related to cancer screenings, adjusting for age, education, income, race, marital status, partner gender, and recruitment source.
Results: Compared to women assigned to CATI, women assigned to SAMQ were less likely to report two or more years between most recent mammograms (CATI = 23.2% vs. SAMQ = 17.7%; AOR = 0.5, 95% CI = 0.3 – 0.8) and women assigned to CASI were slightly less likely to report being overdue for mammography (CATI = 16.5% vs. CASI = 11.8%; AOR = 0.5, 95% CI = 0.3 – 1.0) and Pap testing (CATI = 14.9% vs. CASI = 10.0%; AOR = 0.5, 95% CI = 0.2 – 1.0). There were no other consistent mode effects.
Conclusion: Among participants in this sample, mode of data collection had little effect on the reporting of mammography and Pap testing behaviours. Other measures such as efficiency and cost-effectiveness of the mode should also be considered when determining the most appropriate form of data collection for use in monitoring indicators of cancer detection and control.


Background
Over 18 million women aged 40-75 in the United States are currently unmarried [1]. The designation of "unmarried" refers to women who are legally separated, divorced, widowed, or never legally married. Sexual minorities (e.g., lesbians and bisexuals) are an important segment of the unmarried female population. Sexual minority women may be living in committed relationships comparable to married heterosexual couples but are unable to legally marry in any state except Massachusetts. Although few studies include sufficient samples of unmarried women for analysis, data suggest that the risk for breast and cervical cancer may be higher for subgroups of unmarried women than women in general [2][3][4][5]. Therefore, the burden of disease and risk of adverse health outcomes may be greater for unmarried women if detection is delayed or forgone.
Determining effective modes of obtaining sensitive personal information is one important aspect of improving rates of cancer detection and control among unmarried women. Prior studies have documented that the methods used to elicit information can influence both individuals' willingness to disclose personal information and the quality of data that are obtained. The advantages of anonymity in self-administered questionnaires (SAQ) relative to telephone and face-to-face interviews have been demonstrated [6][7][8]. However, paper-and-pencil SAQ have disadvantages: lower unit response rates, higher levels of missing data, higher numbers of inconsistent or illogical responses across related questions, and limitations on questionnaire complexity such as skip patterns [9]. Conversely, significant advantages in data quality and flexibility of questionnaire format with telephone and face-to-face interviews have been well documented, including quality of recorded answers, control of response order, and use of complicated skip patterns [9,10].
Computer-assisted self-interviewing (CASI) has advantages in data quality and questionnaire design flexibility comparable to telephone and face-to-face interviews [11][12][13]. However, studies comparing CASI to other modes of data collection for reporting sensitive behaviours have shown varying results. Some investigators have found mixed [12] or limited main effect differences [13][14][15] for CASI versus face-to-face interviews and SAQ. Others have found that CASI may be as good as, or even better than, face-to-face interviews at fostering a sense of privacy and increasing the willingness of respondents to report sensitive information [11,16-19].
Respondent age, general trust in others, attitudes about privacy and confidentiality, and attitudes towards computers may influence reactions towards CASI [15,20,21]. These issues may be particularly relevant for women 40-75 years who are age-eligible for breast and cervical cancer screening, and who may have less experience with computers than younger women. In addition, sexual minority women may be particularly concerned about privacy and confidentiality.
There is limited information about the effect of interview mode on the willingness of unmarried middle-aged and older women to reveal personal information about cancer-related attitudes and practices. Without specific information about differential effects of data collection mode, it is impossible to determine the extent to which women are under-represented in surveillance and interventions because they are less likely to participate and/or are afraid to acknowledge potentially sensitive information. In addition, we must have methods that provide optimally valid and reliable self-reported behavioural and attitudinal data. Previous studies have found that women self-report cancer screenings at rates higher than indicated in clinical records [22][23][24][25][26]. These issues are particularly important as researchers and health care organizations seek valid, cost-effective forms of data collection for interventions to improve quality of care indicators. Therefore, the objectives of this study were to: 1. Describe the utility and feasibility of different modes for collecting data from middle-aged and older unmarried women; 2. Examine the effects of randomized interview mode on responses to indicators of mammography and Pap test screening; and 3. Determine whether the effects of interview mode on responses to indicators of mammography and Pap test screening differ by partner gender.

Methods
The study design included a three-step process: recruitment, allocation to eligibility strata, and randomization to data collection mode.

Sample and recruitment
Women were eligible if they were legally unmarried, were aged 40-75 years, currently received the majority of their health care in Rhode Island, and had never been diagnosed with cancer other than non-melanoma skin cancer. Women with a previous diagnosis of cancer were excluded because the overall focus of the study was on cancer screening behaviours, and the experiences for survivors have been shown to be different than for women who have never had a cancer diagnosis [27,28].
We used principles of targeted and respondent-driven sampling [29] to recruit and enroll participants. Comparable strategies were used to recruit heterosexual and sexual minority women. A total of 773 women were recruited and screened for eligibility over 25 months (June 1, 2003 – June 30, 2005). Six general sources were used for recruitment: (a) community settings (n = 146); (b) health fairs (n = 123); (c) mailings and flyers (n = 153); (d) print media (n = 135); (e) staff and participant social networks (n = 146); and (f) other (n = 70). For additional information about participant recruitment, see Clark et al. [30].

Allocation to eligibility strata
Upon contact with a potential participant, we administered a telephone screening protocol following informed consent. To determine eligibility and to ensure comparable marital status and sexual orientation characteristics within interview mode, women were asked their marital status, followed by the gender of a current partner or gender preference of a partner if they were not currently in a relationship. As depicted in Table 1, women were then allocated into one of six marital status/partner gender strata: (a) never married women who partner with women [WPW] or with either women or men [WPWM] (hereafter referred to as WPW); (b) previously married WPW [includes WPWM]; (c) never married women who partner with men [WPM]; (d) previously married WPM; (e) never married women with no partner preference [NPP]; and (f) previously married NPP. Strata (e) and (f) included women who reported no interest in having a partner and refused to select the gender of a potential future partner. Demographic characteristics of NPP were comparable to those of WPM; NPP were therefore combined with WPM for all analyses.

Randomization to data collection mode
After eligibility screening, we asked each woman for permission to be randomized to data collection mode. Each woman had an equal probability of being assigned to one of the three data collection modes: Self-Administered Mailed Questionnaire [SAMQ], Computer-Assisted Telephone Interview [CATI], and Computer-Assisted Self-Interview [CASI]. We used a systematic block randomization schedule to make mode assignments within each of the six marital status/partner gender strata to control for the long recruitment period and non-probability based sampling methods.
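The stratified assignment described above can be illustrated with a minimal sketch. The text does not report the block size or how the schedule was generated, so the permuted blocks of three (one slot per mode), the stratum labels, and the function names below are all assumptions made for illustration, not the study's actual procedure.

```python
import random

MODES = ["SAMQ", "CATI", "CASI"]

# Hypothetical labels for the six marital status/partner gender strata.
STRATA = [
    "never_married_WPW", "previously_married_WPW",
    "never_married_WPM", "previously_married_WPM",
    "never_married_NPP", "previously_married_NPP",
]

def block_schedule(n_blocks, rng):
    """Build a schedule of shuffled blocks; each block holds every mode once (assumed block size 3)."""
    schedule = []
    for _ in range(n_blocks):
        block = MODES[:]
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

def make_allocator(strata, n_blocks, seed=0):
    """Return a function that hands out the next mode from the schedule of a given stratum."""
    rng = random.Random(seed)
    schedules = {s: iter(block_schedule(n_blocks, rng)) for s in strata}

    def assign(stratum):
        return next(schedules[stratum])

    return assign

assign = make_allocator(STRATA, n_blocks=50, seed=2003)
print(assign("never_married_WPW"))  # one of SAMQ, CATI, CASI
```

Keeping a separate schedule per stratum balances the three modes within each stratum over the whole recruitment period, which is the stated reason for blocking.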
Women assigned to SAMQ received a 28-page booklet-form questionnaire. Women assigned to CATI completed a 35-40 minute telephone interview. Women assigned to CASI chose to complete the assessment in one of two ways: (a) laptop computer provided and monitored by research staff at locations chosen by the study participant [CASI-I]; or (b) computer disk mailed to the participant's home and returned in a self-addressed postage-paid mailer [CASI-D]. Audio technology was available for the CASI-I condition but not CASI-D. The time needed for assessment by CASI was comparable to CATI. The CATI and CASI programs were designed using the Ci3 software from Sawtooth Technologies [31].
Women in the SAMQ were asked to return the questionnaire within two weeks. Up to 10 follow-up telephone reminders were made to non-responders. Similarly, up to 10 attempts were made to collect data from women in the CATI and CASI modes.
There were two alternatives for eligible women who did not provide data by the mode to which they were randomized. First, women who were randomized but did not provide data after 10 contact attempts were considered non-respondents. Non-respondents were offered the opportunity to participate by either of the two data collection modes to which they were not assigned. Second, women who did not agree to be randomized were provided the option to self-select (self-choice) the mode of data collection. The protocol for self-choice was comparable to that for random assignment. Women in the self-choice group were included in analyses comparing those who did and did not agree to randomization. However, they were not included in analyses of the effect of interview mode on reports of cancer screening behaviours.

Indicators of cancer screenings
We included items in the survey related to mammography and Pap test screening. We provided women with a description of the screening test prior to asking items about the exam. Five variables were related to mammography screening and were coded as dichotomous (yes/no) indicators: no mammogram in past two years, ever put off/avoided the test, two or more years between most recent exams, no plan to get the exam within the next two years, and perceived difficulty with the exam because of breast shape or size. Five parallel items were related to Pap testing: no Pap test in past three years, ever put off/avoided the test, three or more years between most recent exams, no plan to get the exam within the next three years, and perceived difficulty with the exam because of body shape or size. Consistent with current recommendations
In addition to items specific to the tests, we included three variables related to cancer screenings more generally. One variable was a composite of four questions with comparable response options about reported barriers to cancer screening. Women were classified as reporting a barrier if they endorsed one or more of the following: problems taking time off work; transportation problems; health-related limitations; or difficulties with getting someone to care for dependents. Second, women were asked if they had ever put off or avoided cancer screenings because of embarrassment in showing their body. Finally, women were asked if they had ever changed the place for cancer screening exams because of embarrassment in showing their body.
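The composite barrier variable can be sketched as a simple any-of-four rule. The item names below are hypothetical labels, and treating an unanswered item as not endorsed is an assumption; the text does not say how missing responses were handled.

```python
# Hypothetical names for the four barrier questions described above.
BARRIER_ITEMS = (
    "time_off_work",
    "transportation",
    "health_limitation",
    "dependent_care",
)

def any_barrier(responses):
    """Return 1 if one or more barrier items is endorsed, else 0.

    Assumption: an item missing from `responses` counts as not endorsed.
    """
    return int(any(responses.get(item, False) for item in BARRIER_ITEMS))

print(any_barrier({"transportation": True, "time_off_work": False}))  # 1
print(any_barrier({"time_off_work": False}))                          # 0
```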

Analysis plan
We analyzed the data using SAS, version 9.1 [35]. Our first set of analyses was conducted to examine the utility and feasibility of different modes for collecting data from middle-aged and older unmarried women. First, we compared participant characteristics by randomly assigned mode of data collection. Second, we compared characteristics for women who agreed to be randomized versus those who chose their data collection mode [self-choice]. Third, we compared women who completed the assessment in the assigned mode to: (a) women who completed the assessment in a different mode; and (b) women who did not complete the assessment. Next, we assessed the relationship between number of contacts after randomization and response rate by assigned mode of data collection.
In the second set of analyses, we specifically examined the effects of interview mode on the responses to cancer screening indicators. For these analyses, we only included women who completed the assessment in the assigned mode. We examined distributions and computed proportions for all variables by randomization group. We then used Pearson Chi-square tests to compare the proportion of women who endorsed each of the 13 variables across data collection mode. Next, we computed odds ratios with 95% confidence intervals (CI) to assess differences between the variables reported by data collection modes, adjusting for partner gender, marital status, age, education, employment, race, and recruitment source. Finally, we tested interactions between data collection mode and partner gender.
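As a rough illustration of the first two steps, the sketch below computes a Pearson chi-square statistic for a mode-by-response table and an unadjusted odds ratio with a Wald 95% CI. It is not the SAS analysis used in the study: the adjusted odds ratios reported here come from logistic regression with covariates, which this sketch omits, and the counts in the usage lines are invented for illustration.

```python
import math

def pearson_chi_square(table):
    """table: one row per mode, columns [endorsed, not endorsed]. Returns (chi2, df)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_df2(chi2):
    """Upper-tail p-value; this closed form is valid only for 2 degrees of freedom."""
    return math.exp(-chi2 / 2)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted OR of group (a endorsed, b not) vs. reference (c endorsed, d not), with Wald CI."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Invented counts: rows are CATI, SAMQ, CASI; columns are endorsed / not endorsed.
counts = [[46, 152], [34, 160], [30, 140]]
chi2, df = pearson_chi_square(counts)
print(round(chi2, 2), df, round(p_value_df2(chi2), 3))
print(odds_ratio_ci(34, 160, 46, 152))  # SAMQ vs. CATI as reference
```

With three modes and a yes/no item the test has two degrees of freedom, which is why the simple exponential form of the p-value applies.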

Sample composition
The numbers of women in each of the marital status/partner gender strata are shown in Table 1. A total of 630 women were enrolled in the study (Figure 1). Of these, 599 women agreed to be randomized to mode of data collection. Because of unequal sample sizes in each marital status/partner gender stratum, the three randomized groups were slightly different in size (SAMQ = 202; CATI = 200; CASI = 197). Among those randomized to SAMQ and CATI, nearly all completed the questionnaire in the assigned mode (96% and 99%, respectively). Only 86.3% (n = 170) of women randomized to CASI completed the interview in the assigned mode (CASI-I = 86.7% and CASI-D = 86.0%). Reasons for not completing the assessment in the assigned mode included: unable to contact after randomization (SAMQ = 2, CASI = 3), changes in personal or family circumstances (CATI = 2, SAMQ = 1, CASI = 3), limited English competency (SAMQ = 2), and lost interest in the study (SAMQ = 3, CASI = 7). Figure 1 also shows the distribution of the 31 women who refused randomization (self-choice). Figure 2 shows the relationship between number of contacts with participants after randomization and response rates by assigned mode. More contact attempts with participants were required for CASI relative to SAMQ and CATI to achieve comparable response rates. For example, to achieve a response rate of 90%, six contact attempts on average were required for women assigned to CASI compared to two for SAMQ and one for CATI.
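The contact-attempt comparison can be expressed as a cumulative response curve. The sketch below assumes a hypothetical per-participant record of the contact attempt at which data were received (None for non-respondents); the example values are invented and are not the study data.

```python
def cumulative_response(attempts_to_response, max_attempts=10):
    """Return the cumulative response rate after 1..max_attempts contact attempts."""
    n = len(attempts_to_response)
    return [
        sum(1 for a in attempts_to_response if a is not None and a <= k) / n
        for k in range(1, max_attempts + 1)
    ]

def attempts_needed(rates, target):
    """Return the smallest number of attempts at which the rate reaches target, or None."""
    for k, rate in enumerate(rates, start=1):
        if rate >= target:
            return k
    return None

# Invented example: five participants, one of whom never responded.
example = [1, 1, 2, 3, None]
rates = cumulative_response(example)
print(rates[:3])                    # [0.4, 0.6, 0.8]
print(attempts_needed(rates, 0.8))  # 3
print(attempts_needed(rates, 0.9))  # None
```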

Participant characteristics by data collection mode
There were no differences in participant characteristics by randomly assigned mode of data collection (CATI vs. SAMQ vs. CASI; Table 2). Within the CASI condition, WPM/NPP were equally likely to choose CASI-I and CASI-D while almost 70% of WPW chose CASI-D. There was no substantial difference in choice of CASI condition for women without a college degree. However, the majority of women with a college degree selected CASI-D. The majority of women who were not employed and those who were non-white chose CASI-I, while employed women and white women chose CASI-D. Women recruited by print media, mailings/flyers, and personal networks were more likely to choose CASI-D, while those recruited at community settings, health fairs, or other settings were more likely to choose CASI-I.
Participant characteristics by status of actual participation are shown in Table 3. Among those randomized, older women and those who worked full- or part-time were more likely to complete the assessment in the assigned mode. Women without a college degree and Hispanic women were more likely to choose the self-choice condition (Total Randomized vs. Self-Choice).

Indicators of cancer screening by randomly assigned data collection mode
Overall, there were few significant differences and no definitive patterns from the analyses of the self-reported screening variables by mode of data collection (Table 4). Compared to CATI, women assigned to SAMQ were half as likely to report two or more years between most recent mammograms and to report ever changing the place they went for a cancer screening because of embarrassment showing their body to a health care provider. Women assigned to CASI were less likely to report being overdue for Pap testing (no Pap test in past three years) and were less likely to report that Pap testing was difficult due to body shape or size.
When using SAMQ as the reference (not shown in Table 4), women assigned to CASI were less likely to report difficulties with Pap tests due to body shape or size (AOR = 0.5, 95% CI = 0.3 – 0.7) and less likely to report any barriers to cancer screenings (AOR = 0.5, 95% CI = 0.3 – 0.9; analyses available upon request).
We tested for interactions between partner gender and mode of interview. There were only two significant interactions. WPW were less likely to report two or more years between most recent mammograms in CASI and SAMQ than CATI (CASI: AOR = 0.3, 95% CI = 0.1 – 0.8; SAMQ: AOR = 0.4, 95% CI = 0.1 – 0.9). In addition, WPW were less likely to report difficulty with Pap testing due to body shape or size in CASI and SAMQ than in CATI (CASI: AOR = 0.3, 95% CI = 0.1 – 0.7; SAMQ: AOR = 0.6, 95% CI = 0.3 – 1.2). Finally, we replicated all the analyses after removing the 23 women who refused to select a partner gender. The results were not significantly different.

Discussion
Our findings contribute preliminary evidence of the effect of interview mode on responses to indicators of cancer screening behaviours among middle-aged and older heterosexual and sexual minority women. These findings add to the body of research about methods that can be used to best identify subgroups of the population most at risk for not receiving recommended cancer screenings. Women were randomly assigned to one of three data collection methods: computer assisted telephone interview (CATI), self-administered mailed questionnaire (SAMQ) and computer-assisted self-interview (CASI). Women assigned to CASI could choose to complete the assessment during an in-person CASI (CASI-I) or by receiving the questionnaire on disk (CASI-D).
We examined the effects of randomized interview mode on responses to items associated with mammography and Pap test screening. Overall, we found few meaningful differences by mode of data collection for indicators of cancer screening. Surprisingly, among the few significant mode differences, we found that women who were interviewed by research staff (CATI) were more likely than those not interviewed (CASI, SAMQ) to have an unfavourable status on the indicators. Women in the CATI mode were more likely to report being off-schedule for recent Pap testing than women in CASI and the trend was similar, but non-significant, for mammography. The other significant findings associated with cancer screening behaviours were between CASI conditions. Because we did not randomize women into the different computer-assisted methods, we cannot rule out selection bias as a threat to the validity of the findings. Furthermore, given the lower response rate in the CASI condition (Figure 1), apparently higher rates of recent screening among women in CASI may be due to the fact that those who completed the assessment were also those most knowledgeable about cancer screening recommendations. Therefore, we cannot conclude that any mode of data collection has a consistent effect on rates of reporting screening behaviours.

Figure 2: Response rate by contact attempts in the Cancer Screening Project for Women, Rhode Island, 2003-2005.
There are potential reasons why we did not find consistent mode differences in our sample. First, items about cancer screenings may not be considered sensitive or associated with social rejection since questions about mammography and Pap testing are routinely asked of women 40-75 years in clinical settings. Second, many of the studies that showed differences between CASI and other modes of data collection were conducted and published in the early and late 1990s [11,14-16,36,37]. At that time, CASI was a novel interview mode. The increased access to, and use of, computers may explain why we did not find more significant differences between CASI and the other data collection modes. Finally, due to the relatively high percentages of women reporting mammography and Pap testing at recommended intervals (more than 80% for both behaviours), we may not have had sufficient power to detect statistically significant differences. With a sample size of 364 for comparisons between CATI and CASI, we only had statistical power of 0.78 to detect differences in means of 0.10 or higher with a standard deviation of 0.35. Similarly, we only had statistical power of 0.80 to detect comparable differences in means of 0.10 or higher between CATI and SAMQ with a sample size of 387. However, because the percentages of endorsement were remarkably consistent across mode for several items, it is not likely that increased sample sizes would change the conclusions substantially.
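The power figures quoted above can be approximated with a standard two-sample normal calculation. The sketch below assumes a two-sided test at alpha = 0.05 and two equal-sized groups (total sample size split in half); neither assumption is stated in the text, so this is a plausibility check rather than a reconstruction of the authors' calculation.

```python
import math
from statistics import NormalDist

def two_group_power(total_n, difference, sd, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    Assumptions: groups of equal size (total_n / 2 each) and a common SD.
    """
    n_per_group = total_n / 2
    se = sd * math.sqrt(2 / n_per_group)          # SE of the difference in means
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    shift = difference / se
    # Probability of rejecting in either tail under the alternative.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

print(round(two_group_power(364, 0.10, 0.35), 2))  # CATI vs. CASI
print(round(two_group_power(387, 0.10, 0.35), 2))  # CATI vs. SAMQ
```

Under these assumptions the two calls give values close to the reported 0.78 and 0.80, which suggests the stated figures follow from a calculation of this general form.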
Another study objective was to determine whether the effects of interview mode differed by partner gender. We found only two significant mode differences for items related to self-reported mammography and Pap test screening by partner gender. There are several potential reasons that may explain the lack of more significant findings. First, Rhode Island is one of only a few states in the United States to have non-discriminatory policies towards sexual minorities. Therefore, within the political and social context, women in Rhode Island may be more willing than women in other parts of the country to disclose potentially unfavourable information. Second, all women interested in study participation were required to answer screening questions about marital status and partner gender prior to study enrollment. Asking these screening items provided women with examples of the types of questions that would be asked in the study. Women who considered these items too personal may have declined study participation. Finally, sample size, particularly for sexual minority women, may have limited our ability to detect important mode differences.
In the CASI condition, WPW were significantly more likely to select the mailed computer disk than WPM/NPP when given a choice of completing the assessment by a laptop provided by the research team or by a disk mailed to the participant's home. WPW were also more likely to have a college education, be employed full- or part-time, have higher incomes and identify as White than WPM/NPP. Therefore, it is likely that WPW had greater access to, and experience with, computers than WPM/NPP and were able to complete the assessment independent of assistance from a research assistant with a laptop computer. Given our findings, we encourage future studies to further explore women's preferences for data collection methods and whether mode of data collection influences the responses of middle-aged and older sexual minorities.
Our findings also provide information about the feasibility of different methods for collecting data from a traditionally under-represented group of women. Of the 630 women who were eligible and enrolled in the study, 95% agreed to be randomized to one of three modes of data collection. Not surprisingly, women who were more likely to have access to a computer (e.g., more education, employed, white race) chose CASI-D. Women who refused randomization (self-choice) were more likely to have less than a college degree, to identify as Hispanic, and to choose SAMQ. Despite the informed consent process, women in the self-choice option may not have completely understood the concept of randomization and may have been concerned about the implications of agreeing to randomization. They may have chosen the mode that was most familiar to them, offered the most perceived anonymity, and provided the greatest degree of flexibility in completing the assessment (e.g., time and availability of assistance with question understanding). We obtained an overall response rate of 93%. This response rate is generally higher than for most other studies, particularly SAMQs, and is a strength of our study because of the low potential for non-response bias. The high response rate is likely a result of the initial contact we had with women during recruitment and screening for eligibility. Unfortunately, we do not have data to inform other studies of comparable populations that do not employ similar pre-survey contact with participants.
Despite the high overall response rate, we found noteworthy differences in response rate by mode. The response rates for CATI and SAMQ were over 95%, while only 86% for CASI. The lower response rate by computer was not unexpected given other mode experiments [19] and the age of the participants. There were likely some women with less experience using computers who, despite initially agreeing to participate, worried about their ability to correctly use the software or feared unknown potential consequences of responding to a computer program. Additionally, women may have had technical difficulties with the computer that we were unaware of because they indicated that they were no longer interested in study participation rather than acknowledging problems with computer software.
We also found that more contact attempts with participants were required for CASI relative to SAMQ and CATI to achieve comparable response rates (Figure 2). Furthermore, the estimated costs per randomized participant were approximately $60 for CASI compared to $30 for SAMQ and $20 for CATI. Within the CASI condition, the cost per participant was about $115 for CASI-I and $20 for CASI-D. Had we used Internet-based data collection, the costs associated with CASI would have been substantially lower. However, the sample would have been biased towards women with higher socioeconomic positions who had access to a computer. Women in our sample who chose in-person CASI were more likely to identify as a racial minority, to be less educated and not employed compared to those who chose to complete the questionnaire on a disk that was mailed to them.
In addition to sample size, there are a number of other study limitations. First, to include sufficient numbers of sexual minorities, we used non-probability based sampling methods. Our sample was highly educated, predominantly white, and employed, with relatively higher incomes. Unfortunately, because sexual orientation is not asked of all individuals in the Census or on any large statewide population-based survey, we do not have data to compare our sample to the eligible Rhode Island population. Therefore, care should be taken when generalizing our findings. We also did not use methods to verify self-reported data and cannot confirm whether there was substantial over- or under-reporting where differences were observed across modes. Finally, we cannot discern which mode provided the most accurate estimates of true behaviour, nor can we distinguish the extent to which differences across modes indicate differences in accuracy of reports as opposed to mode artefacts. However, given the few statistically significant differences, it appears that the incidence of mode artefacts is low.

Conclusion
Using computer-assisted self-interviewing for surveillance and intervention studies may result in lower response rates than telephone interviewing or self-administered mailed questionnaires. However, there do not appear to be consistent differences by mode of data collection for responses to indicators of mammography and Pap test screening among middle-aged and older women who complete the assessment. Therefore, other measures such as efficiency and cost-effectiveness of the mode should also be considered when determining the most appropriate form of data collection for use in monitoring indicators of cancer detection and control.