Self-selection in a population-based cohort study: impact on health service use and survival for bowel and lung cancer assessed using data linkage

Background: In contrast to aetiological associations, there is little empirical evidence for generalising health service use associations from cohort studies. We compared the health service use of cohort study participants diagnosed with bowel or lung cancer with that of the source population of people diagnosed with these cancers in New South Wales (NSW), Australia, to assess the representativeness of health service use among cohort study participants.
Methods: Population-based cancer registry data for NSW residents aged ≥45 years at diagnosis of bowel or lung cancer were linked to the 45 and Up Study, a NSW population-based cohort study (N ≈ 267,000). We measured hospitalisation, emergency department (ED) attendance and all-cause survival, and risk factor associations with these outcomes, using administrative data for cohort study participants and the source population. We assessed bias in prevalence and risk factor associations using ratios of relative frequency (RRF) and relative odds ratios (ROR), respectively.
Results: People from major cities, people born in non-English speaking countries and people with comorbidities were under-represented among cohort study participants diagnosed with bowel (n = 1837) or lung (n = 969) cancer by 20–50%. Cohort study participants had similar hospitalisation and ED attendance compared with the source population. One-year survival after major surgical resection was similar, but cohort study participants had up to 25% higher post-diagnosis survival (lung cancer 3-year survival: RRF = 1.24, 95% confidence interval 1.12, 1.37). Except for area-based socioeconomic position, risk factor associations with health service use measures and survival appeared relatively unbiased.
Conclusions: Absolute measures of health service use and risk factor associations in a non-representative sample showed little evidence of bias. Non-comparability of risk factor measures of cohort study participants and non-participants, such as area-based socioeconomic position, may bias estimates of risk factor associations. Primary and outpatient care outcomes may be more vulnerable to bias.
Electronic supplementary material: The online version of this article (10.1186/s12874-018-0537-3) contains supplementary material, which is available to authorized users.


Background
Cohort studies have been established around the world to examine health and health care use in ageing populations [1–8]. Applying findings from these cohorts to health service policy and practice is vital to realising the public health benefit of these studies. Participants in cohort studies are typically healthier and more socioeconomically advantaged than the general population by self-selection or design (e.g. British Doctors Study). As a result, the prevalence of exposures and absolute risk of disease or death among cohort study participants are often different to their source population. While the basis for generalising aetiological associations from non-representative cohorts is well established [9], there is little empirical evidence for generalising health service use associations. Additionally, absolute measures of health service use are often required to inform health service policy and practice. The few studies examining the effect of self-selection on absolute measures of health service use are conflicting, finding higher [10,11] and lower [12,13] health service use among participants.
The 45 and Up Study is a population-based cohort study in New South Wales (NSW), Australia, that was established to improve knowledge of ageing, with health service use a priority area [14]. Cancer, a major ageing-associated disease, is the largest cause of burden of disease in Australia [15], is among the leading causes in other high-income countries and is becoming a significant burden in middle- and low-income countries [16]. Providing effective, efficient and equitable access to cancer care services is important in reducing this burden. However, there is evidence that patterns of health service use, such as late diagnosis and reduced treatment uptake, across population subgroups lead to poorer cancer outcomes [17–19]. Demonstrating that patterns of health service use and associations with risk factors among cohort study participants are generalisable to the source population enables research findings from cohort studies to be applied with more confidence to health service policy and practice.
In this study, we aimed to assess differences in inpatient hospital use, emergency department attendance and survival among 45 and Up Study participants diagnosed with lung or bowel cancer compared with people diagnosed with these cancers aged 45 years and older in the NSW population using linked population-based cancer registry and administrative health data. We compared estimates of associations between risk factors (remoteness of residence, socioeconomic position, country of birth, comorbidity) and health service use and survival outcomes to assess selection bias.

Study design
This study used de-identified linked cancer registry, administrative hospital, death registry and 45 and Up Study cohort data. Around 267,000 NSW residents aged 45 years and older joined the Sax Institute's 45 and Up Study between February 2006 and December 2009, representing around 10% of this age group. Participants were randomly selected from the Department of Human Services (formerly Medicare Australia) enrolment database for the national publicly funded universal health care scheme. People aged 80 years and older and those from rural areas were over-sampled by a factor of two and all remote residents were sampled. Participants were recruited by completing a postal questionnaire and consenting to follow-up and linkage of their health-related records. The response rate was reported as 18% at the mid-point of the recruitment period [14], and a small number of additional participants (< 1%) volunteered via a hotline.
Cancer case data were obtained from the NSW Cancer Registry (NSWCR), a statutory registry of all invasive cancer cases (excluding non-melanoma skin cancer) diagnosed in NSW residents. Admission records for all NSW public and private hospitals were obtained from the NSW Admitted Patient Data Collection. Emergency department (ED) attendances at public hospitals were obtained from the NSW Emergency Department Data Collection, which had substantively complete coverage of EDs in metropolitan areas but was incomplete for regional areas for the study period. Attendance data were not available for the small number (< 5) of EDs at private hospitals, which made up < 5% of ED activity during the study period [20,21]. Mortality follow-up was from deaths recorded by the NSW Registry of Births, Deaths and Marriages.
The study was conducted with ethical approval from the NSW Population and Health Services Research Ethics Committee (HREC/14/CIPHS/60). Probabilistic linkage of the datasets was conducted by the Centre for Health Record Linkage (CHeReL) with an estimated false positive rate of 5 per 1000 (www.cherel.org.au). Identifying information (such as names and addresses) was separated from content information in the datasets to protect privacy. The CHeReL uses Choicemaker software to match identifiers and create a de-identified Project Person Number that enables records for an individual to be ascertained across the study datasets by researchers without accessing identifying information. The 45 and Up Study is approved by the University of New South Wales Ethics Committee.

Study population
People aged ≥45 years diagnosed with bowel cancer (International Classification of Diseases, 10th Edition, Australian Modification [ICD-10-AM] C18-C20) or non-small cell lung cancer (ICD-10-AM C34, excluding m8041-m8045 and m8246; hereafter 'lung cancer') between February 2006 (commencement of 45 and Up Study recruitment) and December 2012 (the most recent data available at the time of extraction) were ascertained from the NSWCR. Bowel and lung cancer were selected since they are commonly diagnosed cancers, are leading causes of cancer death and have high rates of health service use in Australia [22]. People with a cancer diagnosed prior to the index cancer (from January 2000 onwards) or with another cancer case diagnosed within three months of the index cancer were excluded. Cases of an uncommon histology type, notified to the NSWCR by death certificate only, or with an unknown diagnosis date or place of residence were excluded. Cancers with uncommon histology types were excluded since they have different treatment patterns and outcomes.
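The core inclusion criteria above can be written as a simple filter. The following is a minimal sketch only: the record fields (`icd10_site`, `morphology`, `age_at_diagnosis`, `diagnosis_date`) are hypothetical names rather than NSWCR variable names, the study window is assumed to run from the first day of February 2006 to the last day of December 2012, and the remaining exclusions (prior or concurrent cancers, uncommon histology types, death-certificate-only notifications, unknown place of residence) are not implemented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CancerCase:
    # Hypothetical field names, not the registry's own variable names.
    icd10_site: str                 # e.g. "C18.7"
    morphology: int                 # morphology code, e.g. 8140
    age_at_diagnosis: int
    diagnosis_date: Optional[date]  # None when the diagnosis date is unknown


# Assumed day-level boundaries for "February 2006" to "December 2012".
STUDY_START = date(2006, 2, 1)
STUDY_END = date(2012, 12, 31)

BOWEL_SITES = {"C18", "C19", "C20"}
# Morphology codes excluded from C34 in the text: m8041-m8045 and m8246.
EXCLUDED_LUNG_MORPHOLOGY = set(range(8041, 8046)) | {8246}


def index_cancer(case: CancerCase) -> Optional[str]:
    """Return 'bowel', 'lung' or None if the case fails the core criteria."""
    if case.diagnosis_date is None:
        return None
    if case.age_at_diagnosis < 45:
        return None
    if not (STUDY_START <= case.diagnosis_date <= STUDY_END):
        return None
    site = case.icd10_site[:3].upper()
    if site in BOWEL_SITES:
        return "bowel"
    if site == "C34" and case.morphology not in EXCLUDED_LUNG_MORPHOLOGY:
        return "lung"
    return None
```

In practice the filter would be applied to every registry record before the further exclusions are made, for example `[c for c in cases if index_cancer(c) is not None]`.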

Outcomes and study variables
We examined health service use in the year prior to and the year after diagnosis. We used measures of hospital use (number of overnight admissions and number of weeks in hospital, excluding hospitals that primarily provide sub- and non-acute care) and ED attendance since linked population data are available for these areas of health service use. We measured major resection (defined by the Australian Classification of Health Interventions) since surgery is the main curative treatment for bowel and lung cancer. We measured all-cause one- and three-year post-diagnosis survival and, for those who underwent resection, one-year post-operative survival. Survival outcomes were measured since there are high rates of health service use in the lead-up to death [23]. Mortality follow-up was to September 2016.
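As an illustration of how these windowed outcomes can be derived from linked records, the sketch below counts ED attendances in the year after diagnosis and flags survival at one or three years. It rests on assumptions that are not stated in the text: a "year" is taken as 365 days, an attendance on the day of diagnosis is counted as post-diagnosis, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Optional


def ed_attendances_year_after(diagnosis: date, ed_dates: Iterable[date]) -> int:
    """Count ED attendances in the 365 days starting on the diagnosis date."""
    end = diagnosis + timedelta(days=365)
    return sum(1 for d in ed_dates if diagnosis <= d < end)


def survived(diagnosis: date, death_date: Optional[date], years: int) -> bool:
    """True if no death is recorded within `years` x 365 days of diagnosis."""
    cutoff = diagnosis + timedelta(days=365 * years)
    return death_date is None or death_date > cutoff


# Hypothetical person: diagnosed 1 March 2010 with four ED attendances.
diagnosis = date(2010, 3, 1)
ed_dates = [date(2010, 3, 5), date(2010, 6, 1), date(2011, 2, 20), date(2011, 3, 5)]
n_ed = ed_attendances_year_after(diagnosis, ed_dates)
more_than_two_ed = n_ed > 2  # the "> 2 ED attendances" indicator
```

Because the latest diagnoses were in December 2012 and mortality follow-up ran to September 2016, every case has at least three years of follow-up, so a missing death date can be read as survival for both the one- and three-year outcomes.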
Age at diagnosis, sex, area-based socioeconomic position (Index of Relative Socioeconomic Disadvantage [24] for Census Districts), remoteness of residence [25] and extent of disease at diagnosis were obtained from the NSWCR. Country of birth was obtained from the NSWCR for people diagnosed between 2006 and 2010; it was unavailable from the NSWCR for 2011-2012 and was instead obtained from hospital admission records for this period. Hospital type (public or private), urgency of admission and the Charlson comorbidity score [26] (calculated with a five-year look-back from hospital-recorded diagnoses) were obtained from hospital admission records.

Analysis
We compared demographic, cancer case and health service use characteristics by 45 and Up Study participation status using a ratio of relative frequency (RRF). This was calculated by dividing the proportion in the 45 and Up Study by the proportion in the NSW cancer population for each category of each categorical variable [27,28]. A ratio greater than one indicates over-representation and a ratio below one indicates under-representation among 45 and Up Study participants. We restricted the examination of risk factor associations to resection use, one-year post-diagnosis survival, > 4 weeks in hospital and > 2 ED attendances in the year after cancer diagnosis. We focused on examining potential bias in associations with remoteness of residence, socioeconomic position, country of birth and comorbidity since these factors are often the focus of health service use studies. Associations with these factors were examined using a multivariable logistic regression model including all the factors of interest and adjusting for factors with known prognostic importance. In the model of resection status, sex, age and extent of disease at diagnosis were included as prognostic factors. In the models of the other outcomes, resection status was included as an additional prognostic factor. Adjusted relative odds ratios (RORs) were calculated as the ratio of the OR of 45 and Up Study participants to the OR of the NSW cancer population. Confidence limits (CLs) for the RRFs and RORs were calculated using the formula described by Nohr et al. [28]. The formula assumes the subsample is a random sample of the population, which is not the case here; however, the coverage properties were found to be adequate in a similar study [28].
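The two bias measures reduce to simple ratios. The sketch below shows the point estimates only, using hypothetical function names and made-up inputs; it does not implement the confidence limit formula of Nohr et al. [28], and the odds ratios passed to `ror` are assumed to come from the separately fitted adjusted logistic regression models.

```python
def rrf(cohort_count: int, cohort_total: int,
        population_count: int, population_total: int) -> float:
    """Ratio of relative frequency: cohort proportion / population proportion."""
    cohort_proportion = cohort_count / cohort_total
    population_proportion = population_count / population_total
    return cohort_proportion / population_proportion


def ror(or_cohort: float, or_population: float) -> float:
    """Relative odds ratio: OR among participants / OR in the population."""
    return or_cohort / or_population


# Made-up illustration: a category held by 30 of 100 participants
# and by 600 of 1500 people in the source population.
example_rrf = rrf(30, 100, 600, 1500)  # below 1: under-represented in the cohort

# Made-up illustration: adjusted OR of 0.9 among participants
# versus 0.6 in the source population.
example_ror = ror(0.9, 0.6)            # above 1: cohort OR is the larger of the two
```

An RRF or ROR of one means the cohort estimate equals the population estimate; the further the ratio is from one, the larger the apparent bias.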

Demographic and cancer characteristics
A total of 233,133 NSW residents aged ≥45 years were diagnosed with 245,266 cancer cases between February 2006 and December 2012 (Fig. 1). In NSW in 2008-12, the incidence per 100,000 age-standardised to the world population was 44.8 and 33.3 among men and 31.5 and 21.6 among women for bowel cancer and lung cancer respectively for all ages [29]. A total of 17,661 participants of the 45 and Up Study were diagnosed with cancer after enrolment, 7.6% of all NSW residents diagnosed. Lung cancer was under-represented among 45 and Up Study participants diagnosed with cancer (7.8% [n = 1379] v 10.1% [n = 23,537]; RRF = 0.77, CL 0.74, 0.81). In the final analysis cohorts, 6.8% (n = 1837) and 5.6% (n = 969) of NSW residents aged ≥45 years at diagnosis of bowel or lung cancer, respectively, were 45 and Up Study participants.
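The lung cancer RRF reported above can be reproduced from the counts in this paragraph. The sketch below also computes an approximate 95% interval on the log scale that treats the two groups as independent samples. That interval is an assumption made purely for illustration: it is not the formula of Nohr et al. [28], the participants are in fact a subset of the population, and its limits need not match the reported CLs.

```python
import math

# Counts reported in the text.
cohort_lung, cohort_all = 1379, 17661          # 45 and Up Study participants
population_lung, population_all = 23537, 233133  # NSW residents aged >=45 years

p_cohort = cohort_lung / cohort_all
p_population = population_lung / population_all
rrf_lung = p_cohort / p_population

# Share of all NSW residents diagnosed who were study participants.
participant_share = cohort_all / population_all

# Naive log-scale standard error for a ratio of two independent proportions
# (illustrative assumption only; NOT the Nohr et al. formula).
se_log = math.sqrt((1 - p_cohort) / cohort_lung
                   + (1 - p_population) / population_lung)
lower = rrf_lung * math.exp(-1.96 * se_log)
upper = rrf_lung * math.exp(1.96 * se_log)

print(f"cohort {100 * p_cohort:.1f}% vs population {100 * p_population:.1f}%")
print(f"RRF = {rrf_lung:.2f}; participant share = {100 * participant_share:.1f}%")
```

The point estimates round to the reported 7.8%, 10.1%, RRF = 0.77 and 7.6%.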
Sex and age distributions of 45 and Up Study participants diagnosed with bowel or lung cancer were similar to the NSW cancer population distributions (Table 1). Although there was under-representation of the youngest age groups, the median and interquartile ranges of age at diagnosis were similar. People from regional and remote areas were over-represented among 45 and Up Study participants by up to 50%. The distribution of socioeconomic position was similar between participants and the source population, particularly for bowel cancer. However, the over-representation of regional and remote areas among 45 and Up Study participants affects the distribution of socioeconomic position since these areas are generally more socioeconomically disadvantaged than major cities. Stratifying by remoteness, the over-representation of people from less disadvantaged areas was evident. For example, in major cities, people diagnosed with bowel cancer from areas in the least disadvantaged socioeconomic quintile were over-represented in the 45 and Up Study (
In the NSW population, greater socioeconomic disadvantage was associated with lower odds of resection and one-year survival for bowel cancer whereas there was little evidence of an effect among 45 and Up Study participants from the point estimates of the disadvantage quintiles, although confidence intervals were wide. Odds of > 2 ED attendances in the year after bowel cancer diagnosis were in the same direction for 45 and Up Study participants as for the NSW population but were consistently around 1.5 times higher for the disadvantage quintiles. The relative consistency of differences in the magnitude and direction of odds ratio estimates for socioeconomic position across multiple outcomes among 45 and Up Study participants with bowel cancer could be indicative of bias.

Discussion
The expectation of cohort study participants being healthier and wealthier than their source population was met in regard to health but was not as straightforward for wealth. The marginal distribution of socioeconomic position of 45 and Up Study participants diagnosed with bowel or lung cancer was similar to the source population of people aged ≥45 years diagnosed with these cancers. However, the expected over-representation of people from more socioeconomically advantaged areas was evident when stratified by remoteness. We attribute the difference between the stratified and marginal distributions of socioeconomic position to the over-representation of people from regional and remote areas. People from regional and remote areas were over-sampled in the design of 45 and Up Study to facilitate examining effects of rurality [14] and these areas are generally more socioeconomically disadvantaged than major cities [24].
Slightly more 45 and Up Study participants diagnosed with bowel or lung cancer had no comorbidity, with participants having higher post-diagnosis survival compared with the population. Lung cancers were less common among 45 and Up Study participants compared to the NSW population. Since most lung cancers are smoking-related [30], this likely reflects the lower prevalence of smokers and greater proportion of never smokers in the 45 and Up Study compared to NSW population survey-based estimates at baseline (7.4 and 12% smoking prevalence [31], 56% [32] and 40-50% never smokers [33] in the 45 and Up Study and NSW population respectively). A higher proportion of 45 and Up Study participants were diagnosed with localised bowel cancer compared to the NSW population, which may be related to 45 and Up Study participants having higher rates of bowel screening compared to NSW population estimates [31]. A national government-funded screening program was phased in from late 2006 to facilitate early detection of bowel cancer. Additionally screening tests have been available from pharmacies and medical practitioners. In contrast, lung cancer does not have a screening program and the diagnosis of localised lung cancer was similar between 45 and Up Study participants and the NSW population. Despite these differences, absolute measures of hospital and emergency department use in the year prior to and after cancer diagnosis were similar to the population estimates.
Estimates of risk factor associations among 45 and Up Study participants were generally consistent with population estimates, despite participants not being a representative sample in terms of these factors. While this is a demonstration of representativeness not being required for associations to be generalisable, the converse, that representativeness does not guarantee generalisability, was also demonstrated. The only risk factor that showed evidence of systematic bias was socioeconomic position among people with bowel cancer, which had a similar marginal distribution to the population. This apparent bias may in part be due to differing effects of socioeconomic disadvantage on health care utilisation in urban and rural settings.
As in most epidemiological studies, the measure of socioeconomic position used in this study is a general index that may not capture contextual effects of disadvantage in urban and rural settings [24,34]. The apparent bias may also be due to the area-level measure of socioeconomic position used in this study since an individual-level measure was not available in the population cancer data. 45 and Up Study participants may have different individual-level socioeconomic characteristics to those in the same area, making participants not comparable to non-participants. Similarly, since the 45 and Up Study baseline questionnaire was only available in English, country of birth associations were measured among people with sufficient English proficiency to respond, which could have contributed to instances of non-English speaking country of birth associations being in the opposite direction to the population estimates. Selection bias can occur when there are joint risk factors for study participation and outcomes and, furthermore, the magnitude of bias depends on the strength of these associations [35]. Health service use studies may be prone to selection bias since factors such as health literacy and health-seeking behaviours are likely to be associated with participation in a cohort study and are associated with health service use [36,37]. Selection bias can be minimised by including factors associated with selection and outcome in adjustment models [35]. However, there are no questions on health literacy and few questions on health-seeking behaviours in the 45 and Up Study. It would be beneficial for cohort studies established with an aim of examining health service use to include validated measures of health literacy and health-seeking behaviours.
Fig. 2 Adjusted* odds of resection, > 4 weeks in hospital and > 2 ED visits in the year after diagnosis for people diagnosed with bowel or lung cancer, 45 and Up Study participants and NSW residents aged ≥45 years. *Adjusted for sex, age at diagnosis, extent of disease at diagnosis, and additionally for hospital use and ED attendance outcomes, resection status
In aetiological studies, a key consideration in assessing the generalisability of associations is whether the underlying biological mechanisms are the same in participants and non-participants [38]. In health service use studies, non-biological mechanisms such as attitudes and beliefs towards health service use also need to be considered. In other studies, hospital use by responders to a health survey was similar to non-responders but out-of-hospital health service use differed [10,11]. Hospital use is potentially less likely to be impacted by a person's health-seeking propensity than out-of-hospital care since admitting physicians act as gatekeepers. Much activity for the early detection and diagnosis of cancer occurs in the primary care and outpatient settings. Health service use in response to cancer symptoms depends not only on clinical factors, but also psychosocial factors such as knowledge of symptoms and fear of cancer [39,40]. Population-level primary care and outpatient data are not available for linkage studies in NSW. Health service use in these settings may be more vulnerable to the impacts of self-selection and requires further examination.
With the large number of comparisons in this study, some differences between estimates from 45 and Up Study participants and the NSW population are likely to occur by chance. Additionally, the 45 and Up Study participants were a small sample of the population and differences could result from sampling error rather than non-sampling error such as self-selection. The precision of the study estimates was limited by the small number of 45 and Up Study participants diagnosed with cancer. For individual cancer sites, even large cohort studies may be underpowered for the detection of differences between risk groups for health service use outcomes [41]. Furthermore, small numbers can reduce the number of confounders able to be included in adjustment models due to sparse-data bias [42]. The number of cancer cases diagnosed among 45 and Up Study participants will increase with longer follow-up. However, the findings of health service use studies using cancer cases diagnosed over long time periods may have limited applicability to health service policy and practice which often require timely data.
There are few studies examining the impact of self-selection on health service use outcomes and none focusing on cancer that we are aware of. Of these studies, most have examined participation in surveys with response rates of 50-80% conducted in Scandinavia or the Netherlands, with one US study [10,11,13,43,44]. The effect of self-selection on hospitalisation and psychiatric care has been examined in one cohort study [12] which had participation rates of 65-90% compared with the 45 and Up Study (18%) [14]. These studies have focussed on absolute measures of health service use and have reported both higher [10,11,44] and lower [12,13,43] health service use among participants. One study reported that health service use was only slightly (3-6%) lower among survey participants compared with all non-responders, but for the subset of people who did not respond due to illness there were much greater differences in health service use [43]. Similar to our study, the one study examining associations between demographic factors and health service use (including use of prescription drugs, hospitalisations, specialist, allied and dental care) among responders to a health survey found estimates were similar to those measured from the target sample [10]. Our study complements another study on the representativeness of the 45 and Up Study cohort which demonstrated the generalisability of aetiological associations measured from 45 and Up Study participants to survey-based NSW population estimates [31].

Conclusions
This study contributes to the empirical evidence base for generalising health service use associations measured from non-representative samples. There was little evidence of bias in risk factor associations for the cancers and outcomes examined. However, the comparability of participants and non-participants with respect to the risk factor measure requires consideration. Further study is warranted on health service use in the primary and outpatient settings since the potential for selection bias is greater.

Additional file
Additional file 1: Socioeconomic position by rurality and univariable and multivariable models of health service use outcomes. The additional file contains ratios of relative frequencies for area-based socioeconomic position stratified by rurality (major city; regional and remote) and univariable and multivariable logistic regression models of health service use outcomes (resection; > 4 weeks in hospital; > 2 emergency department attendances; one-year all-cause post-diagnosis survival) for 45 and Up Study participants and NSW residents aged ≥45 years at diagnosis of bowel or lung cancer. (DOCX 610 kb)

Abbreviations
CL: Confidence limit; ED: Emergency department; ICD-10-AM: International Classification of Diseases, 10th Edition, Australian Modification; NSW: New South Wales; NSWCR: NSW Cancer Registry; ROR: Relative odds ratio; RRF: Ratio of relative frequency

Funding
This research did not receive any specific grant.

Availability of data and materials
The data that support the findings of this study are available from the relevant data custodians of the study datasets. The 45 and Up Study data were used under license for the current study. Restrictions by the data custodians mean that the data are not publicly available or able to be provided by the authors. Researchers wanting to access the datasets used in this study should refer to the 45 and Up Study application process (www.saxinstitute.org.au/our-work/45-up-study/for-researchers/) and the Centre for Health Record Linkage application process (www.cherel.org.au/apply-for-linked-data).

Authors' contributions
JY and MS conceived the experiment. NC designed the experiment with input on the study protocol from all authors. RW provided advice on statistical analysis. NC conducted the analyses and drafted the manuscript. SP conducted secondary data analysis. DB provided input into the interpretation of the data. All authors provided critical revision of the manuscript and approved the final version.

Ethics approval and consent to participate
The study was conducted with ethical approval from the NSW Population and Health Services Research Ethics Committee (HREC/14/CIPHS/60). The 45 and Up Study is approved by the University of New South Wales Ethics Committee. Participants of the 45 and Up Study provided written consent to the linkage of their health-related records.

Consent for publication
Not applicable.

Competing interests
NC, SP, MS, RW and JY declare that they have no competing interests. DB was an employee of the Sax Institute NSW and Research Manager of the 45 and Up Study for part of the time this study was being conducted.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.