
Table 2 Aspects to consider when choosing to obtain resource-use data using self-reported or electronic methods

From: Self-reported and routinely collected electronic healthcare resource-use data for trial-based economic evaluations: the current state of play in England and considerations for the future

Aspects to consider

Self-reported

Electronic database

Access to person-level or record-level data

Data reported by the patient themselves (or a proxy on their behalf) are patient-level by definition.

Currently a major issue for electronic datasets. To those without advanced knowledge of large datasets, it is unclear whether person-level data can be obtained, and the information governance (IG) aspects of obtaining these data are challenging for researchers. The flow of person-level data may also be restricted, depending on the current stance of the data holders (e.g. NHS Digital) on what constitutes appropriate data protection policies.

Service for which data are required

Essential for services with no electronic records; for example, travel, childcare, over-the-counter medications

All care services should operate an electronic administrative system from which data could be obtained. However, each system will only capture data on the care that service provides, unless it is linked to another service (e.g. Clinical Practice Research Datalink (CPRD) data linked to Hospital Episode Statistics (HES); SystmOne central database).

Practicality and cost

Pragmatic and cheap method which is well understood and largely under the control of the researcher

Large datasets often incur a cost, and the researcher is bound by the time the data holders take to approve and extract the data. Raw data extraction can be time-consuming and relatively costly compared with self-reported methods.

Number of patients

Administratively burdensome for large numbers of patients

If a large dataset exists and contains a person-level identifier (e.g. NHS number), obtaining data for large numbers of patients is possible. Raw data extraction is less practical for large numbers of patients unless a systematic extraction method is available (e.g. a software system for data extraction).

Validity of data

Self-reported data have known validity issues, which are particularly problematic if they differ between trial arms. Validity can be tested in a pilot phase.

Some large databases are known to validate their data; however, the extent of this validation is not transparent, and validity for costing purposes may not have been tested. Raw data are difficult to validate.

Time horizon for analysis

Loss to follow-up may be higher with a lengthy time horizon. Self-reported methods may work better for shorter time horizons (e.g. one questionnaire per 3-month period of interest).

Depends on the time horizon covered by the database. Loss to follow-up can occur in both large datasets and raw data, depending on the database or service (e.g. a GP practice may change its software system, restricting its eligibility to provide data to particular primary care datasets).

Patient group being analysed

Care may be needed with particular patient groups, for example those who lack capacity.

Different patient groups may use different services from which data may need to be obtained. The type of patient (e.g. their cognitive ability) is not generally a concern.

Type of costing exercise (e.g. top-down or micro-costing)

Can be tailored exactly to the type of costing exercise required, but depends on the patient's knowledge of the care they received to provide sufficient detail. Collecting detailed information for micro-costing exercises is more time-consuming.

Raw data and large datasets can offer either aggregated or very detailed information, depending on the level of data recording. Some data may still not be reliable enough for micro-costing (e.g. time spent with the patient as recorded in large databases such as CPRD).

Recall bias

Problematic if recall errors differ systematically between the arms of a trial

Recall bias is not an issue, but bias may still arise if data are not recorded accurately at the service level.

Missing data

A known problem with self-report; can be minimised by following good practice

Missing data are not a ‘known’ issue: if data are missing, this is not easy to detect (i.e. it would be assumed there was no resource use). There is some evidence of data missing from HES, but the extent would be difficult to assess in a trial.

Regional or national study

Data can be collected consistently across geographical areas

More detailed datasets are available regionally than nationally. National datasets depend on the uptake of electronic data provision by services. Raw data may be difficult to obtain electronically if there is no remote access to the software system (remote access is possible with SystmOne, for example).

International studies (outside of England)

Self-reported data are still necessary in many countries, and in any circumstances where electronic systems are not available or cannot provide the data required.

An increasing number of countries use electronic data provided by care services, commissioners, and insurance companies, among other sources. This is important to note when comparing analyses in England with international studies. By comparison, studies based in England may be limited in their ability to perform the best possible analysis, which is desirable as part of research studies.

All-cause or disease-specific assessment

Patients may struggle to correctly identify whether an event is related to their condition or not

A variety of codes (e.g. ICD-10 and OPCS-4 codes for hospital care) and free text can specify whether resource use is associated with a condition. Primary care data have Read or SNOMED CT codes for specific conditions and diseases, although these codes are not always used appropriately. Free text is difficult to use. HES outpatient diagnosis codes are poorly completed.

Baseline measurements

Places an additional burden on the patient and is very rarely collected.

Not an issue if the data are available for the baseline period of interest.

Experience and familiarity

Relatively easy for a researcher to become familiar with. Designing the data collection for a clinical study may require knowledge of the clinical area to capture the resource-use cost drivers accurately.

Large datasets require completion of a data requisition form, which is not always easily understood. Commissioning data require a contact with access to the data, as well as a completed data requisition form. Raw data require knowledge of the service, or the identification of a person who can extract the data (e.g. a trained researcher or practice nurse).

Information Governance

Managed through standard ethics application methods.

IG is a major concern when using electronic data. The IG process can be navigated with expert guidance, although the evolving landscape of electronic data will remain a concern for researchers.

Social care data

Social care data could be self-reported and the exact type of social care data of interest could be specified within the questionnaire.

Routinely collected social care data are not discussed in this paper but are an important area for future consideration. Healthcare systems are more usable than social care systems for obtaining data because of features such as the inclusion of unique identifiers (NHS number or other pseudonymised codes), relatively more standardised coded data, established national data dictionaries, and national software and system requirements.