This study presents a method for patient-specific record linkage between separate administrative databases to match ED visits and hospital admissions for a cohort of PWID with frequent hospital contacts. Thirty-four percent of ED records were linked to hospital admissions. Using an array of linkage criteria increased the yield of matched records, but broadening the time-threshold between ED visit and hospital admission increased manual inspection requirements. The majority of hospital stays generated only one VAED record. ED ‘departure status’ coding correctly identified 68% of cases with subsequent hospital admissions.
The proportion of ED records linked to hospital admissions in this study (34%) is comparable to the 36% reported by Wong et al., who used similar methods with numerous linkage criteria. It is slightly higher than the 30% reported by Crilley et al. and the 25% reported by Ferris et al., potentially reflecting their narrower linkage criteria requiring identical timestamps. A key methodological consideration is to ensure that ED arrival/discharge and hospital admission/separation dates and times are requested. A special request may be required, as many standard data releases only provide month and year of presentation. This is insufficiently precise to delineate true links in a cohort of frequent presenters who can have multiple hospital contacts in a single day.
Linkage results must also be interpreted in the context of the population or disease in question. The high rate of unlinked VEMD data (n = 2269, 66%) in our cohort of PWID likely reflects high-frequency ED usage patterns, use of ED services for presentations not requiring admission, and higher rates of leaving before treatment completion. Within our cohort, more than one third (37%) of hospital admissions were unlinked, with no preceding ED visit. These may represent direct ward admissions, transfers of care, missing data from non-VEMD reporting EDs or failure of VEMD record extraction during the first stage of data linkage. In contrast, Boyle’s study reported only 3.7% unlinked VAED data for victims of major road traumas, whose care pathways more predictably require ambulance retrieval, transportation to ED and direct hospital admission on a single day. Predictable care pathways and single-day events may make record linkage more straightforward, typically generating 1:1 ratios of ED records and admitted episodes for the given event. Requesting timestamps (hh:mm), in addition to datestamps, may be less essential in these cohorts than in PWID, who may have more erratic hospital contact with multiple same-day presentations. Boyle suggested that unlinked hospital admissions in his cohort were likely due to patients being managed in non-VEMD reporting EDs. In our cohort, however, other potential sources of bias, resulting from lower socioeconomic status, unreliable provision of personal identifiers or less robust data collection at point of care, may introduce systematic linkage errors for PWID that require further exploration [19, 31].
Researchers must also familiarise themselves with relevant administrative coding practices during their study period that may impact linkage rules. During 2008–2013, VAED admission times were recorded when the decision to admit was made and could include treatment time within the ED. Consistent with this, the majority of links in this study had hospital admission times occurring at some point during the ED stay. This practice was revised in 2016: care provided within the ED is no longer considered part of admitted care, and episodes of care delivered entirely within EDs are not reported to VAED. The epidemiological impact of this administrative change warrants further study, as rates of hospital admissions and lengths of stay may be artificially altered in time series research.
Selecting time-thresholds to define a ‘linked record’ requires discretion, with trade-offs between sensitivity and specificity. Increasing lag times between ED episodes and hospital episodes will increase the proportion of links identified but may alter the nature of clinical pathways captured (e.g. planned discharges home and subsequent planned admissions, or new and unrelated ED presentations). Clinical interpretation from the researcher is required and time-windows must be selected based on the research question. For our cohort of PWID, when the time between ED departure and hospital admission increased beyond a 2-h window, a corresponding increase in manual interrogation and clinical discretion was required to determine whether the hospital admission stemmed directly from an ED visit. This level of interrogation may not be feasible with larger datasets. Of note, including links where hospital admission times occur prior to ED arrival is uncommon in the literature, yet such links made up a notable proportion in this study (n = 112, 9%). The large majority of these (90%) had no more than an 11-min discrepancy in recorded arrival/admission times, likely representing administrative error rather than false matches. Linkage time rules should therefore be based on the study purpose, coding practices and capacity for data interrogation: narrower windows may fail to capture some direct admissions or planned transfers, whereas broader windows may capture some unplanned re-presentations, planned re-admissions or failed discharges.
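The time rules discussed above can be illustrated with a minimal sketch. This is not the linkage algorithm used in this study; the function name, category labels and the decision to measure the lag from ED departure are assumptions made for illustration. The 2-h and 24-h values are taken from the text, and the 11-min tolerance for admissions recorded before ED arrival is an illustrative choice based on the discrepancy reported above.

```python
from datetime import datetime, timedelta

# Illustrative thresholds (assumptions, not the study's implemented rules).
AUTO_WINDOW = timedelta(hours=2)         # beyond this, manual review increased
MAX_WINDOW = timedelta(hours=24)         # upper bound for a candidate link
EARLY_TOLERANCE = timedelta(minutes=11)  # admission recorded shortly before ED arrival


def classify_pair(ed_arrival, ed_departure, admission):
    """Label one candidate ED visit / hospital admission pair for the same patient."""
    if admission < ed_arrival:
        # Admission time precedes ED arrival: accept small discrepancies only.
        if ed_arrival - admission <= EARLY_TOLERANCE:
            return "admission_before_arrival_minor"
        return "manual_review"
    if admission <= ed_departure:
        # Admission time falls within the ED stay.
        return "admission_during_ed_stay"
    lag = admission - ed_departure
    if lag <= AUTO_WINDOW:
        return "within_auto_window"
    if lag <= MAX_WINDOW:
        return "manual_review"
    return "unlinked"


if __name__ == "__main__":
    arrival = datetime(2012, 3, 1, 10, 0)
    departure = datetime(2012, 3, 1, 14, 0)
    for adm in (datetime(2012, 3, 1, 12, 0), datetime(2012, 3, 2, 2, 0)):
        print(adm, classify_pair(arrival, departure, adm))
```

Widening `AUTO_WINDOW` in such a scheme moves pairs out of the manual review category at the cost of specificity, which mirrors the trade-off described above.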
The absolute incidence of VAED records representing continuous episodes of care was low (6%, n = 119). Previous research in this cohort identified that the majority of hospital admissions were due to mental health, drug use, injury or skin infections; these conditions may not require multiple hospital-based episodes of care. Researchers must decide on the value of increasing the complexity of their linkage algorithm to identify these sequential admissions, as it may be more pertinent for certain disease states than others. For example, hip fractures almost universally require at least two episodes of care, from acute orthopaedics to subacute rehabilitation, and failure to capture all VAED episodes within one total hospital stay may overestimate incidence of disease and underestimate hospital costs [10, 11].
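As an illustration of how sequential admitted episodes might be collapsed into one total hospital stay, the sketch below groups a patient's records when the next admission begins no later than a chosen gap after the previous separation. The function name, the tuple layout and the 1-h default gap are hypothetical and are not the rule applied in this study.

```python
from datetime import datetime, timedelta


def group_into_stays(episodes, max_gap=timedelta(hours=1)):
    """Collapse contiguous admitted episodes into total hospital stays.

    episodes: iterable of (patient_id, admission, separation) tuples.
    max_gap: hypothetical tolerance between one separation and the next admission.
    """
    stays = []
    # Sorting orders records by patient, then by admission time.
    for patient_id, admission, separation in sorted(episodes):
        last = stays[-1] if stays else None
        if (last and last["patient_id"] == patient_id
                and admission - last["separation"] <= max_gap):
            # Contiguous (or overlapping) episode: extend the current stay.
            last["separation"] = max(last["separation"], separation)
            last["n_records"] += 1
        else:
            stays.append({
                "patient_id": patient_id,
                "admission": admission,
                "separation": separation,
                "n_records": 1,
            })
    return stays


if __name__ == "__main__":
    toy = [  # invented records, e.g. an acute episode followed by rehabilitation
        ("A", datetime(2012, 1, 1, 8, 0), datetime(2012, 1, 5, 10, 0)),
        ("A", datetime(2012, 1, 5, 10, 30), datetime(2012, 1, 20, 12, 0)),
    ]
    for stay in group_into_stays(toy):
        print(stay)
```

Counting stays rather than records in this way is what prevents a two-episode pathway from being counted as two separate events.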
From an application perspective, this study demonstrates that linking administrative datasets provides more comprehensive and reliable information on patient pathways than using databases in isolation. There are known limitations within ED administrative data, and researchers using ED departure status alone to infer discharge pathways risk under-ascertainment of hospital admissions and cannot describe which hospitals, treating teams or services were used during the admitted component of the patient journey. Similarly, researchers are limited in making inferences about pre-hospital resource utilisation or specific patient pathways using the VAED admission-type variable alone. Aside from the option of ‘emergency admission through emergency department at this hospital’, the remaining options were non-specific (see Table 1), reflecting only the broad nature of hospital presentations: emergency versus planned.
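The degree of under-ascertainment from departure status alone can be expressed as a simple proportion: among ED records that linkage shows were followed by an admission, the share whose departure status also indicated admission. The sketch below shows that calculation on invented records; the field names are hypothetical and the toy data are not study data.

```python
def departure_status_sensitivity(ed_records):
    """Proportion of linked ED records whose departure status indicated admission.

    ed_records: iterable of dicts with boolean keys
    'linked_to_admission' and 'status_says_admitted' (hypothetical field names).
    Returns None when no record is linked to an admission.
    """
    linked = [r for r in ed_records if r["linked_to_admission"]]
    if not linked:
        return None
    correctly_flagged = sum(1 for r in linked if r["status_says_admitted"])
    return correctly_flagged / len(linked)


if __name__ == "__main__":
    toy = [  # invented records for illustration only
        {"linked_to_admission": True, "status_says_admitted": True},
        {"linked_to_admission": True, "status_says_admitted": False},
        {"linked_to_admission": False, "status_says_admitted": False},
    ]
    print(departure_status_sensitivity(toy))
```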
This study is subject to the known limitations in data accuracy and completeness within administrative databases. Although data interrogation and re-coding were feasible on this moderately sized dataset (< 5000 records), data re-coding was minimised to present a method reproducible for larger datasets. Whilst this study used Australian databases, we believe the insights and approaches offered are relevant for any researcher interested in patient-specific record linkage between administrative databases or mapping patient pathways. The multi-staged linkage process created opportunities for error and, in the absence of a gold-standard dataset, assessing linkage quality remains a challenge. CVDL reviewed their linkage algorithm to minimise false negatives, and linked data were interrogated to remove duplicates. The second stage of linkage was based on the assumption that hospital admissions occurring within 24 h of ED presentations are clinically related. Clinically linked episodes occurring beyond this timeframe will have been missed (false negatives) and clinically unrelated episodes within this timeframe may have been linked (false positives). Exploring numerous time-thresholds, identifying episodes of continuous care, thorough manual inspection and cross-checking ‘expected’ versus ‘found’ links minimised these errors. Finally, ED visits and hospital admissions represent only one component of the patient journey. In the absence of common identifiers, system-wide data linkage including ambulance, outpatient, ambulatory and general practice databases will be fraught with methodological challenges. Further methodological studies such as this will improve our understanding of the strengths and limitations of linkage studies and assist in the analysis and interpretation of linked data.