Preparing linked population data for research: cohort study of prisoner perinatal health outcomes

Background A study of pregnancy outcomes related to pregnancy in prison in New South Wales, Australia, designed a two-stage linkage to add maternal history of incarceration and serious mental health morbidity, neonatal hospital admission and infant congenital anomaly diagnosis to birth data. Linkage was performed by a dedicated state-wide data linkage authority. This paper describes use of the linked data to determine prison exposure during pregnancy for a representative population of mothers.
Methods Researchers assessed the quality of linked records, resolved multiple-matched identities, and transformed event-based incarceration records into person-based prisoner records and birth records into maternity records. Inconsistent or incomplete records were censored. Interrogation of the temporal relationships of all incarceration periods from the prisoner record with pregnancies from birth records identified prisoner maternities. Interrogation of maternities for each mother distinguished prisoner mothers who were incarcerated during pregnancy from prisoner control mothers with pregnancies wholly in the community, and identified a subset of prisoner mothers with both types of maternity. Standard descriptive statistics were used to provide population prevalence of exposures and to compare data quality across study populations stratified by mental health morbidity.
Results Women incarcerated between 1998 and 2006 accounted for less than 1% of the 404,000 women who gave birth in NSW between 2000 and 2006, while women with serious mental health morbidity accounted for 7% overall and 68% of prisoners. Rates of false positive linkage were within the predicted limits set by the linkage authority for non-prisoners, but were tenfold higher among prisoners (RR 9.9; 95% CI 8.2, 11.9) and twice as high for women with serious mental health morbidity (RR 2.2; 95% CI 1.9, 2.6). This case series of 597 maternities for 558 prisoners pregnant while in prison (of whom 128 gave birth in prison) and 2,031 contemporaneous prisoner control mothers is one of the largest available.
Conclusions Record linkage, properly applied, offers the opportunity to extend knowledge about vulnerable populations not amenable to standard ascertainment. Dedicated linkage authorities now provide linked data for research. The data are not research ready. Perinatal exposures are time-critical and require expert processing to prepare the data for research.
Electronic supplementary material The online version of this article (doi:10.1186/s12874-016-0174-7) contains supplementary material, which is available to authorized users.

The NSW Health Pharmaceutical Drugs of Addiction Database (PHDAS) holds personal information and details of authorities to administer methadone or buprenorphine to nominated individuals.
The Department of Corrective Services Offender Information Management System (OIMS) holds personal details and current and previous offence and incarceration data for all persons aged 18 years or older who have spent time in prison.
The Birth Defect Register (BDR) provides information about pregnancies affected by a birth defect. The NSW BDR collects clinical and demographic data on pregnancies affected by a structural or chromosomal defect, and on birth defects in babies at birth and up to one year of age.

Master Linkage Key
Identifying information such as name, address, date of birth and gender, obtained from APDC and MDC Mother records, is included in the Master Linkage Key (MLK), which is being constructed by the Centre for Health Record Linkage (CHeReL). 1 No health data are used in this process.
The APDC and MDC Mother records were linked using probabilistic record linkage methods and ChoiceMaker software. 2 ChoiceMaker uses 'blocking' and 'scoring' to identify definite and possible matches. During blocking, ChoiceMaker searches the target datasets for records that are possible matches to each other. There are two types of blocking. The exact blocking algorithm requires records to have the same set of valid fields and the same values for these fields. The automated blocking algorithm builds a set of conditions that are used to find as many records as possible that potentially match each other. Scoring combines a probabilistic decision, computed using a machine learning technique, with absolute rules, which include upper and lower probability cut-offs, to decide whether each potential match denotes, or possibly denotes, the same person. The upper and lower probability cut-offs start at 0.75 and 0.25 and are adjusted for each individual linkage to ensure false links are kept to a minimum.
At the completion of the process, each record in the MLK is assigned a record identification number and an MLK person ID to allow linked records for the same individual to be identified and extracted.
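The role of the two scoring cut-offs can be illustrated with a minimal Python sketch. This is not ChoiceMaker's API: the names `Decision` and `classify` are hypothetical, and the treatment of probabilities that fall exactly on a cut-off is an assumption, since the text does not specify it. Only the starting values 0.75 and 0.25 come from the description above.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"
    POSSIBLE = "possible match"
    NON_MATCH = "non-match"


def classify(probability: float, lower: float = 0.25, upper: float = 0.75) -> Decision:
    """Map a match probability to a three-way linkage decision.

    Hypothetical helper: the defaults are the starting cut-offs quoted in
    the text; boundary handling (>= upper, < lower) is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if lower > upper:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    if probability >= upper:
        return Decision.MATCH
    if probability < lower:
        return Decision.NON_MATCH
    # Probabilities between the cut-offs are left as possible matches.
    return Decision.POSSIBLE


decisions = [classify(p) for p in (0.90, 0.50, 0.10)]
```

Raising the upper cut-off for a particular linkage moves more candidate pairs from the definite band into the possible band, which is one way a linkage unit could trade fewer false links for more uncertain ones.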

Summary of results:
The probabilistic record linkage for case mothers identified 28,973 mothers who had 42,724 babies. These 28,973 mothers linked to 60,464 APDC records, 26,261 OIMS records and 3,992 PHDAS records. A 10% sample of the remaining mothers was then taken by listing all mothers in the period 1/7/2000 to 31/12/2006 and removing those included in the case linkage. This left 375,040 mothers, from whom a random sample of 37,504 was generated. All MDC mother records were then extracted for the sampled mothers (52,272 records). These MDC mother records were sent back to NSW Health to provide a list of 99,772 MDC baby records that matched to these mothers. The CHeReL then performed a probabilistic record linkage of MDC baby records and APDC records (that matched to MDC baby records) to BDR records. The result of this linkage was 34,043 MDC baby records matched to 37,899 APDC records and 1,528 BDR records; 65,729 MDC baby records were unmatched.
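The counts in this summary can be reconciled with a few lines of arithmetic. The sketch below uses only figures reported in this document (the total of 404,013 persons with an MDC Mother record is taken from the sample selection results); the variable names are illustrative.

```python
# Figures reported in this document.
total_mothers = 404_013   # persons with an MDC Mother record
case_mothers = 28_973     # persons matched in the case linkage
baby_records = 99_772     # MDC baby records supplied by NSW Health
matched_babies = 34_043   # MDC baby records matched in the baby linkage

# Mothers left after removing the case mothers, and the 10% sample size.
remaining_mothers = total_mothers - case_mothers
sample_size = remaining_mothers // 10

# Baby records that did not match in the baby linkage.
unmatched_babies = baby_records - matched_babies

print(remaining_mothers, sample_size, unmatched_babies)
```

The three derived values agree with the reported 375,040 remaining mothers, 37,504 sampled mothers and 65,729 unmatched MDC baby records.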

1) Extract of MDC Mothers and APDC records from CHeReL Master Linkage Key (MLK).
A set of APDC record IDs was supplied by the NSW Department of Health for females aged 18 to 44 who were admitted with an ICD-10-AM diagnosis of drug or alcohol use or psychiatric disorders in the period 1/7/2000 to 31/12/2006. These APDC records were used to extract (using the CHeReL MLK) MDC mother records that had one or more births in the period 1/7/2000 to 31/12/2006. (Refer Table 1) Results: 230,139 APDC records were provided by NSW Health and 42,724 MDC Mother records were extracted from the CHeReL Master Linkage Key, which linked to 60,464 of the subset of APDC records provided. This equated to 27,511 persons. (Refer Table 2)

2) Ad hoc Linkage of (MDC Mothers + APDC records) to (OIMS + PHDAS) records.
The following NSW Government Departments provided the CHeReL with Encrypted Record Identifiers and Demographic data: i) The NSW Department of Corrective Services extracted OIMS records for females aged 18-44 who spent time in prison between 1/1/1998 and 31/12/2006. (Refer Table 1) ii) The Pharmaceutical Services Branch of NSW Health extracted records for all females aged 18-44 at admission to treatment between 1/1/1998 and 31/12/2007. (Refer Table 1) These two datasets were linked using probabilistic record linkage to the MDC + APDC records.

3) Selection of 10% of unmatched MDC Mothers.
A list of all persons with MDC mother records for the period 1/7/2000 to 31/12/2006 was created, and persons who were matched to an APDC, OIMS or PHDAS record were removed. A random 10% sample was then taken of the remaining CHeReL MLK persons, and the MDC Mother records were extracted for those persons.
Results: There were 404,013 persons with an MDC Mother record (using the same periods and ages as the matched MDC Mother data). The 28,973 persons who matched to APDC, OIMS or PHDAS records were deleted, leaving 375,040 non-matching persons. A random 10% sample was taken (37,504 persons). All MDC mother records for the sample persons in the period 1/7/2000 to 31/12/2006 (52,272 MLK records) were then extracted from the CHeReL MLK.
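The selection step (remove matched persons, then draw a random 10% of the remainder) can be sketched as follows. This is an illustration under stated assumptions, not the CHeReL procedure: the function name `sample_unmatched`, the fixed seed, and the synthetic integer person IDs are all hypothetical, and the source does not say how the random sample was generated.

```python
import random
from typing import Iterable, List


def sample_unmatched(all_ids: Iterable[int],
                     matched_ids: Iterable[int],
                     fraction: float = 0.10,
                     seed: int = 0) -> List[int]:
    """Draw a simple random sample, without replacement, of unmatched persons.

    Hypothetical helper: the seed only makes the illustration repeatable.
    """
    matched = set(matched_ids)
    # Deduplicate and sort so the result does not depend on input order.
    remaining = sorted(pid for pid in set(all_ids) if pid not in matched)
    k = round(len(remaining) * fraction)
    rng = random.Random(seed)
    return rng.sample(remaining, k)


# Synthetic example: 1,000 person IDs, of which the first 100 were matched.
all_persons = range(1, 1001)
matched_persons = range(1, 101)
sample = sample_unmatched(all_persons, matched_persons)
print(len(sample))
```

Sampling persons rather than records keeps every MDC mother record for a sampled person together, which is consistent with the later extraction of all records for the sample persons.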

1) Extract of MDC Babies and APDC records from CHeReL Master Linkage Key (MLK).
A set of 99,772 MDC baby record IDs was supplied by the NSW Department of Health for all mothers selected for this study. These MDC baby records were used to extract (using the CHeReL MLK) APDC records that were included in the neonatal subset also supplied by NSW Health. (Refer Table 3, Table 4)

2) Ad hoc Linkage of (MDC Babies + APDC records) to BDR records
The BDR dataset provided by NSW Health was linked using probabilistic record linkage to the MDC baby records + APDC records.
Results: 37,899 APDC records and 99,772 MDC baby records were linked to 9,945 BDR records to give 1,528 BDR records that linked to an MDC baby record. (Refer Table 4)

Error Rates:
The CHeReL Master Linkage Key is regularly checked for false positive linkages.
A random sample of 1,000 Person IDs was selected from the Master Linkage Key (2009_07a), which was used to select the records provided to the study investigators, and reviewed for false positive linkages:
False positive rate = 3/1,000 records (0.3%)
False negative rate <5/1,000 records (<0.5%)
Episodes of care selected for the following parameters: All females aged 18-44 and admitted with following diagnosis codes: Psychiatric illness F00-F09, F20-