To our knowledge, this is the first study that has assessed the linkage of mother and infant birth and hospital records rather than mothers and infants separately. As maternal and pregnancy factors are important predictors of infant outcomes, assessment of the complete linkage is important. In this study the level of complete linkage (95.9%) was high for all births and highest for live singleton births (96.5%). Partially linked mother records (no infant hospital record) had slightly higher rates of adverse events and common risk factors while the partially linked infant records (no mother hospital record) were very similar to those with complete linkage.
This study has shown that stratifying linkage by plurality to overcome the recognized difficulty of linking multiple births [31, 32] has generated comparable linkage rates for singleton and multiple live births. Stillbirths represent a very different group in terms of linkage. As infant hospital admission records are not generated, stillbirths should not be present in the complete linkage group. While this explains the majority of stillbirth records being in the ‘mothers only’ group, the proportion of unlinked birth records for stillbirths was also much greater than that for live births (4% vs. 0.4%), reflecting that stillbirths remain a problem for linkage. The lower rate of linkage for stillbirths and the issue of lower rates of complete linkage for live born singletons ≤24 weeks gestation are probably related. Infants born close to the border of viability (misclassification of stillbirths and live births, and births and miscarriages) have been previously identified as a problematic domain for perinatal record linkage. For these reasons, unless infants ≤24 weeks are of particular interest, studies using probabilistically linked records may benefit from restriction to the population of at least 24 weeks gestation. For stillbirth studies, specialist linkages may be needed to improve linkage rates to the levels needed for robust research.
Among singleton live births, the proportions of birth records with partial (1.4-1.6%) or no linkage (0.4%) to hospital records were small. However, there was some evidence of systematic differences for the partially linked records that had no infant hospitalization record (‘mothers only’). This group had slightly higher rates of adverse infant outcomes and associated risk factors, consistent with observations in other studies [10, 39–41]. Reduced matching of infant records may be related to the association between missing information, social disadvantage and adverse outcomes, or to the possibility that severely ill infants with prolonged hospitalization are not necessarily coded as having a birth admission. Restriction to later gestational ages would further reduce the already small size of this group of records. It is important to quantify the number and characteristics of unlinked or partially linked records to assess the potential for bias in estimating the burden of disease and the association between risk factors and outcomes. In our study, inclusion of additional records would not change, for example, the estimated preterm birth rate, nor is it likely to change risk estimates. However, in other settings with higher proportions of unlinked or partially linked records, exclusion of such records could introduce bias.
Our finding that the unlinked birth records represent a relatively low risk group of mothers and babies is likely to be a local phenomenon. The over-representation of births in private hospitals in the unlinked birth records is likely a result of missing name information. Private hospitals collect name information at their own discretion and so generally have a large amount of missing name information for both mothers and infants, which affects linkage rates for both. Changes to the data provided by private hospitals for linkage could potentially reduce the number of unlinked birth records.
The results highlight the importance of assessing the characteristics of probabilistic record linkage for both mothers and infants in perinatal research, given the potential bias introduced into analyses by incomplete record linkage. We recommend that, for the chosen study population, both linked and unlinked records be requested and that a comparison of linked and unlinked records be undertaken as part of any research using probabilistically linked data. This is of even greater importance when newly established datasets and linkages are used, in contrast to the well-established datasets and linkage protocols used by the CHeReL, which generated the linked data for this study. Further, in order to properly discuss the potential impacts, researchers need a reasonable understanding of how the probabilistic linkage process works and of the matching processes involved.
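The recommended comparison of linked and unlinked records can be illustrated with a minimal sketch. It is not the analysis code used in this study: the field names (`mother_linked`, `infant_linked`, `preterm`), the group labels and the toy records are all hypothetical, and a real analysis would work from the linked dataset supplied by the linkage unit.

```python
from collections import Counter


def linkage_status(mother_linked, infant_linked):
    """Classify a birth record by which hospital records it linked to."""
    if mother_linked and infant_linked:
        return "complete"
    if mother_linked:
        return "mother_only"   # no infant hospital record
    if infant_linked:
        return "infant_only"   # no mother hospital record
    return "unlinked"


def compare_by_linkage(births, characteristic):
    """Return, per linkage group, the count, share of all births and
    the rate of a binary characteristic (e.g. preterm birth)."""
    totals = Counter()
    positives = Counter()
    for birth in births:
        status = linkage_status(birth["mother_linked"], birth["infant_linked"])
        totals[status] += 1
        if birth[characteristic]:
            positives[status] += 1
    return {
        status: {
            "n": totals[status],
            "share": totals[status] / len(births),
            "rate": positives[status] / totals[status],
        }
        for status in totals
    }


# Hypothetical toy records, for illustration only.
toy_births = [
    {"mother_linked": True, "infant_linked": True, "preterm": False},
    {"mother_linked": True, "infant_linked": True, "preterm": True},
    {"mother_linked": True, "infant_linked": False, "preterm": True},
    {"mother_linked": False, "infant_linked": False, "preterm": False},
]
summary = compare_by_linkage(toy_births, "preterm")
```

Large differences in such rates between the complete and the partially linked or unlinked groups, combined with a non-trivial share of records in those groups, would suggest that excluding them could bias estimates.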
The hospital birth admission records for mothers and infants that did not link to a birth record were few, comparable in number to the unlinked birth records, and inevitably include some missed links. However, particularly for mothers, birth admission records are difficult to establish because more than one hospitalization may be identified as a birth admission. Although this approach has been used in the past [42, 43], we found that selecting maternal hospital records on a single outcome of delivery code (ICD10: Z37, ICD9: V27) was inadequate and that a much more comprehensive list was required (Table 2). This agrees with a US study showing that identifying maternal hospital records using outcome of delivery missed complicated pregnancies. Furthermore, due to the nature of ICD coding, it was difficult to classify plurality and whether the birth(s) were live born or stillborn. In general, a good understanding of coding practices can help to improve identification of these records.
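The difference between selecting maternal birth admissions on the outcome of delivery code alone and on a broader code list can be sketched as follows. This is an illustration under stated assumptions, not the selection logic of this study: the broader list below simply adds the ICD-10 delivery block (O80 to O84) to Z37 and does not reproduce Table 2, and the function name and toy admissions are hypothetical.

```python
# Outcome of delivery code only (ICD-10), as in the single-code approach.
SINGLE_CODE = ("Z37",)

# Illustrative broader list: Z37 plus the ICD-10 delivery block O80-O84.
# This is an assumption for demonstration and is NOT the list in Table 2.
BROADER_LIST = ("Z37", "O80", "O81", "O82", "O83", "O84")


def is_birth_admission(diagnosis_codes, code_prefixes):
    """Return True if any diagnosis code starts with one of the prefixes.

    Codes are normalised by upper-casing and removing the decimal point,
    so "Z37.0" and "z370" are treated alike.
    """
    normalised = [code.upper().replace(".", "") for code in diagnosis_codes]
    return any(
        code.startswith(prefix)
        for code in normalised
        for prefix in code_prefixes
    )


# Hypothetical maternal admissions, each a list of diagnosis codes.
toy_admissions = [
    ["O82", "Z37.0"],    # delivery with an outcome of delivery code
    ["O82.1", "O14.1"],  # complicated delivery, outcome code not recorded
    ["J45"],             # unrelated admission
]

found_single = [is_birth_admission(a, SINGLE_CODE) for a in toy_admissions]
found_broad = [is_birth_admission(a, BROADER_LIST) for a in toy_admissions]
```

In this toy example the second admission is identified only by the broader list, which mirrors the concern that complicated pregnancies can be missed when a single code is used.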