
Comparison of reproductive history gathered by interview and by vital records linkage after 40 years of follow-up: Bogalusa Babies



To examine the consistency and likely degree of bias in a study of cardiovascular health, linked with reproductive data over 40 years.


Linkage of vital statistics data of births to female Bogalusa Heart Study participants was compared to interviewing of female participants. The characteristics of participants, the agreement, and demographic, study-related, and medical predictors of discrepancy were analyzed, using kappa statistics, mean and median differences, and logistic regression.


Overall, 3944 (66.7%) of participants were located by one or both sources. The strongest predictor of either linkage or interview was recent and/or frequent participation in the parent study. Agreement between the two sources was generally good (kappa > 0.9 for birthweight and 0.8 for gestational age). Black race, older age, and time since pregnancy were associated with greater discrepancy in reporting of outcomes, but cardiovascular risk factors generally were not.


Combining information from multiple sources to increase sample size and outcome ascertainment may be valid, which will increase population health sciences’ ability to leverage the many existing, large-scale sources to answer previously unexplored questions, even those that the data were not initially collected to answer.



With the growing emphasis on use of existing data and cohorts [1, 2], as well as data harmonization to create large analyses across disparate datasets [3,4,5], it becomes more important to understand the degree to which these study designs provide accurate, reliable, and consistent data. While linkage of existing datasets and databases can be powerful and cost-effective, it can also magnify errors [6]. If multiple recordings of data fundamentally derive from the same source, or if linking tends to bias systematically the group of participants that are included in large-scale analyses, such study designs run the risk of leading to a greater degree of confidence in fundamentally flawed or biased analyses.

For example, migration limits the possibility of linkage between datasets. Most data are stored as part of a study database or as clinical or administrative data, and are limited by jurisdiction. Thus, any factor that affects the likelihood of migration affects the probability of linkage across databases. Socioeconomic status (SES) is likely to be particularly important, as it affects mobility, health, and quality of reporting, and can lead to serious bias in the conclusions of studies based on these datasets. Previous studies of mortality linkages have found reduced linkage with Hispanic populations, for instance [7, 8].

In addition to the general issue of the quality and value of data linkages, a question that has recently become more prominent is that of the relationship between reproductive history and health during other parts of the life course. There is a growing recognition that pregnancy does not operate independently of health during other periods of life [9,10,11,12,13]. While it has long been known that parity and age at pregnancy are risk factors for breast cancer [14], more recent research indicates a relationship between pregnancy complications and birth outcomes and later health, particularly cardiovascular disease and diabetes [15,16,17]. Studies of chronic disease, which are usually conducted in middle-aged or older populations, are therefore likely to be interested in finding data on the reproductive years. While several previous studies have looked at the comparison between self-report and other sources of data for studying pregnancy health [18,19,20], in most cases these were pregnancy or child cohorts, so the timing and usually location of the births was known precisely, and most often compared were medical records, rather than vital statistics data. In this analysis, we compare the results of a linkage with vital statistics data with women’s self-report of their pregnancy history in the context of a study designed to assess cardiovascular health, and in which the timing of pregnancies was not known and took place over a period of forty years.


Source cohort

The Bogalusa Heart Study (BHS) was begun in 1973 by Dr. Gerald Berenson [21]. Surveys of the town’s schoolchildren were repeated approximately every two years through 1994, examining newly enrolled children as well as re-examining those previously enrolled, with reexamination of adults begun in 1997 and continuing to the present day. Thus, BHS has examined the longitudinal history of childhood, adolescent, and now adult cardiovascular health. Risk factors measured have varied somewhat over the years, but consistently included anthropometrics, blood pressure, lipids, and glucose, with later extensions to echocardiography and arterial stiffness.

The Bogalusa Babies study was started in 2012. The goal of the study was to examine the relationship between preconception cardiovascular risk factors and reproductive histories within women in BHS. Three sources of information on birth outcomes were considered: vital statistics (birth certificates), interview, and medical records. All 5914 women who had ever participated in the BHS were eligible to participate in the Bogalusa Babies study, regardless of the number of previous study visits or whether the women had been pregnant. Participants were recruited through advertising, mailings, and systematic calls through the study database.

Birth record data linkage

The data linkage has been described in detail previously [22]. Vital statistics birth record data were obtained from the three states thought most likely to include former BHS participants: Louisiana, Texas, and Mississippi. Briefly, Louisiana birth records were available from 1982 to 2009. Linkage of Louisiana birth record data to BHS data was completed using LinkPro v3.0 (InfoSoft, Inc., Winnipeg, MB) [23,24,25]. For 1982–1989 records, linkage variables available were maternal last name, Soundex code for last name, race, and year of birth. From 1990 to 2009, a three-stage linkage process was used, including deterministic record linkage based on maternal social security number (SSN), and probabilistic linkage when SSN was unavailable. Procedures conducted by the Texas and Mississippi vital statistics departments were based on their internal procedures and policies. Texas and Mississippi conducted two-stage linkages for data from 1988 to 2012 using Link Plus 3.0 [26]. Results were then examined for duplicates. If a birth was duplicated or occurred within six months of a previous birth, it was removed from the dataset.
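The final cleaning step described above (removing duplicated births and births within six months of a previous birth) can be sketched as follows. This is only an illustrative sketch, not the study's actual procedure: the record layout `(mother_id, birth_date, birth_order)`, the function name, and the approximation of six months as 183 days are all assumptions introduced here. The hypothetical `birth_order` field is what lets twins born on the same day survive while exact duplicate records are dropped.

```python
from datetime import date

# Assumption: "six months" is approximated as 183 days; the source gives no day count.
MIN_GAP_DAYS = 183

def drop_implausible_births(births):
    """Return linked births with duplicates and implausibly close births removed.

    births: iterable of (mother_id, birth_date, birth_order) tuples, where
    birth_order is a hypothetical field distinguishing multiples within one delivery.
    """
    kept = []
    last_kept_date = {}  # mother_id -> date of her most recently kept delivery
    # set() removes exact duplicate records; sorting orders each mother's births by date.
    for mother_id, birth_date, birth_order in sorted(set(births)):
        previous = last_kept_date.get(mother_id)
        too_close = (
            previous is not None
            and birth_date != previous  # same-day records are treated as multiples
            and (birth_date - previous).days < MIN_GAP_DAYS
        )
        if too_close:
            continue
        kept.append((mother_id, birth_date, birth_order))
        last_kept_date[mother_id] = birth_date
    return kept

example = [
    ("A", date(2000, 1, 1), 1),
    ("A", date(2000, 1, 1), 1),  # exact duplicate record
    ("A", date(2000, 1, 1), 2),  # twin, same delivery
    ("A", date(2000, 3, 1), 1),  # within six months of the previous birth
    ("A", date(2001, 6, 1), 1),
    ("B", date(2000, 2, 1), 1),
]
cleaned = drop_implausible_births(example)
```

Whether a close-spaced record or the earlier one should be discarded is a design choice; this sketch keeps the earlier record, which the source does not specify.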


During the interview, women were asked whether they had ever been pregnant, the outcome of each pregnancy, and complications. Women were encouraged to consult a baby book (a scrapbook with memories of the pregnancy and first year), if they had one. They were asked to report the birthweight of each baby and whether the baby was born early, late, or on time, and how early or late, in days or weeks. If a woman said her baby was on time, gestational age was imputed as 39.5 weeks.
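The conversion of the interview responses into a gestational age can be illustrated with a short sketch. Only the 39.5-week value for "on time" births comes from the text; the function name, the argument names, and especially the 40-week reference point for "early" and "late" reports are assumptions made for illustration, since the reference point is not stated.

```python
ON_TIME_WEEKS = 39.5  # imputed value stated in the text for "on time" births

def gestational_age_weeks(timing, amount=0.0, unit="weeks", reference_weeks=40.0):
    """Convert an interview report into gestational age in weeks.

    timing: "early", "late", or "on time"
    amount, unit: how early or late, in "days" or "weeks"
    reference_weeks: assumed due-date reference (40 weeks is an assumption)
    """
    if timing == "on time":
        return ON_TIME_WEEKS
    if unit == "days":
        offset = amount / 7.0
    elif unit == "weeks":
        offset = float(amount)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    if timing == "early":
        return reference_weeks - offset
    if timing == "late":
        return reference_weeks + offset
    raise ValueError(f"unknown timing: {timing!r}")
```

For instance, a baby reported as three weeks early would be assigned 37.0 weeks under the assumed 40-week reference, which sits exactly at the preterm cut-off, so the choice of reference point matters for dichotomized outcomes.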


The analysis aimed to examine birth outcomes as recorded in the birth certificates and the interviews, both in terms of what predicted the likelihood of inclusion in various sources, and how closely the sources agreed. For this analysis, we focus on number of pregnancies, birthweight (including low birthweight, < 2500 g), and gestational age (including preterm birth, < 37 weeks’ gestation). A future analysis will focus on pregnancy complications such as gestational diabetes and pre-eclampsia, for which we have a fourth source of information (the original BHS) and for which medical records are more crucial for understanding the differences. (Although 94% of interview participants provided permission/HIPAA releases for medical records, in most cases the records had been destroyed because they were more than 10 years old.)

First, the births reported in interviews and linked in the datasets were compared. A birth was considered a definite match if it occurred to the same woman on the same date in both sources. We then examined the possible sources of discrepancy, including mistakes in dates and births that occurred outside the date and geographic range of the linkage. Both singleton and multiple births were included; to our knowledge, all sets of multiples (1.3%) in the dataset were born on the same day. Probable matches included births that occurred in the same year with no other date information; births in the same year within one month; births on the same month and day but one year apart; or births less than one year and three days apart. (All of these were considered plausible mis-reporting or mis-recording of the same births.) Both types of matches were included in analysis of agreement.
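The matching rules in this paragraph can be sketched as a small classifier for one interview-reported birth and one linked birth to the same woman. This is a hedged illustration rather than the study's code: the function name and arguments are hypothetical, "within one month" is taken as 31 days, and the last rule ("less than one year and three days apart") is interpreted here as dates that are about one year apart, give or take three days. That interpretation is an assumption.

```python
from datetime import date

def classify_match(interview_date, record_date, record_year_only=False):
    """Return "definite", "probable", or "none" for a pair of birth dates.

    record_year_only: True when the vital record supplies only the year
    (in that case only the year of record_date is used).
    """
    if record_year_only:
        return "probable" if interview_date.year == record_date.year else "none"

    gap_days = abs((interview_date - record_date).days)
    if gap_days == 0:
        return "definite"
    # Same calendar year and within one month (assumed to mean 31 days).
    if interview_date.year == record_date.year and gap_days <= 31:
        return "probable"
    # Same month and day, recorded one year apart.
    same_month_day = (interview_date.month, interview_date.day) == (
        record_date.month,
        record_date.day,
    )
    if same_month_day and abs(interview_date.year - record_date.year) == 1:
        return "probable"
    # Assumed reading of the final rule: roughly one year apart, within three days.
    if abs(gap_days - 365) <= 3:
        return "probable"
    return "none"
```

Under these assumptions, `classify_match(date(2000, 5, 1), date(2000, 5, 20))` is a probable match, while two dates three years apart are not matched at all.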

Next, we examined the characteristics associated with being included in one or both sources. Women were categorized as interview and linkage; interview only, reported at least one birth; interview only, did not report having given birth; linkage only; or neither interviewed nor found in the linkage. Demographic, study-related (number and recency of visits), and cardiovascular risk factors were compared across these categories, using chi-square, ANOVA, and nonparametric tests. When differences were found, regression analysis was used to determine whether those differences were due solely to age and year of participation.
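The five-way categorization of women described here can be expressed as a simple function. The sketch below is illustrative only; the function name, the boolean arguments, and the label strings are assumptions chosen to mirror the wording of the text.

```python
from collections import Counter

def source_category(interviewed, linked, reported_birth=False):
    """Assign a woman to one of the five source categories described in the text."""
    if interviewed and linked:
        return "interview and linkage"
    if interviewed:
        if reported_birth:
            return "interview only, reported at least one birth"
        return "interview only, did not report having given birth"
    if linked:
        return "linkage only"
    return "neither interviewed nor found in the linkage"

# Hypothetical participants: (interviewed, linked, reported_birth)
participants = [
    (True, True, True),
    (True, False, True),
    (True, False, False),
    (False, True, False),
    (False, False, False),
    (False, False, False),
]
counts = Counter(source_category(*p) for p in participants)
```

Tabulating the categories this way gives the group sizes against which demographic, study-related, and cardiovascular characteristics would then be compared.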

Third, we limited the dataset to those with information from both sources. We examined agreement between sources with respect to birthweight, and gestational age, as well as dichotomized outcomes (very low birthweight, < 1500 g; low birthweight (LBW), < 2500 g; early preterm birth, gestational age < 34 weeks; preterm birth (PTB), gestational age, < 37 weeks). Kappa statistics and mean and median differences were calculated, controlling for clustering within woman (extended kappa statistics [27] and generalized estimating equations).
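The dichotomized outcomes use the cut-offs given in this paragraph, which a minimal sketch can make explicit. The function and key names are illustrative assumptions; only the thresholds come from the text.

```python
def dichotomize(birthweight_g, gestational_age_weeks):
    """Apply the birthweight and gestational-age cut-offs stated in the text."""
    return {
        "very_low_birthweight": birthweight_g < 1500,
        "low_birthweight": birthweight_g < 2500,
        "early_preterm_birth": gestational_age_weeks < 34,
        "preterm_birth": gestational_age_weeks < 37,
    }

def disagreements(interview_outcomes, record_outcomes):
    """List the dichotomized outcomes on which the two sources differ."""
    return [
        name
        for name in interview_outcomes
        if interview_outcomes[name] != record_outcomes[name]
    ]

interview = dichotomize(birthweight_g=2450, gestational_age_weeks=37.0)
record = dichotomize(birthweight_g=2520, gestational_age_weeks=36.4)
differing = disagreements(interview, record)
```

The example shows how small continuous differences near a threshold (70 g, or a few days) can flip a dichotomized outcome even when the two sources are otherwise close.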

Finally, we examined predictors of agreement between sources, again looking at demographic, study-related, cardiovascular, and reproductive predictors of agreement and disagreement. Matched pregnancies were examined, with discrepancy defined as not agreeing on whether a pregnancy was LBW or PTB. We also examined these as predictors of size of the discrepancy. Results were again examined controlling for clustering within woman.

The Institutional Review Boards (IRB) of Tulane University (IRB ID#256406), the State Department of Health and Hospitals of Louisiana (Louisiana Department of Health), and the Texas Department of State Health Services approved this protocol (Mississippi deferred to the Tulane IRB). The linkage was conducted under a waiver of consent, as it was deemed minimal risk and infeasible without the waiver.


There were 1026 women with data from both vital records and interview, with a total of 2658 births reported (Fig. 1). Of these, 1624 were exact matches. An additional 113 matched on year only. 32 of these had year only provided from vital statistics due to confidentiality restrictions (Texas). Of the remaining 81, the median difference in time between the birth certificate and interview data was 2.0 days, with a mode of 10 days, a minimum of − 300 days, and a maximum of 228 days (date from birth certificate – date from interview).

Fig. 1

Flowchart, study population, Bogalusa Babies study

Of the remaining 958 births reported in the interview, 65 occurred prior to 1982 and 105 after 2010, and 51 births were reported to occur outside of Louisiana, Mississippi, and Texas, and thus would not have been eligible to be linked in the linkage. 734 births to 465 women (62 women with non-matching information in both sources, 38 only vital statistics data, and 365 only interview data) were not included in both sources, but had no obvious reason for a lack of match in the other. Of these, 16 births were exactly one year or one year and 1–2 days apart.

Overall, 3944 (66.7%) of participants were located by one or both sources. The strongest predictor of either linkage or interview was recent and/or frequent participation in the parent study (Table 1). Those who were interviewed had more study visits (median 5) than those who were not (median 2, p < 0.01), and were more likely to have participated in the study as an adult. The groups that were interviewed were also more likely to have ever smoked, even after the age distribution and years of the interviews were controlled for (aOR for smoking for those with interview and linkage, 1.32, 1.05–1.66; with interview only 1.45, 1.13–1.86). Parental education was more likely to be missing for those who were not located (these data were not collected at early visits); among those with data, those who were located were more likely to have higher parental education. Differences in BMI, cholesterol, and blood pressure were largely explained by the age distribution of participation in the groups, although mean childhood BMI was higher in those who were only interviewed (absolute values provided in table; adjusted beta for difference = 0.80, p < 0.01).

Table 1 Comparison of interviews, linkage, and overall dataset

When the matched pregnancies were compared, agreement between the two sources was generally quite good, with kappa statistics > 0.9 for birthweight and 0.8 for gestational age (Table 2). Mean and median differences were close to 0. Among the 1716 births with an LBW classification from both sources, 128 (7.5%) were reported as LBW and 1523 (88.8%) as not LBW by both sources; 47 (2.7%) were reported as LBW by the interview and not the birth certificate, while 18 (1.0%) were reported as LBW by the birth certificate but not the interview. Among the 1549 births with a PTB classification from both sources, 106 (6.8%) were reported as PTB and 1340 (86.5%) as not PTB by both sources; 54 (3.5%) were reported as PTB by the interview and not the birth certificate, while 49 (3.2%) were reported as PTB by the birth certificate but not the interview.
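For readers who want to work with the cell counts reported in this paragraph, a simple unadjusted Cohen's kappa for a 2×2 table can be sketched as below. This is not the method used in the paper, which reports extended kappa statistics that account for clustering of births within women, so the values produced here should not be expected to reproduce Table 2. The function name and argument names are illustrative assumptions.

```python
def cohen_kappa_2x2(both_yes, interview_only, record_only, both_no):
    """Unadjusted Cohen's kappa for two binary classifications of the same births.

    Ignores clustering of births within women (a simplification relative to
    the extended kappa described in the methods).
    """
    n = both_yes + interview_only + record_only + both_no
    observed = (both_yes + both_no) / n
    p_interview_yes = (both_yes + interview_only) / n
    p_record_yes = (both_yes + record_only) / n
    # Agreement expected by chance from the marginal proportions.
    expected = (
        p_interview_yes * p_record_yes
        + (1 - p_interview_yes) * (1 - p_record_yes)
    )
    return (observed - expected) / (1 - expected)

# Cell counts as reported in the text for the dichotomized outcomes.
kappa_lbw = cohen_kappa_2x2(both_yes=128, interview_only=47, record_only=18, both_no=1523)
kappa_ptb = cohen_kappa_2x2(both_yes=106, interview_only=54, record_only=49, both_no=1340)
```

Because low birthweight and preterm birth are uncommon, chance agreement on "not LBW" or "not PTB" is high, which is why kappa can sit well below the raw percentage of births on which the sources agree.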

Table 2 Comparison of reported birth outcomes vs. linked birth outcomes

Few consistent predictors of discrepancy in reporting could be identified (Table 3, Additional file 1: Table S1). Black race was associated with an increased likelihood of discrepancy. First births had a higher likelihood of discrepancy in LBW and greater discrepancy in gestational age. Those with lower education were more likely to have a discrepancy in reporting LBW (though not birthweight) and in gestational age (though not PTB). Older age was generally associated with greater difference in gestational age, as was time since pregnancy. Cardiovascular risk factors did not show a consistent pattern of being associated with discrepancies in reporting, though occasionally there were statistically significant associations (childhood BMI and blood pressure for birthweight, adolescent cholesterol for PTB).

Table 3 Predictors of discrepancy in reporting birth outcomes, the Bogalusa Babies study


This analysis serves as background in assessing the likely degree of bias for the overall Bogalusa Babies study, which aims to determine the relationship between cardiovascular risk factors and pregnancy outcomes. Overall, there are two questions to be answered: when considering information about reproductive history in a long-term study with no original goal of assessing reproductive outcomes, does linkage to vital statistics or interview find more participants or more representative participants; and when both data sources are available, how do they compare? These questions are relevant not only to our own study, but to other studies that may be interested in the relationship of pregnancy outcomes with chronic disease, and to those determining the best way to capture such information.

Generally, we found that consistent participation in the study was the best predictor of being located, via linkage, interview, or both. Black women were also more likely to be linked or interviewed, which differs from other analyses of loss to follow-up [28]. Previous studies of linkage to vital statistics indicate lower linkage of those living in deprived areas and rural areas [29], and that therefore, such studies may suffer from a bias in estimating social gradients of health. Studies also indicate increased attrition with lower SES [30]. To some extent, we found a small tendency for lower education to be associated with loss to follow-up, although in this case, those who seek higher education are likely to move from the area (a relatively small town with no university in the parish), at least temporarily. Other studies have also found that more frequent or more intense involvement in the study reduces attrition [31, 32]. Generally, clinical trials and longitudinal studies find those at increased medical risk, advanced-age, and young adult participants are more likely to drop out [30, 32, 33]. Smokers are also more likely to be lost to follow-up [28, 30, 33], which, again, was not the case in our study, although this is probably partly due to the fact that those lost at a young age might not have begun smoking at the time they participated in the study.

The major question of concern is whether use of one or both sources is likely to lead to biased estimation of the relationships. Overall, two-thirds of all participants were located by one or both sources. While 33% loss to follow-up is easily sufficient to bias an analysis, the sample size that remains is adequate for many research questions, so the concern is whether this sample is representative of the larger study. The analysis is generally reassuring on that point, as cardiovascular risk factors usually did not vary between those linked and those not, or those interviewed and those not. There was not a consistent profile indicating that those with worse or better health were systematically excluded, nor of exclusion of those with low or high socioeconomic status.

Agreement between sources for those included was generally quite good, although there was some indication that black race might have been associated with larger discrepancies in reporting, as well as time since the pregnancy. Several reasons for discrepancies can be imagined. They include (1) poor memory; (2) misassigning outcomes (e.g., mixing up birthweights of siblings); (3) misunderstanding or lack of communication around medical issues (e.g., change in due date based on ultrasound not being communicated to or understood by a woman); (4) approximation, particularly for full-term gestational ages and pregnancies occurring before the routine use of ultrasound; (5) not regarding gestational age at birth as worth keeping track of, particularly for earlier births that were not ultrasound-dated and went to full term; and (6) data issues such as incorrect linkage or data entry, although studies comparing medical records to vital statistics find that vital statistics data are accurate for birthweight and gestational age [34, 35]. Many of these factors are likely to be correlated with education and the effort and respect accorded a woman by medical providers, all of which are more likely to be provided to white women than black women. Black women also tended to have children earlier and thus had a longer time since pregnancy, although this did not fully explain the difference.

Overall, results are generally reassuring as to possible bias; the limited variation by cardiovascular predictors and the good quality of agreement about birth outcomes suggests that loss to follow-up or missed linkage is not likely to produce major bias for studies of those topics.

Our results are generally consistent with previous studies indicating that mothers remember the birthweight and gestational age of their infants quite well, even after many years [36, 37]. A few facts about self-report should be considered. In the U.S., women generally report birthweight in pounds and ounces, while vital statistics data are in grams; however, the conversion did not produce major issues. Perhaps more serious is that women often remember their babies’ gestational age in terms of weeks while medical records and vital statistics report in days; although we allowed for reporting in both weeks and days, most women reported only in weeks. We also began the interview asking whether the baby was early, late, or on time, and women generally reported the baby was on time if s/he was born within the week expected. The more precise recording in medical records and vital statistics is better for studies that treat gestational age as a continuous variable. Finally, many of the earlier births in this study occurred prior to routine ultrasound dating, so women may have had less exact dating available to them.
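The unit conversion mentioned above, from self-reported pounds and ounces to the grams used in vital statistics, is straightforward and can be sketched as follows. The function name is an illustrative assumption; the conversion factor is the standard definition of the avoirdupois pound (453.59237 g, with 16 ounces per pound).

```python
GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound
OUNCES_PER_POUND = 16

def pounds_ounces_to_grams(pounds, ounces=0.0):
    """Convert a birthweight reported in pounds and ounces to grams."""
    return (pounds + ounces / OUNCES_PER_POUND) * GRAMS_PER_POUND

# A baby reported as 5 lb 8 oz falls just under the 2500 g low-birthweight cut-off.
weight_g = pounds_ounces_to_grams(5, 8)
is_low_birthweight = weight_g < 2500
```

Because self-reports are typically rounded to the nearest ounce (about 28 g), a report close to 5 lb 8 oz can fall on either side of the 2500 g threshold, which is one way small reporting differences can produce LBW discrepancies.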

The question then arises as to whether these results apply to other studies. Some aspects of the study are unusual, though possibly relevant to other studies. Participants did not initially agree to be in a long-term study; in particular, the original waves of data collection were conducted as cross-sectional studies rather than as a planned longitudinal analysis. Therefore, the loss of participants who participated once, many years ago, as children, is not particularly surprising. This analysis also assesses only women, who are generally more likely to continue participation in studies [28, 38] but also more likely to change their last names. Any analysis addressing pregnancy will have this population. The geographic basis for the study also affects the follow-up; in this semirural area, higher-SES individuals are more likely to leave the area, which increases their loss to follow-up. This is not necessarily the case for more extensive studies or other types of areas.


Combining information from multiple sources to increase sample size and outcome ascertainment may be valid. We have demonstrated support for use of data harmonization across sources as a feasible and valid way to create analytic epidemiologic cohorts. Studies will generally consider consistently collected data such as vital records as the preferred source, but these can be augmented with maternal self-report for these outcomes. This is good news for population health sciences’ ability to leverage the many existing, large-scale sources of data on health and health determinants for research that expands their scope further by answering previously unexplored questions, even those that the data were not initially collected to answer.

Availability of data and materials

Data are available to qualified researchers upon request and completion of a data use agreement. Due to human subjects protections, data are not publicly available.



Abbreviations

BHS: Bogalusa Heart Study

BMI: Body mass index

LBW: Low birthweight

PTB: Preterm birth

SES: Socioeconomic status

SSN: Social security number


  1. Hazra R, Tenney S, Shlionskaya A, Samavedam R, Baxter K, Ilekis J, Weck J, Willinger M, Grave G, Tsilou K, et al. DASH, the data and specimen hub of the National Institute of Child Health and Human Development. Sci Data. 2018;5:180046.

  2. Giffen CA, Wagner EL, Adams JT, Hitchcock DM, Welniak LA, Brennan SP, Carroll LE. Providing researchers with online access to NHLBI biospecimen collections: the results of the first six years of the NHLBI BioLINCC program. PLoS One. 2017;12(6):e0178141.

  3. Boffetta P, Bobak M, Borsch-Supan A, Brenner H, Eriksson S, Grodstein F, Jansen E, Jenab M, Juerges H, Kampman E, et al. The consortium on health and ageing: network of cohorts in Europe and the United States (CHANCES) project--design, population and data harmonization of a large-scale, international study. Eur J Epidemiol. 2014;29(12):929–36.

  4. Doiron D, Burton P, Marcon Y, Gaye A, Wolffenbuttel BHR, Perola M, Stolk RP, Foco L, Minelli C, Waldenberger M, et al. Data harmonization and federated analysis of population-based studies: the BioSHaRE project. Emerg Themes Epidemiol. 2013;10(1):12.

  5. Fortier I, Raina P, Van den Heuvel ER, Griffith LE, Craig C, Saliba M, Doiron D, Stolk RP, Knoppers BM, Ferretti V, et al. Maelstrom research guidelines for rigorous retrospective data harmonization. Int J Epidemiol. 2017;46(1):103–5.

  6. Grimes DA. Epidemiologic research using administrative databases: garbage in, garbage out. Obstet Gynecol. 2010;116(5):1018–9.

  7. Miller EA, McCarty FA, Parker JD. Racial and ethnic differences in a linkage with the National Death Index. Ethn Dis. 2017;27(2):77–84.

  8. Lariscy JT. Differential record linkage by Hispanic ethnicity and age in linked mortality studies: implications for the epidemiologic paradox. J Aging Health. 2011;23(8):1263–84.

  9. Catov JM. Pregnancy as a window to cardiovascular disease risk: how will we know? J Womens Health (Larchmt). 2015;24(9):691–2.

  10. Blackmore HL, Ozanne SE. Programming of cardiovascular disease across the life-course. J Mol Cell Cardiol. 2015;83:122–30.

  11. Hanson MA, Gluckman PD. Developmental origins of health and disease--global public health implications. Best Pract Res Clin Obstet Gynaecol. 2015;29(1):24–31.

  12. Lu MC, Halfon N. Racial and ethnic disparities in birth outcomes: a life-course perspective. Matern Child Health J. 2003;7(1):13–30.

  13. Thomas SD, Hudgins JL, Sutherland DE, Ange BL, Mobley SC. Perinatal program evaluations: methods, impacts, and future goals. Matern Child Health J. 2015;19(7):1440–6.

  14. Lambertini M, Santoro L, Del Mastro L, Nguyen B, Livraghi L, Ugolini D, Peccatori FA, Azim HA Jr. Reproductive behaviors and risk of developing breast cancer according to tumor subtype: a systematic review and meta-analysis of epidemiological studies. Cancer Treat Rev. 2016;49:65–76.

  15. Rich-Edwards JW, Fraser A, Lawlor DA, Catov JM. Pregnancy characteristics and women's future cardiovascular health: an underused opportunity to improve women's health? Epidemiol Rev. 2014;36:57–70.

  16. Shah BR, Retnakaran R, Booth GL. Increased risk of cardiovascular disease in young women following gestational diabetes mellitus. Diabetes Care. 2008;31(8):1668–9.

  17. Kim C, Newton KM, Knopp RH. Gestational diabetes and the incidence of type 2 diabetes: a systematic review. Diabetes Care. 2002;25(10):1862–8.

  18. Vinikoor LC, Messer LC, Laraia BA, Kaufman JS. Reliability of variables on the North Carolina birth certificate: a comparison with directly queried values from a cohort study. Paediatr Perinat Epidemiol. 2010;24(1):102–12.

  19. Ellison GT, de Wet T, Matshidze KP, Cooper P. The reliability and validity of self-reported reproductive history and obstetric morbidity amongst birth to ten mothers in Soweto. Curationis. 2000;23(4):76–80.

  20. Bat-Erdene U, Metcalfe A, McDonald SW, Tough SC. Validation of Canadian mothers' recall of events in labour and delivery with electronic health records. BMC Pregnancy Childbirth. 2013;13(Suppl 1):S3.

  21. Berenson GS. Bogalusa heart study: a long-term community study of a rural biracial (black/white) population. Am J Med Sci. 2001;322(5):293–300.

  22. Harville EW, Jacobs M, Shu T, Breckner D, Wallace M. Feasibility of linking long-term cardiovascular cohort data to offspring birth records: the Bogalusa heart study. Matern Child Health J. 2018.

  23. Tromp M, Ravelli AC, Bonsel GJ, Hasman A, Reitsma JB. Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage. J Clin Epidemiol. 2011;64(5):565–72.

  24. Nitsch D, Morton S, DeStavola BL, Clark H, Leon DA. How good is probabilistic record linkage to reconstruct reproductive histories? Results from the Aberdeen children of the 1950s study. BMC Med Res Methodol. 2006;6:15.

  25. Jaro MA. Probabilistic linkage of large public health data files. Stat Med. 1995;14(5–7):491–8.

  26. Registry Plus Link Plus [].

  27. Yang Z, Zhou M. Kappa statistic for clustered matched-pair data. Stat Med. 2014;33(15):2612–33.

  28. Psaty BM, Cheadle A, Koepsell TD, Diehr P, Wickizer T, Curry S, VonKorff M, Perrin EB, Pearson DC, Wagner EH. Race- and ethnicity-specific characteristics of participants lost to follow-up in a telephone cohort. Am J Epidemiol. 1994;140(2):161–71.

  29. O'Reilly D, Rosato M, Connolly S. Unlinked vital events in census-based longitudinal studies can bias subsequent analysis. J Clin Epidemiol. 2008;61(4):380–5.

  30. Launes J, Hokkanen L, Laasonen M, Tuulio-Henriksson A, Virta M, Lipsanen J, Tienari PJ, Michelsson K. Attrition in a 30-year follow-up of a perinatal birth risk cohort: factors change with age. PeerJ. 2014;2:e480.

  31. Peterson JC, Pirraglia PA, Wells MT, Charlson ME. Attrition in longitudinal randomized controlled trials: home visits make a difference. BMC Med Res Methodol. 2012;12:178.

  32. Zunzunegui MV, Beland F, Gutierrez-Cuadra P. Loss to follow-up in a longitudinal study on aging in Spain. J Clin Epidemiol. 2001;54(5):501–10.

  33. Young AF, Powers JR, Bell SL. Attrition in longitudinal studies: who do you lose? Aust N Z J Public Health. 2006;30(4):353–61.

  34. Reichman NE, Hade EM. Validation of birth certificate data. A study of women in New Jersey's HealthStart program. Ann Epidemiol. 2001;11(3):186–93.

  35. Martin JA, Wilson EC, Osterman MJ, Saadi EW, Sutton SR, Hamilton BE. Assessing the quality of medical and health data from the 2003 birth certificate revision: results from two states. Natl Vital Stat Rep. 2013;62(2):1–19.

  36. Rice F, Lewis A, Harold G, van den Bree M, Boivin J, Hay DF, Owen MJ, Thapar A. Agreement between maternal report and antenatal records for a range of pre and peri-natal factors: the influence of maternal and child characteristics. Early Hum Dev. 2007;83(8):497–504.

  37. Troude P, L'Helias LF, Raison-Boulley AM, Castel C, Pichon C, Bouyer J, de La Rochebrochard E. Perinatal factors reported by mothers: do they agree with medical records? Eur J Epidemiol. 2008;23(8):557–64.

  38. Edwards P, Fernandes J, Roberts I, Kuppermann N. Young men were at risk of becoming lost to follow-up in a cohort of head-injured adults. J Clin Epidemiol. 2007;60(4):417–24.


Richard Johnson and Judy Moulder at the Mississippi State Department of Health.

Chris Simmons and Jamie Huang at the Texas Department of State Health Services.


The Bogalusa Heart Study is supported by National Institutes of Health grants R01HD069587, AG16592, HL121230, HD032194, and P50HL015103.

Supported in part by U54 GM104940 from the National Institute of General Medical Sciences of the National Institutes of Health, which funds the Louisiana Clinical and Translational Science Center. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Additional support from the Bernick Faculty Development grants and Tulane University Bridge Funding.

The funders had no role in study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.

Author information




EWH conceived and wrote the paper, supervised data linkage and analysis, and performed statistical analysis. MJ performed the data linkage and assisted in paper conceptualization. TS constructed relevant datasets and participated in data analysis. DB facilitated data collection, medical record review, and interviewing. MEW performed the data linkage and assisted in paper conceptualization. All authors revised the paper critically for content and contributed to study and analysis design. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Emily W. Harville.

Ethics declarations

Ethics approval and consent to participate

The Institutional Review Boards (IRB) of Tulane University (IRB ID#256406), the State Department of Health and Hospitals of Louisiana (Louisiana Department of Health), and the Texas Department of State Health Services approved this protocol (Mississippi deferred to the Tulane IRB). The linkage was conducted under a waiver of consent, as it was deemed minimal risk and infeasible without the waiver.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional file

Additional file 1:

Table S1. Predictors of discrepancy in reporting birth outcomes, the Bogalusa Babies study, multivariable analysis. (DOCX 20 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.



Cite this article

Harville, E.W., Jacobs, M., Shu, T. et al. Comparison of reproductive history gathered by interview and by vital records linkage after 40 years of follow-up: Bogalusa Babies. BMC Med Res Methodol 19, 114 (2019).



  • Common data elements
  • Vital statistics
  • Reproductive history
  • Cardiovascular disease
  • Bias