A flexible Bayesian hierarchical model of preterm birth risk among US Hispanic subgroups in relation to maternal nativity and education
© Kaufman et al; licensee BioMed Central Ltd. 2011
Received: 5 July 2010
Accepted: 19 April 2011
Published: 19 April 2011
Previous research has documented heterogeneity in the effects of maternal education on adverse birth outcomes by nativity and Hispanic subgroup in the United States. In this article, we considered the risk of preterm birth (PTB) using 9 years of vital statistics birth data from New York City. We employed finer categorizations of exposure than used previously and estimated the risk dose-response across the range of education by nativity and ethnicity.
Using Bayesian random effects logistic regression models with restricted quadratic spline terms for years of completed maternal education, we calculated and plotted the estimated posterior probabilities of PTB (gestational age < 37 weeks) for each year of education by ethnic and nativity subgroups adjusted for only maternal age, as well as with more extensive covariate adjustments. We then estimated the posterior risk difference between native and foreign born mothers by ethnicity over the continuous range of education exposures.
The risk of PTB varied substantially by education, nativity and ethnicity. Native born groups showed higher absolute risk of PTB and declining risk associated with higher levels of education beyond about 10 years, as did foreign-born Puerto Ricans. For most other foreign born groups, however, risk of PTB was flatter across the education range. For Mexicans, Central Americans, Dominicans, South Americans and "Others", the protective effect of foreign birth diminished progressively across the educational range. Only for Puerto Ricans was there no nativity advantage for the foreign born, although small numbers of foreign born Cubans limited precision of estimates for that group.
Using flexible Bayesian regression models with random effects allowed us to estimate absolute risks without strong modeling assumptions. Risk comparisons for any sub-groups at any exposure level were simple to calculate. Shrinkage of posterior estimates through the use of random effects allowed for finer categorization of exposures without restricting joint effects to follow a fixed parametric scale. Although foreign born Hispanic women with the least education appeared to generally have low risk, this seems likely to be a marker for unmeasured environmental and behavioral factors, rather than a causally protective effect of low education itself.
A great deal of research in reproductive and social epidemiology has focused on the "Hispanic Paradox", in which Hispanic women of low socioeconomic status (SES) in the United States (US) have better than expected birth outcomes, compared to other similarly disadvantaged groups, such as African-Americans. Some authors have suggested that acculturation is a key modifier of the apparently protective effect of Hispanic ethnicity. For example, Acevedo-Garcia and colleagues observed that the well known protective effect of higher socioeconomic position of the mother was more modest for foreign born Hispanics than for the native born. But the "Hispanic" label is a North American construction that masks considerable variability between groups. Acevedo-Garcia et al therefore provided a systematic investigation of the interaction between nativity, maternal education (as a marker for SES) and Hispanic subgroup on low birth weight (LBW). In 2002 US Natality Detail data, with a sample size of over 630,000 singleton births to US Hispanic women, the authors used logistic regression with interaction terms to document variation in the association between nativity and LBW by Hispanic subgroup, and an interaction between nativity and education for some subgroups. They reported that when stratified by ethnic subgroups and nativity, the "Hispanic Paradox" is apparent only for foreign-born Mexicans and Central/South Americans. For foreign-born Puerto Ricans and Cubans, increasing education was associated with decreased LBW risk.
We extend the previous research in several ways using a dataset from New York City. Although our sample size is smaller than that used by Acevedo-Garcia et al, by using Bayesian random effects regression models we are able to employ finer categorizations of both education and Hispanic subgroups to depict additional heterogeneity in risk of adverse birth outcomes and variations in the degree to which advancing education is associated with reduced risk across Hispanic subgroups, all on the absolute scale. We propose that the use of a single geographic region is advantageous in contrast with national data in order to avoid confounding by many unmeasured regional differences, such as the historically unique context of Cuban-Americans in South Florida or Mexican-Americans in the Southwest.
The absolute scale is a more useful contrast to make in a public health context because it directly represents the number of attributable cases, and therefore the actual public health burden. The use of each ethnic group as its own reference when constructing odds ratio estimates makes it impossible for the reader to know whether one group has higher baseline risk than another and how variation in baseline risks affects the pattern of odds ratios across groups. A Bayesian modeling approach offers a relatively simple way to obtain effects on the absolute scale along with measures of precision, a task that would be very challenging using frequentist approaches given the use of random effects in a non-linear model like logistic regression. This paper therefore serves to demonstrate the advantages of this analytic approach in terms of fitting the models and representing the output using simple graphs that clearly represent the estimated dose-response function in various groups.
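To illustrate why ratio measures with group-specific referents can hide differences in burden, the following minimal sketch applies one and the same odds ratio to two different baseline risks. The baseline risks, the odds ratio, and the group names are hypothetical values chosen for illustration only; they are not estimates from this analysis.

```python
def risk_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Risk in the comparison group implied by a baseline risk and an odds ratio."""
    odds = baseline_risk / (1.0 - baseline_risk) * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical values: two groups with different baseline risks, same odds ratio.
ODDS_RATIO = 0.8
baselines = {"group_a": 0.06, "group_b": 0.12}

risk_differences = {}
for name, p0 in baselines.items():
    p1 = risk_from_odds_ratio(p0, ODDS_RATIO)
    risk_differences[name] = p1 - p0
    print(
        f"{name}: baseline={p0:.3f} comparison={p1:.3f} "
        f"difference per 1000 births={1000 * (p1 - p0):.1f}"
    )
```

Although the odds ratio is identical in both groups, the group with the higher baseline risk shows the larger absolute difference, which is the quantity that maps onto attributable cases.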
Low birthweight, defined as births of less than 2500 g, has been widely used as a convenient measure of an adverse birth outcome. However, LBW encompasses infants with a mix of underlying pathologies: those who are growing normally but are born too early (i.e. preterm), and those that are full term but small from stunted fetal growth (i.e. intra-uterine growth retardation). Therefore, we focus in the current report on preterm birth (PTB) as a more etiologically distinct outcome, especially as recent studies have demonstrated the importance of prematurity on morbidity and mortality throughout the lifecourse [8, 9].
We used publicly available vital statistics birth data for 1995 to 2003 from the NYC Department of Health and Mental Hygiene. To remain consistent with the previous analysis by Acevedo-Garcia et al, and to allow completed years of education to have a meaningful interpretation, we excluded births to women under age 20. Following common practice, we also restricted to singleton births because of the markedly distinct patterns of fetal growth and gestational age in non-singleton pregnancies. We included all women self-identifying as Hispanic or Latino. However, we categorized these more finely than in previous reports, and we included births to women of "other or unknown" Hispanic origin, classifying them as their own subgroup.
Preterm birth was defined as delivery for any reason prior to 37 completed weeks of gestation using the clinical estimate. Maternal education was based on self-reported years of completed schooling. Although previous authors categorized years of education broadly, we included the exact number of years reported, up to a maximum of 17 years, using a flexible regression model. Nativity was dichotomized as foreign-born or native-born, where for Puerto Rican women this corresponded to the distinction between being born on the island of Puerto Rico ("foreign born") and being born in the mainland United States ("native born"). Hispanic subgroups were based on maternal self-reported ancestry and ethnicity and categorized as: Mexican, Puerto Rican, Cuban, Dominican, Central American (Belize, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama), South American (Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Paraguay, Peru, Uruguay, Venezuela) and Other/Unknown.
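As a rough illustration of how exact years of education can enter a flexible regression model, the sketch below builds a restricted quadratic spline basis that is linear below the first knot and above the last knot. The knot locations and the function name `rqs_basis` are assumptions made for this example; the knots actually used in the analysis are not stated in this section.

```python
from typing import List


def rqs_basis(x: float, knots: List[float]) -> List[float]:
    """Restricted quadratic spline basis with linear tails.

    Returns [x, s_1, ..., s_{K-1}], where
    s_j = max(x - k_j, 0)^2 - max(x - k_K, 0)^2 and k_K is the last knot.
    """
    def pos_sq(v: float) -> float:
        return max(v, 0.0) ** 2

    last = knots[-1]
    return [float(x)] + [pos_sq(x - k) - pos_sq(x - last) for k in knots[:-1]]


# Hypothetical knots (in years of education); not taken from the paper.
KNOTS = [8.0, 12.0, 16.0]

for years in range(0, 18):
    print(years, rqs_basis(years, KNOTS))
```

Each row of basis values would be multiplied by the spline coefficients in the logistic model. Subtracting the squared term at the last knot cancels the curvature beyond it, so the fitted curve stays linear in the sparse upper tail.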
Models were adjusted either for age alone, or for covariates selected to replicate the previously published analysis. These included: prenatal care, defined by the Kessner Index as adequate, intermediate or inadequate care, sex of child, maternal age (categorized as 20-24, 25-29, 30-34, 35-39, 40+), previous live birth (categorized as 0, 1-4, 5+) and dichotomous measures of: tobacco use during pregnancy, alcohol use during pregnancy and medical risk factors (anemia, pregnancy-associated hypertension, diabetes, uterine bleeding, preeclampsia, eclampsia, placenta previa, and placental abruption).
Each of the α_gjk (g = 1, ..., 3; j = 1, ..., 7; k = 0, 1) is shrunk toward the group mean for that particular nativity class. That is, each of the spline coefficients borrows strength from the spline coefficient of other ethnic groups within that nativity class. The amount of information borrowed between ethnic groups (and therefore the amount of shrinkage) is determined by the precision term, τ. A large value of τ_3, for example, indicates that the spline coefficients for foreign born mothers will borrow a larger amount of information from one another.
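The role of the precision term can be sketched with a simplified conjugate normal-normal calculation, in which the posterior mean of a coefficient is a precision-weighted average of its own estimate and the group mean. This is an illustrative simplification, not the logistic model fitted here, and all numeric values and the function name `shrunk_estimate` are hypothetical.

```python
def shrunk_estimate(estimate: float, sampling_variance: float,
                    prior_mean: float, tau: float) -> float:
    """Posterior mean under a normal-normal model; tau is the prior precision."""
    data_precision = 1.0 / sampling_variance
    return (data_precision * estimate + tau * prior_mean) / (data_precision + tau)


# Hypothetical coefficient from a sparse subgroup and a hypothetical group mean.
estimate, sampling_variance, group_mean = 0.9, 0.25, 0.2

weak = shrunk_estimate(estimate, sampling_variance, group_mean, tau=0.5)
strong = shrunk_estimate(estimate, sampling_variance, group_mean, tau=20.0)
print(f"tau=0.5  -> {weak:.3f}")
print(f"tau=20.0 -> {strong:.3f}")
```

With a small precision the subgroup estimate stays close to its own data, whereas a large precision pulls it most of the way toward the group mean.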
The Bayesian approach requires priors to be specified for all unknown parameters. The specifications for these priors were selected to be relatively uninformative since few data exist to specify informative priors on the spline coefficients in our model. Further, given the large number of observations, prior information is unlikely to have any substantial impact on the results. The prior mean parameters were assumed to follow independent normal distributions with mean 0 and variance 1. The inverses of the variance terms τ_0, ..., τ_3 were assumed to be independently gamma distributed with shape and rate parameters 0.1 and 0.1. Finally, the prior distributions of the remaining coefficients in the model, β_m, were assumed independent and identically normally distributed with mean 0 and variance 10.
Models were fit using WinBUGS version 1.4.3. To facilitate convergence, we centered spline variables in the model (Additional file 1). Markov chain Monte Carlo (MCMC) algorithms were run for 1,000,000 iterations following a 10,000 iteration burn-in. We retained every 10th iteration to reduce autocorrelation between samples as well as for analytic tractability. Convergence was assessed by visual examination of traceplots (Additional file 2: appendix figure A6). Analyses were repeated with the Markov chains started from different locations to help ensure convergence to a stable posterior distribution. Finally, we fit additional models to test the sensitivity of our results to different prior specifications. In particular, we ensured that results were consistent with the specification of more diffuse distributions on prior parameters.
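The effect of discarding a burn-in and retaining every 10th iteration can be sketched on a toy autocorrelated chain. The chain below is a simulated first-order autoregressive series standing in for MCMC output, scaled down from the run lengths reported above; the helper names and all numeric settings are assumptions for illustration.

```python
import random


def burn_and_thin(chain, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th draw."""
    return chain[burn_in::thin]


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


# Toy chain: AR(1) with coefficient 0.9, started away from its stationary mean.
rng = random.Random(0)
chain, value = [], 5.0
for _ in range(22_000):
    value = 0.9 * value + rng.gauss(0.0, 1.0)
    chain.append(value)

post_burn = chain[2_000:]
kept = burn_and_thin(chain, burn_in=2_000, thin=10)

print("retained draws:", len(kept))
print("lag-1 autocorrelation, unthinned:", round(lag1_autocorrelation(post_burn), 3))
print("lag-1 autocorrelation, thinned:  ", round(lag1_autocorrelation(kept), 3))
```

By the same bookkeeping, 1,000,000 post-burn-in iterations thinned by 10 leave 100,000 retained draws per chain.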
Over the nine-year period (1995 to 2003) there were 990,597 singleton births to women ≥20 years of age. We restricted the analysis to women self-identified as Hispanic or Latina (n = 365,139). We excluded from the analysis observations missing birthweight (n = 81), nativity status (n = 5,962), ethnic ancestry (n = 9,416) or education (n = 15,254), for a cumulative exclusion of n = 26,550 (2.7%). We also excluded all observations that were missing any covariate when fitting the fully adjusted model, although no covariate was missing more than 1.2% with the exception of prenatal care (11.4%), which left a final sample size of 258,680 for the fully adjusted analyses.
Table: Descriptive statistics of singleton births among mothers ≥20 years old by Hispanic subgroup and nativity: New York City, 1995-2003. Rows: Population (nativity %); Low birthweight (%); Term low birthweight (%); Maternal age 35+ years (%); Mean Years (sd); Adequate prenatal care (%); Smoking during pregnancy (%); Drinking during pregnancy (%); Missing smoking/Drinking (%).
Using vital statistics birth records from New York City, we have extended previous investigations of the "Hispanic Paradox", but with a novel statistical methodology that allowed for numerous improvements. The use of Bayesian hierarchical modeling allowed for the estimation and graphing of all effects on the absolute scale, which has more direct public health relevance and allows for direct comparison between groups without having to specify a referent group for the contrast parameter. The methodology also permits easy calculation of posterior intervals, whereas the calculation of variances for predicted probability estimates from multilevel logistic models in a frequentist setting would have been enormously difficult. Moreover, the shrinkage accomplished with random coefficient terms allowed for flexible modeling over relatively fine categorizations of education and ethnicity in order to produce more specific patterns than reported previously, and with the extent of shrinkage determined by the data. Finally, we reported effects for a more pathologically specific outcome, preterm birth, which represents a more homogeneous etiology (truncated gestational age) than the composite outcome of low birthweight (LBW) that has often been reported [3, 5].
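The ease of obtaining posterior intervals on the absolute scale can be sketched as follows: each retained draw of the linear predictor is transformed to a risk, the two risks are differenced draw by draw, and the interval is read off the quantiles. The draws below are simulated from normal distributions as a stand-in for MCMC output; their means and spreads, and the helper names, are hypothetical.

```python
import math
import random


def expit(z: float) -> float:
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def quantile(sorted_xs, q):
    """Simple quantile by nearest index in an already sorted list."""
    idx = int(round(q * (len(sorted_xs) - 1)))
    return sorted_xs[idx]


# Simulated stand-ins for posterior draws of the log-odds of PTB in two strata.
rng = random.Random(1)
n_draws = 5000
logit_native = [rng.gauss(-2.0, 0.05) for _ in range(n_draws)]
logit_foreign = [rng.gauss(-2.3, 0.05) for _ in range(n_draws)]

# Risk difference computed draw by draw, then summarized.
rd = sorted(expit(a) - expit(b) for a, b in zip(logit_native, logit_foreign))
rd_mean = sum(rd) / n_draws
rd_low, rd_high = quantile(rd, 0.025), quantile(rd, 0.975)

print(f"posterior mean risk difference: {rd_mean:.4f}")
print(f"95% posterior interval: ({rd_low:.4f}, {rd_high:.4f})")
```

Because the transformation is applied to every draw, no delta-method or other variance approximation is needed to summarize uncertainty in the risk difference.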
We also highlight the age-adjusted rather than the fully-adjusted estimates because we argue that these are particularly important for understanding disease burden in populations, since the real situation of these women and their pregnancies is more readily revealed in the age-adjusted values. Moreover, we would argue that for etiologic inference, the covariate-adjusted estimates may be less helpful because the covariates are, except for age, arguably consequences of the primary exposures: nativity and ethnicity. Furthermore, as shown in appendix figure A5 (Additional file 2), the adjustment actually makes little practical difference to the effect measures in this instance.
Random-effects regression models have the advantage of shrinking group-specific estimates toward the adjusted education-category mean risks, which guarantees a reduced mean square error for the ensemble of results. This implies that coefficient estimates based on sparse categories in the data will "borrow strength" (i.e., shrink coefficients toward a common prior mean) from their neighbors in order to avoid erratic estimates and that estimated values are "smoothed" in order to better recognize the underlying patterns in the data. The models we employed considered the 7 ethnicity groups to be exchangeable at each nativity stratum and year of achieved education, conditional on the modeled covariates. Although this approach can be conservative, since it biases truly dissimilar values toward the group mean, it nonetheless allowed us to estimate risks for finer categorizations of both ethnicity and education level. Further, the extent of borrowing in our model is determined by the prior precision terms τ, which are estimated, in part, from the data. This adds a level of robustness to our model: when the data reflect greater heterogeneity between groups, the precision term will decrease to ensure little borrowing of information.
Previous analyses defined low education categorically as 0-11 years, and then reported a monotonic decrease in risk of adverse outcomes with increasing education [3, 5]. By using finer classification of education, particularly in the lower range, and flexible dose-response modeling, we show that the data provide some evidence against a monotonic relationship for many of the Hispanic subgroups in NYC. For example, foreign-born women of very low education (less than 8 years) are estimated to have similar risk to women completing secondary education in many of the groups. It is quite likely that education serves as a marker for acculturation, with women who report very low educational attainment having the most traditional cultural affiliation. This could provide a protective effect for birth outcomes through mechanisms such as diet, social support, and decreased risk behaviors such as substance abuse.
Additionally, plotting predicted probabilities rather than graphing relative measures of effect allowed us to look directly at risk dose-response across groups. For example, the plotting of predicted absolute risks not only reveals the substantially higher risk for Puerto Ricans, but also shows that for this group in particular, there is no apparent advantage for the "foreign born" (i.e., island-born) as there is for other ethnicities. This makes sense substantively, since all Puerto Ricans are US citizens, whether born in the mainland or on the island. Greater mobility is therefore possible between populations, with less migrant selectivity among those who relocate. The other group with little apparent effect of nativity was Cubans, although the very small number of foreign born Cuban mothers in New York City limited the precision of these estimates severely. Evidence that this is not a distinctively Caribbean phenomenon can be seen by contrasting Dominicans, whose risk profile looks very much like the other Central and South American groups.
It should be noted that because we display absolute risk estimates, the adjusted values shown must depend on the choice of level at which covariates are fixed in the analysis. We chose to set all covariates to values with lowest risk, meaning that the graphs displayed are the "best case" scenarios for each stratum defined by ethnicity and nativity; when additional risk factors are "switched on", the absolute risks will be greater than those shown.
This paper uses novel statistical methods to extend previous findings concerning risk of adverse birth outcomes for Hispanic women as a function of ethnicity, nativity and years of completed education. We confirmed the previously published findings that the education gradient is much flatter for foreign-born women, with the exception of island-born Puerto Ricans. We went beyond previous research, however, to demonstrate that more substantively interpretable analyses are possible through the use of flexible hierarchical models. Finer categorizations help reveal evidence that the benefit associated with additional years of schooling may not be monotonic, but rather that women with the lowest levels of education show reduced risk compared to those with 8-11 years. Furthermore, by displaying Bayesian posterior risk estimates and their differences, rather than ratio measures of effect, we show heterogeneity between groups that is obscured when each group is used as its own referent. These results show a consistent disadvantage for Puerto Rican women at all education levels and for all outcomes. Furthermore, results that are adjusted for measured risk factors will also be more moderate than those that occur in the real world. If Puerto Ricans also have a more disadvantaged profile of these factors, their true risks will be even more disparate.
This work was supported in part by National Institute of Child Health and Human Development (R21 HD050739). JSK was supported additionally by funding from the Canada Research Chairs program. RFM was supported additionally by National Institute of Child Health and Human Development (1U01-HD061940). The authors thank Susan Marshall Mason, PhD and Teresa Janevic, PhD for facilitating access to the dataset.
- Brown HL, Chireau MV, Jallah Y, Howard D: The "Hispanic paradox": an investigation of racial disparity in pregnancy outcomes at a tertiary care medical center. Am J Obstet Gynecol. 2007, 197 (2): 197.e1-197.e9. [http://www.ncbi.nlm.nih.gov/pubmed/17689648]
- Ruiz RJ, Dolbier CL, Fleschler R: The relationships among acculturation, biobehavioral risk, stress, corticotropin-releasing hormone, and poor birth outcomes in Hispanic women. Ethn Dis. 2006 Autumn, 16 (4): 926-32.
- Acevedo-Garcia D, Soobader M, Berkman LF: The differential effect of foreign-born status on low-birthweight by race/ethnicity and education. Pediatrics. 2005, 115: e20-e30.
- Tumiel LM, Buck GM, Zayas LE, Jaén CR: Unmasking adverse birth outcomes among Hispanic subgroups. Ethn Dis. 1998, 8 (2): 209-17.
- Acevedo-Garcia D, Soobader M, Berkman L: Low birthweight among US Hispanic/Latino subgroups: The effect of maternal foreign-born status and education. Social Science & Medicine. 2007, 65: 2503-2516. 10.1016/j.socscimed.2007.06.033.
- Sackett DL, Deeks JJ, Altman DG: Down with odds ratios!. Evidence-Based Med. 1996, 1: 164-166.
- Wilcox AJ: Intrauterine growth retardation: beyond birthweight criteria. (Editorial). Early Hum Dev. 1983, 8: 189-93. 10.1016/0378-3782(83)90001-4.
- Saigal S, Doyle LW: An overview of mortality and sequelae of preterm birth from infancy to adulthood. Lancet. 2008, 371 (9608): 261-269. 10.1016/S0140-6736(08)60136-1.
- Moster D, Lie RT, Markestad T: Long-term medical and social consequences of preterm birth. N Engl J Med. 2008, 359 (3): 262-273. 10.1056/NEJMoa0706475.
- Institute of Medicine, National Academy of Sciences: Infant deaths, an analysis by maternal risk and health care. Contrasts in Health Status. 1973, Washington, DC: National Academy of Sciences, I.
- Greenland S: Principles of multilevel modelling. Int J Epidemiol. 2000, 29 (1): 158-67. 10.1093/ije/29.1.158.
- Lunn DJ, Thomas A, Best N, Spiegelhalter D: WinBUGS -- a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing. 2000, 10: 325-337. 10.1023/A:1008929526011.
- Kaufman JS, Cooper RS: Commentary: considerations for use of racial/ethnic classification in etiologic research. Am J Epidemiol. 2001, 154 (4): 291-8. 10.1093/aje/154.4.291.
- Greenland S: Summarization, smoothing, and inference in epidemiologic analysis. Scand J Soc Med. 1993, 21 (4): 227-32.
- Greenland S: When should epidemiologic regressions use random coefficients?. Biometrics. 2000, 56 (3): 915-21. 10.1111/j.0006-341X.2000.00915.x.
- MacLehose RF, Dunson DB, Herring AH, Hoppin JA: Bayesian methods for highly correlated exposure data. Epidemiology. 2007, 18 (2): 199-207. 10.1097/01.ede.0000256320.30737.c0.
- Callister LC, Birkhead A: Acculturation and perinatal outcomes in Mexican immigrant childbearing women: an integrative review. J Perinat Neonatal Nurs. 2002, 16 (3): 22-38.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/51/prepub
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.