Modeling perinatal mortality in twins via generalized additive mixed models: a comparison of estimation approaches

Background The analysis of twin data presents a unique challenge. Second-born twins on average weigh less than first-born twins and have an elevated risk of perinatal mortality. It is not clear whether the risk difference depends on birth order or their relative birth weight. This study evaluates the association between birth order and perinatal mortality by birth order-specific weight difference in twin pregnancies. Methods We adopt generalized additive mixed models (GAMMs) which are a flexible version of generalized linear mixed models (GLMMs), to model the association. Estimation of such models for correlated binary data is challenging. We compare both Bayesian and likelihood-based approaches for estimating GAMMs via simulation. We apply the methods to the US matched multiple birth data to evaluate the association between twins’ birth order and perinatal mortality. Results Perinatal mortality depends on both birth order and relative birthweight. Simulation results suggest that the Bayesian method with half-Cauchy priors for variance components performs well in estimating all components of the GAMM. The Bayesian results were sensitive to prior specifications. Conclusion We adopted a flexible statistical model, GAMM, to precisely estimate the perinatal mortality risk differences between first- and second-born twins whereby birthweight and gestational age are nonparametrically modelled to explicitly adjust for their effects. The risk of perinatal mortality in twins was found to depend on both birth order and relative birthweight. We demonstrated that the Bayesian method estimated the GAMM model components more reliably than the frequentist approaches.


Background
Twins are 2-4 times more likely to die in the perinatal period compared to singletons [1]. Second-born twins, however, are known to be at higher risk of perinatal mortality than first-born twins [2][3][4]. While birthweight and gestational age are both well-known determinants of perinatal mortality [5], birthweight is more likely to be a major component of the risk difference between first-and secondborn twins because co-twins are usually delivered at the same gestational age. Moreover, second twins, on average, weigh less than first twins [6]. It is unclear if the mortality risk differences between second and first twins depend on birth order or birthweight.
Luo et al. [5] showed that perinatal mortality risk differences in second vs first twins depended on their relative birth size: risks were similar when birthweights were similar, increasingly higher as second twins weighed less, and progressively lower as second twins weighed more. However, in the conditional logistic regression model used, they controlled for the effect of small for gestational age (SGA) via a binary indicator (1 = yes; 0 = no) based on gestational age and birthweight. Controlling for a binary version of continuous confounder(s) may lead to residual confounding [7]. In this analysis, we evaluated the association of birth order with perinatal mortality after adjusting for both birthweight and gestational age, among others.
Because birthweight and gestational age may have nonlinear associations with mortality, we used generalized additive mixed models (GAMMs) [8] that employ unknown smooth functions to model nonlinear covariate effects, and random effects to account for correlation in twin-pairs. Smooth functions can be estimated in various ways [8][9][10]; here, we used penalized regression splines represented as mixed model components [11]. This allows the use of mixed model methodology and software to make systematic inference on all model components for the GAMMs.
Although several methods are available for estimating GAMMs, in practice results may vary widely depending on the method used. Motivated by the difficulties we encountered when analyzing the perinatal twin mortality, we investigate the performance of different methods via simulation in a setting similar to the twin data situation.
In this paper, we systematically compare the performance of the Bayesian and likelihood-based estimation techniques for inference in the GAMMs via a simulation study. We also apply these methods to the US matched multiple birth data to study the association between birth order and perinatal mortality by birth-order specific weight difference in twins.

Generalized additive mixed models (GAMMs)
Generalized additive mixed models (GAMMs) [8] extend generalized linear mixed models (GLMMs) [12] to allow nonlinear functional forms between independent variables and the response. They provide a flexible modeling framework to use additive nonparametric functions to model the effects of continuous covariate(s) while using random effects to model correlation between responses. Estimating the nonparametric smooth function by penalized regression splines, the GAMM can be expressed as a GLMM. Details on the GAMM and its mixed model representation are provided in the Additional file 1: Supplementary Material.

Estimation of GAMMs
The GAMM model parameters may be estimated via frequentist or Bayesian approaches. The frequentist approaches rely on approximation methods while Bayesian methods use Markov Chain Monte Carlo (MCMC). We investigate two approximation methods: double penalized quasi-likelihood (DPQL) [8,13] and the Laplace approximation [14], as well as a Bayesian approach [15].
The Bayesian methods require specification of prior distributions, a non-trivial task for variance components [16]. For the GAMM, an appropriate choice for the prior distributions of variance components is crucial because curve estimation depends on the variance components; over (under) estimation of the variance components corresponds to undersmoothing (oversmoothing).

Data and models
To study the relationship between perinatal mortality and birth order, we used the matched multiple birth dataset from the United States National Centre for Health Statistic's (NCHS) 1995-1998. For all multiple births in years 1995-1998, the NCHS data contained information on perinatal and infant mortality, and maternal and pregnancy characteristics. An extended version (1995)(1996)(1997)(1998)(1999)(2000) of this dataset was used by Luo et al. [5].
In order to estimate the odds ratio (OR) and 95% confidence interval (CIs) of perinatal death comparing second vs first twins in each stratum, we used GAMMs. This approach is broadly similar to the conditional logistic regression approach of Luo et al. [5]. We adjusted for potential confounders including fetal sex, presentation, birthweight, gestational age, and mode of delivery. Birthweight and gestational age effects were modelled nonparametrically. We did not adjust for maternal characteristics or any other factors common to a twin pair as these were perfectly matched for twins.
More specifically, conditional on twin-pair-specific random intercepts b hi~N (0, σ 2 int ), the binary outcomes Y hij (perinatal death: 0 = no, 1 = yes) in h th stratum (h = 1, …, 7) were assumed to be independent and follow a semiparametric logistic mixed model where i = 1, ..., m h indexes the twin pair in h th stratum and j = 1, 2 twins within pairs, the fixed effects covariates x hij included an intercept, birth order, fetal sex, presentation, and mode of delivery; f 1 (birthweight hij ) and f 2 (gestational age hij ) are centred twice-differentiable smooth functions of birthweight and gestational age, respectively. The random intercept variance and all other model parameters are stratum-specific and m h denotes the number of twin-pairs in the h th stratum. Note that, in our analysis, birthweight and gestational age were linearly correlated (r = 0.72). The correlations between all other covariates were negligible.

Data analysis
We fitted model (1) in each stratum in which each smooth term was represented by a penalized thin plate regression spline [18,19]. Following Ruppert [20], we considered a large number of knots (K = 20) and knot positions were evenly spaced sample quantiles of unique covariate values. Representing the penalized regression smoothers as mixed model components, and imposing the centering constraint on each smoother, we estimated the model parameters via the following methods: 1. DPQL under maximum likelihood (ML) estimation. 2. Laplace approximation. For DPQL and Laplace methods, standard errors of the estimated fixed effects and smooth functions were obtained from a posterior covariance matrix as in Lin and Zhang [8].

A Bayesian approach in which noninformative
priors were used for all parameters. Specifically, N(0, 10 6 ) distributions were used for all fixed effects (β), while half-Cauchy priors [16] with scale parameter set to 25 were considered for each variance component (e.g., for σ 2 int ). We ran 2 chains with 55,000 iterations after discarding the initial 5000 burn-in iterations. The chains were thinned by keeping every 50th iteration and estimates were the sample medians. Convergence of the chains was assessed following Gelman and Rubin [21] and also by visually examining the trace plot, density plot, and sample autocorrelation function for each parameter.
All analyses were carried out in R software employing glmmPQL [22] and gamm4 [23] functions for DPQL and Laplace approximate methods, respectively. The Bayesian analysis via MCMC was performed using JAGS [24] which is a mature and declarative language for Bayesian model fitting. Table 1 presents the summaries of selected characteristics of the twins study cohort. Most mothers were white (79.3%) and aged between 20 and 34 (74.8%). Secondborn twins had slightly lower mean birthweights (23.6 g) than first-born twins. Malpresentation was more frequent in second twins (27% vs 21%).

Results of the data analysis
The ORs of perinatal death in second versus first twins from the models fit via Laplace and Bayesian-HC are reported in Table 2. DPQL estimation failed in all strata and did not converge even for simpler models, e.g., when the effects of birthweight and gestational age were considered as linear, or, when some confounders were dropped. The Laplace method yielded more extreme ORs (away from 1) with wider CIs than the Bayesian method for all strata except for one where they were approximately equal (when second-born twins were heavier by 5 to 15%). From the Bayesian fitting, the risk of perinatal death was higher for second-born twins (adjusted OR 1.27, 95% CI: 1.13, 1.43) when the twins had similar birthweights (within ±5%) and the risk increased as the second-born twin weighed less. When second twins were heavier by 5-15% and 15-25%, the adjusted ORs were not significantly different from 1. Second-born twins were found to be at significantly lower risk of perinatal death (adjusted OR 0.33, 95% CI: 0.25, 0.45) when they weighed ≥ 25% more than the first-born twin. The variance estimates of the random intercepts are shown in the last two columns of Table 2. Estimates obtained from the Laplace method were implausibly large (up to thirty times bigger than those of the Bayesian method), possibly because the Laplace approximation performs poorly for binary data with small cluster size (n i = 2) and very low event probability, or, because the method did not converge well without reporting any warning. The Bayesian estimates of the variance of random effects indicated large heterogeneity between twin pairs (nearly 5 in most strata). Figure 1 shows the estimated nonparametric functions of birthweight and gestational age for the Laplace and Bayesian-HC methods when birthweights for first-and second-born twins were similar. The 95% pointwise credible intervals of the curves are displayed only for the Bayesian method. The estimated curves suggested different trends especially for birthweight. The Bayesianestimated curve indicated an increased risk of perinatal mortality at extreme birth weights (< 1000 g and > 5000 g). The Laplacian fit showed a decreasing trend for (mortality) risk as birthweight increased. The risk of perinatal death estimated by the Bayesian method declined sharply up until 28 weeks of gestational age and declined gradually thereafter although there was a slight  increase in risk after 40 weeks. The Laplacian fit did not closely reproduce the behaviour of the Bayesian fit. The fitted curves obtained from other strata were similar to those shown in Fig. 1.
In summary, the estimated ORs by different methods disagreed by a noticeable margin; the shape of the nonlinear associations varied widely, one method failed to converge, and the variance component estimates differed markedly. Because it was unclear which estimates should be reported, we conducted a simulation study to investigate the performance of the Bayesian and frequentist approaches for estimating GAMMs under twin data setting.

Data generation: mimicking twin-pairs data setting
The simulation study was broadly designed to mimic the twin-pairs data setting where the cluster size was 2. For all settings, data were generated following Lin and Zhang [8].
Each dataset was generated with m = 1000 clusters of homogeneous size n i = 2. We considered a random intercepts model with four variables: a binary outcome (Y), a dichotomous variable (D) such as birth order, and two continuous covariates (X 1 , X 2 ) such as birthweight and gestational age. The dichotomous variable, D was generated from a Bernoulli distribution with probability equal to 50%. Two linearly correlated standard uniform U(0, 1) covariates (X 1 and X 2 ) were generated from the Gaussian Copula such that the point-biserial correlations between the dichotomous variable and continuous covariates were ρ d.x1 = ρ d.x2 ≈ 0.30. The empirical correlation between the continuous covariates was considered as ρ x1. x2 = 0.7, which is similar to the observed correlation between birthweight and gestational age observed in the twin-pairs data. Conditional on the cluster-specific random intercepts b iÑ (0, 0.75), the binary responses Y ij within each cluster were generated with conditional probabilities: where p ij ¼ E½Y ij jb i , β 1 = 0.7 (which gives OR ≈ 2) and Γ(.) is a gamma function. The overall prevalence of a positive (Y = 1) outcome was kept at either 0.05 or 0.5 by setting β 0 to either − 1.28 or − 0.35. Here f 1 (x 1 ) and f 2 (x 2 ) were bimodal and unimodal functions, respectively. We centered the functions so that the means of f 1 and f 2 were 0 over the distinct values of X 1 and X 2 .

Analysis of the simulated data
For each setting, 1000 simulated datasets were generated. Each dataset was analysed by fitting a logistic additive mixed effects model of the form (2) in which each smooth term was represented by penalized thin plate regression spline similar to twin data analysis. The model parameters were estimated using the following methods: 1. DPQL under maximum likelihood (ML) or restricted ML (REML) estimation. 2. Laplace approximation. For DPQL and Laplace methods, standard errors of the estimated fixed effects and smooth functions were obtained following the same procedure as for the twin data analysis. 3. A Bayesian approach similar to those used for perinatal mortality data analysis with β~N(0, 10 6 ) but considering three alternative independent prior specifications for each variance component: (i) Uniform (0, 100); (ii) Half-Cauchy with scale parameter set to 25; and (iii) Inverse Gamma (0.001, 0.001). Using priors (i)-(iii), Bayesian methods are referred to later as, respectively: Bayesian-UNIF, Bayesian-HC, and Bayesian-IG. The Bayesian estimates were medians from 55,000 iterations of the MCMC algorithm after discarding the first 5000 iterations as burn-in. We ran a single chain and thinned it by keeping every 50th iteration.

Performance indicators
We computed percentage relative biases (PRBs) of the fixed and random effects estimators defined as PRB ¼ Bias True Value Â 100: For the estimated smooth functions, we computed pointwise mean average squared distance/error (MASE) from the true curves, the 95% pointwise mean average coverage probabilities (MACPs), and 95% pointwise mean average confidence interval lengths (MACLs). The pointwise MASE was defined as the mean over the 1000 replicated datasets of the average squared error, The 95% pointwise MACP and MACL were obtained as the means of the 1000 average coverage probabilities (ACP) and average credible intervals lengths (ACL), respectively. We defined where 1(.) denotes an indicator function;f L andf U are the lower and upper limits of the point-wise CI, respectively. Table 3 presents simulation results when different methods of estimating GAMMs were used. For low (5%) event probability setting, model components were better estimated by Bayesian methods than the two frequentist approaches with lower percent relative bias and MSE, shorter distance between the true and estimated curves, and higher coverage probability. Of the Bayesian methods, Bayesian-HC performed best in estimating most of the model components. Both DPQL (ML and REML) and Laplace methods yielded inflatedσ 2 int and showed poor ability in recapturing the true functions. DPQL (ML and REML) failed to converge often (66% of total datasets) whereas the Laplace method had negligible convergence problems (2% of total datasets), and results from these datasets were excluded. The Bayesian methods did not report any convergence problems. The Bayesian-UNIF method overestimated σ 2 int whereas Bayesian-IG underestimated it. The fixed effect β 1 was better estimated by Bayesian methods with negligible bias and lower MSE. The Bayesian-HC appeared to perform in between the Bayesian-UNIF and Bayesian-IG methods in terms of estimating all model components. Overall, the Bayesian estimates were sensitive to the prior specifications of the variance component. The simulation results suggest that the Bayesian-HC is likely to be the best approach in the twins-pair data, where the event rate of interest is < 5%. Increasing the event rate from 5 to 50% resulted in better estimates by all methods without any convergence problems. The superiority of the Bayesian methods however prevailed. Figure 2 illustrates the ability of the GAMMs estimated by different methods to recapture the true functions when the event probability was 0.05. The upper panel of Fig. 2 shows the true curves f j (x j ) (j = 1, 2) and the smoothed estimated curvesf j ðx j Þ. The Bayesian methods recovered the true curves well with slightly negative biases when curvature was high and also in the flat areas. The fitted curves obtained by Bayesian methods using three different priors for variance components were similar. However, around the peak area the Bayesian-HC method performed better than others. Bayesian-UNIF and Bayesian-IG yielded almost identical curves and hence the curves obtained by Bayesian-IG have not been displayed. The frequentist methods, in contrast, could not adequately recapture the true curves throughout the range. DPQL (ML) performed worse than the Laplace approximation method. Table 3 Estimate, 95% confidence/credible interval (CI), mean average squared distance (MASE), mean average 95% coverage probability (MACP), and mean average coverage length (MACL) for model parameters estimated via various approaches when number of clusters m = 1000, cluster size n i = 2, and ρ x1: For the Bayesian method, the three alternative priors used for the variance components were: uniform (0, 100), half-Cauchy (25), and inverse gamma, IG(0.001, 0.001)

Results of the simulation study
The lower panel of Fig. 2 compares the empirical pointwise coverage probabilities of the 95% credible intervals of f 1 (x 1 ) and f 2 (x 2 ) obtained using different estimation methods. The coverage probabilities (CP) of the CIs from all Bayesian methods were close to the nominal value (95%), except when biases in the estimated nonparametric functions were noticeable. In contrast, CIs from DPQL (ML) and Laplace methods yielded very low coverage probabilities, and DPQL (ML) had relatively poor coverage probabilities (mean CP 33%) compared to the Laplace method (mean 73%). Around the peak areas, CIs from DPQL (ML) and Laplace methods yielded very low coverage probabilities. In such cases, coverage probabilities from the fully Bayesian methods were also low, but nonetheless much better than frequentist methods.

Discussion
We re-analyzed twins perinatal mortality data to study the association between birth order and perinatal mortality by adopting the flexible GAMMs in which continuous covariates (birthweight and gestational age) were nonparametrically modelled to adjust for their effects more completely. Overall, how best to estimate flexible regression curves when the outcomes are correlated and binary is unclear, especially when cluster sizes are small. Thus, we analyzed twins data estimating GAMMs by different frequentist and Bayesian methods, and used simulated data to compare the performance of these estimation techniques for a setting similar to the twin-data.
Using the multiple matched data from the US National Centre for Health Statistic's (NCHS) 1995-1998, we obtained results that varied with respect to the estimation methods. Our simulation results for small cluster size (n i = 2) with low event probability (similar to the NCHS data) suggested the superiority of the Bayesian method in estimating all model components, especially using the Half-Cauchy (HC) priors for the variance components. We thus rely on the results from the Bayesian-HC fit for our data analysis. These results suggest that the risk of perinatal mortality depended on the twins' birth order smoothed pointwise coverage probabilities of the 95% confidence intervals (f 1 (x 1 ) and f 2 (x 2 ) in the lower panel) for 1000 replicated datasets. These results are for the data generation scenario with m = 1000, n i = 2, σ 2 int = 0.75, ρ x1:x2 = 0.7 and event probability = 0.05. The curves estimated by Bayesian-IG were almost similar to those obtained by Bayesian-HC and have not been displayed here to make other fits more visible and the risk differences in second vs first twins depended on their relative birthweight. Second twins were more likely to die than first-born co-twins when they had similar (within ±5%) birthweights (adjusted OR = 1.27, 95% CI: 1.13, 1.43). The risks of perinatal death for second-born twins were progressively higher as they weighed less than first-born twins (adjusted ORs: 1.39, 1.97 and 3.42 when weighed 5-15%, 15-25% and ≥ 25% less, respectively) and increasingly lower as they weighed more (adjusted ORs: 1.19, 0.91 and 0.33 when weighed 5-15%, 15-25% and ≥ 25% more, respectively; most of the ORs were significantly different from 1). Similar to the simulation results, the Bayesian analysis using uniform priors for variance components (Bayesian-UNIF) yielded slightly larger ORs whereas using inverse gamma priors (Bayesian-IG) yielded slightly smaller ORs as compared to the Bayesian-HC method (see Additional file 1: Table S1 for the results).
The effect of relative birthweight was also confirmed by Luo et al. [5] but they did not find any significant association between birth order and perinatal mortality when both twins had similar (within ±5%) birthweights (OR = 0.97, 95% CI: 0.84, 1.12). Also, the ORs they obtained from the stratified analyses were closer to 1 in most cases. This may be due to using different models, or adjusting for different sets of confounders. They used a binary indicator 'small for gestational age' to control for the effect of birthweight and gestational age, which might lead to residual confounding [7].
Similar to the findings from the simulation study, the Laplace estimate of the variance of the random intercepts in each stratum was unusually large -indicating an extreme heterogeneity between twin pairs. The fitted smooth curves for birthweight and gestational age by the Laplace method were less likely to capture the true shapes of association due to the poor estimates of the variance components. The curve estimation largely depends on the estimates of the variance components in a GAMM, and the Laplace method yielded poor estimates of the variance components as evident from the simulation study. The DPQL method failed to fit the model in each stratum and this was in agreement with the findings from the simulation study in which DPQL failed to converge often.
The observed performance of the DPQL and Laplace approximation in estimating the model components in the simulation study was not surprising as they are known to yield biased estimates for small cluster size. However, we demonstrated the strength of Bayesian methods when the system was stressed, i.e., when cluster size and event probability were small. While using Frequentist methods with more refined likelihood approximation (e.g. adaptive Gaussian quadrature) may improve performance but is not feasible as the mixed model representation of GAMMs involves a large number of random effects and Gaussian quadrature is not computationally efficient for more than four random effects [25].
There are some limitations to this study. First, we analyzed the NCHS 1995-1998 twin matched data that we had access to. Unfortunately, the updated version of the data from 1995 to 2000 was not publicly available during this analysis. We do not believe that the results would have changed appreciably given two more years of data. Next, we considered a stratified analysis for twin-data analysis, although a single model that included an interaction term for birth order and relative birth size might be more appropriate to study their association with perinatal mortality by estimating these effects using the whole dataset at once. Stratification was used to make our results comparable to Luo et al. [5], and to reduce the computational resources required for handling a huge dataset. Finally, we omitted a potential confounder, zygosity (Monozygotic-MZ/Dizygotic-DZ), because the data was not available. MZ twins are likely to be more correlated both for birthweight and for potential mortality than DZ twins.

Conclusion
We adopted a sophisticated statistical model, GAMM, to precisely estimate the perinatal mortality risk differences between first-and second-born twins from a large dataset in which birthweight and gestational age were nonparametrically modelled to explicitly adjust for their effects. Overall, the perinatal mortality risk differences in second vs first twins were found to depend on both birth order and relative birthweight. We demonstrated that the Bayesian method (especially using half-Cauchy prior for variance component) estimates the GAMM model components more reliably than the frequentist approaches for small cluster size.
Additional file 1: Supplementary Material: The Mixed Model Representation of the GAMMs. This file contains an explanation on the generalized additive mixed models (GAMMs) and their representation as generalized linear mixed models (GLMMs). In addition, it includes a table that summarizes additional results from the twin perinatal mortality data analysis. Table S1. Stratified comparisons of second and firstborn twins: adjusted ORs of perinatal death obtain from the Bayesian fit of the logistic additive mixed effects models.