Overcoming the problems caused by collinearity in mixed-effects logistic model: determining the contribution of various types of violence on depression in pregnant women

Background Collinearity is a common and problematic phenomenon in public health studies. It inflates the variance of estimators and reduces the power of tests, and it can occur in any model. In this study, a new ridge mixed-effects logistic model (RMELM) is proposed to overcome the consequences of collinearity in correlated binary responses. Methods Parameters were estimated through a penalized log-likelihood, combining the expectation maximization (EM) algorithm, gradient ascent, and Fisher-scoring methods. A simulation study was performed to compare the new model with the mixed-effects logistic model (MELM). Mean square error, relative bias, empirical power, and variance of random effects were used to evaluate RMELM. In addition, the contributions of various types of violence and of an intervention to depression among pregnant women experiencing intimate partner violence (IPV) were analyzed with the new and previous models. Results The simulation study showed that the mean square errors of the fixed effects were lower for RMELM than for MELM, and that empirical power was higher. Inflation in the variance of estimators due to collinearity was clearly visible in the MELM fitted to the IPV data, and RMELM adjusted the variances. Conclusions According to the simulation results and the analysis of the IPV data, this new estimator is appropriate for dealing with collinearity problems in the modelling of correlated binary responses.


Introduction
Intimate partner violence (IPV) against women is one of the major public health challenges in the world [1]. IPV is categorized into mental, physical, sexual, and financial types [2]. IPV can cause physical problems including bruising, fractures, trauma, and various sexually transmitted infections. It can also cause mental health problems in women, such as depression, anxiety, and even suicide [3]. Many women experience depression during pregnancy, and the risk increases under IPV [4,5]. Some interventions may be useful for reducing the odds of depression in pregnant women under IPV. Longitudinal studies of this kind with binary responses are commonly analyzed with the mixed-effects logistic model (MELM). Usually, this method estimates the fixed parameters by maximum likelihood and uses adaptive Gauss-Hermite quadrature to approximate the integral over the random effects [6,7]. Modeling of correlated binary responses may suffer from problems such as collinearity [8].
Collinearity refers to a linear relationship between predictor variables. Inherent relationships between variables in the real world, small sample size, the design of the model, and trends in the predictor variables can all cause collinearity [9]. What makes collinearity an important problem in modeling is its effect on the variance of estimators. When there is collinearity, the determinant of $X^T X$ becomes small, where $X$ is the design matrix, which inflates the variance of the estimators. Biased decisions about predictor variables and wide confidence intervals are other consequences of collinearity. In addition, collinearity makes the effects of predictor variables inseparable, so it may be difficult to evaluate the relative importance of each predictor variable [8,10,11].
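The effect of collinearity on $X^T X$ can be illustrated numerically. The sketch below is not taken from the study: it uses a hypothetical 2×2 correlation matrix as a stand-in for $X^T X$ of two standardized predictors, and the function name `collinearity_summary` is introduced only for illustration. It shows that the determinant shrinks, and the diagonal of the inverse (which is proportional to the estimator variance in a linear model) grows, as the correlation increases.

```python
import numpy as np

def collinearity_summary(rho):
    """Return det(R) and the first diagonal entry of R^{-1} for a
    hypothetical 2x2 correlation matrix R with off-diagonal rho."""
    R = np.array([[1.0, rho], [rho, 1.0]])
    det = np.linalg.det(R)               # equals 1 - rho**2
    var_factor = np.linalg.inv(R)[0, 0]  # equals 1 / (1 - rho**2)
    return det, var_factor

# Correlation levels chosen for illustration only
for rho in (0.0, 0.7, 0.9, 0.95):
    det, vf = collinearity_summary(rho)
    print(f"rho={rho:.2f}  det={det:.4f}  variance factor={vf:.2f}")
```

The variance factor printed here is the familiar variance inflation factor for the two-predictor case.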
There are some simple methods to deal with collinearity, such as dropping collinear variables, centering predictor variables, and using dimension reduction methods like principal component analysis. Despite their simplicity, each of these has its own disadvantages [12,13]. The ridge estimator is one of the methods that has shown a desirable effect against the consequences of collinearity. In this method, the penalized log-likelihood is used with a ridge penalty. The ridge estimator adds a constant value to the main diagonal of $X^T X$, which imposes some bias on the estimator but decreases its variance. In effect, it is a tradeoff between bias and variance [11,12,14,15].
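For the linear model, this tradeoff can be written explicitly. The expressions below are the standard textbook ridge results for a linear model with response $y$, error variance $\sigma^2$, and shrinkage parameter $\lambda$; they are given only as an illustration of the bias-variance tradeoff and are not the estimator proposed in this paper.

```latex
\hat{\boldsymbol{\beta}}_{\lambda} = (X^T X + \lambda I)^{-1} X^T y,
\qquad
E(\hat{\boldsymbol{\beta}}_{\lambda}) - \boldsymbol{\beta}
  = -\lambda (X^T X + \lambda I)^{-1} \boldsymbol{\beta},
\qquad
\operatorname{Var}(\hat{\boldsymbol{\beta}}_{\lambda})
  = \sigma^2 (X^T X + \lambda I)^{-1} X^T X (X^T X + \lambda I)^{-1}.
```

With $\lambda = 0$ the bias vanishes and the variance reduces to $\sigma^2 (X^T X)^{-1}$; increasing $\lambda$ increases the bias and shrinks the variance.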
Various studies have compared the performance of ridge, lasso, and Firth penalties. For instance, studies have shown that problems can arise if the lasso penalty is applied instead of the ridge penalty in the presence of collinearity. The first problem is variable selection: in the presence of collinearity between variables, the lasso method randomly removes one variable from the model. The second problem is prediction accuracy: the prediction accuracy of the lasso method is lower than that of ridge [16], and the mean square error of the ridge method is lower than that of the lasso method [17,18]. In the presence of separation, however, the Firth penalty leads to more accurate estimates than ridge [19].
MELM uses the maximum likelihood estimator (MLE) for estimation of the fixed effects. Under collinearity, therefore, inflation in the variance of the estimator and lack of significance for important variables may occur. Given the increasing number of studies with correlated binary responses, such as longitudinal and cluster studies, this paper proposes a ridge estimator for correlated binary responses based on the Fahrmeir and Tutz method [20,21]. The details of the method and estimators are introduced in the Method section. The analysis of the IPV data and the simulation study are presented in the Numerical study section. Finally, a discussion of the findings and conclusions is provided in the Discussion and conclusions section.

Method
Suppose $y_{ij}$ denotes the $j$th observation for the $i$th individual, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, n_i$. MELM is defined as:

$$\operatorname{logit}\{P(y_{ij} = 1 \mid b_i)\} = x_{ij}^T \boldsymbol{\beta} + z_{ij}^T b_i \qquad (1)$$

where $x_{ij}$ and $z_{ij}$ are the observation vectors for the fixed and random effects for the $i$th individual at the $j$th observation, respectively. $X$ and $Z$ are the design matrices for the fixed and random effects. The vectors of fixed and random effects are denoted by $\boldsymbol{\beta}_{p \times 1}$ and $b_i$. The penalized log-likelihood with the Breslow and Clayton integral approximation for model (1) is given in (2).
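To make the role of the ridge penalty in Fisher scoring concrete, the following is a minimal sketch for the fixed effects only. It assumes a penalty of the form $(\lambda/2)\,\boldsymbol{\beta}^T \boldsymbol{\beta}$ applied to every coefficient and omits the random effects, the integral approximation, and the EM step, so it is not the authors' full algorithm. The function name `ridge_logistic_fisher_scoring` and the simulated data are hypothetical.

```python
import numpy as np

def ridge_logistic_fisher_scoring(X, y, lam, n_iter=50, tol=1e-8):
    """Maximize the logistic log-likelihood minus (lam / 2) * ||beta||^2
    by Fisher scoring. Fixed effects only (assumption): the random
    effects and the EM step of the full model are omitted."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))           # fitted probabilities
        W = mu * (1.0 - mu)                              # logistic working weights
        score = X.T @ (y - mu) - lam * beta              # penalized score
        info = X.T @ (X * W[:, None]) + lam * np.eye(p)  # penalized information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical collinear data (pairwise correlation about 0.9)
rng = np.random.default_rng(1)
n = 200
shared = rng.standard_normal((n, 1))
X = np.sqrt(0.1) * rng.standard_normal((n, 3)) + np.sqrt(0.9) * shared
true_beta = np.array([0.2, 0.4, -0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))

beta_ml = ridge_logistic_fisher_scoring(X, y, lam=0.0)     # unpenalized fit
beta_ridge = ridge_logistic_fisher_scoring(X, y, lam=5.0)  # ridge fit
print("ML:   ", np.round(beta_ml, 3))
print("ridge:", np.round(beta_ridge, 3))
```

Setting `lam=0.0` recovers ordinary Fisher scoring for logistic regression, which makes it easy to see how the penalty shrinks the coefficients.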
To estimate the variance component, the EM algorithm is used. The estimation of variance is: where

Shrinkage parameter
The shrinkage parameter $\lambda$ was obtained from the quantities $m_k$, which are defined in terms of $\sigma^2$ and $\alpha_k^2$. Here, $\alpha_k$ is the $k$th element of $\gamma \boldsymbol{\beta}$, and $\gamma$ is the matrix of eigenvectors such that $X^T \hat{W} X = \gamma^T \Lambda \gamma$, where $\Lambda$ is a diagonal matrix with the eigenvalues of $X^T \hat{W} X$ [22,23]. A study showed that this method works well in reducing MSE [24]. This method also has a closed form, so it saves computation time. Therefore, it was chosen as the estimator for the shrinkage parameter.

Hypothesis testing about regression coefficients
For testing regression coefficients obtained through maximum likelihood, the square roots of the main diagonal elements of the inverse Fisher information matrix can be used as standard errors of the regression coefficients. The test statistic is then:

$$t_k = \frac{\hat{\beta}_k}{\operatorname{se}(\hat{\beta}_k)}$$

This test statistic follows a t-distribution. For penalized maximum likelihood estimators, the test statistic no longer follows a t-distribution. Some studies have proposed a non-exact t-test for linear ridge regression and logistic ridge regression [25,26], in which the ridge estimate is divided by a standard error taken from the variance of the ridge estimator. In this study, the last step of each iteration to estimate the fixed effects uses Fisher scoring, so the variance used in the non-exact t-test is the one obtained from that Fisher-scoring step.
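The maximum likelihood version of this test can be sketched as follows. The example covers only an unpenalized logistic model without random effects, with standard errors taken from the inverse of the Fisher information $X^T W X$; the ridge variance used in the non-exact t-test is not reproduced here. The function name `wald_statistics` and the input values are hypothetical.

```python
import numpy as np

def wald_statistics(X, beta_hat):
    """Standard errors and Wald-type statistics for a logistic model,
    based on the inverse Fisher information X^T W X at beta_hat."""
    mu = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))
    W = mu * (1.0 - mu)
    fisher_info = X.T @ (X * W[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(fisher_info)))
    return se, beta_hat / se

# Hypothetical design matrix and coefficient estimates
rng = np.random.default_rng(3)
X_demo = rng.standard_normal((100, 2))
se_demo, stat_demo = wald_statistics(X_demo, np.array([0.4, -0.3]))
print(np.round(se_demo, 3), np.round(stat_demo, 3))
```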

Intimate partner violence
In this study, 150 pregnant women who were under IPV and attended health centers in the suburbs of Hamadan City (Hamadan Province, Iran) were selected. The study was approved by the ethics committee and was conducted in accordance with the Declaration of Helsinki. These women were assigned to control and intervention groups. For the intervention group, 5 public health education sessions were held by a clinical psychologist over 5 weeks. The plans administered in the intervention group included identifying factors causing IPV and how to manage it, forming support groups of participants, keeping in contact with the consultant, providing management solutions, increasing the communication skills of participants, giving booklets containing conflict management techniques, giving gift cards, and providing a free counseling session for the husbands of these women.
Before the study started, a general mental health questionnaire (GHQ) was given to all participants, who completed it again at the end of the study. After data collection, the aim was to determine the effectiveness of the intervention and the contribution of various types of violence to the psychological state of these women. Depression is an important problem in these women, so depression was taken as the response variable: women with depression received a value of 1 and the others a value of 0. The main aim of the analysis was thus to assess the effectiveness of the intervention and the effect of the types of violence on depression.
First, the types of violence were collected in a matrix, called $V$, and the correlation matrix of $V$, namely $\operatorname{cor}(V)$, was obtained. The correlations between variables in $\operatorname{cor}(V)$ are medium to high; because most of them are above 0.5, this is a warning sign of collinearity between these predictors [8]. To gain more assurance about the existence of collinearity, the condition index was computed. Its value was 9.8, indicating collinearity between these variables. For modeling, time, intervention, and types of violence were considered, so the design matrix $X$ was defined as $X = [\text{Intervention}, \text{Time}, V]$. The condition number for this matrix was 14.9, which shows that collinearity is a concern. MELM was first fitted to these data regardless of collinearity, and then our proposed model was fitted.
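A condition number of this kind can be computed from the singular values of the design matrix. The sketch below uses simulated scores, not the IPV data, and assumes that each column is scaled to unit length before the singular value decomposition; the paper does not state which scaling was used, so the values reported above are not expected to be reproduced. The function name `condition_number` is introduced for illustration.

```python
import numpy as np

def condition_number(X):
    """Largest condition index of X: ratio of the largest to the smallest
    singular value after scaling each column to unit length (the scaling
    is an assumption; the paper does not specify it)."""
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s.max() / s.min()

# Four hypothetical violence scores sharing a common component
rng = np.random.default_rng(0)
n = 150
shared = rng.standard_normal(n)
V = np.column_stack([
    np.sqrt(0.3) * rng.standard_normal(n) + np.sqrt(0.7) * shared
    for _ in range(4)
])

print(np.round(np.corrcoef(V, rowvar=False), 2))  # pairwise correlations
print(round(condition_number(V), 2))              # grows with collinearity
```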
To conduct the global test of the null hypothesis that all coefficients are simultaneously zero, the likelihood ratio test (LRT) was used. For these data in MELM, LRT = 293.91 and p-value = 0.009, indicating that not all coefficients are simultaneously zero. As shown in the first part of Table 1, due to collinearity between predictors, none of the predictors is significant at the 5% significance level; only psychological violence had a significant effect on depression at the 10% significance level. Inflation in the standard errors is quite obvious in Table 1. The second part of Table 1 shows the results of our proposed model. The standard errors of RMELM are lower than those of MELM; the standard errors were adjusted and all variables became significant. The estimated variance of random effects was 1.12 in MELM and 1.26 in RMELM. According to the results presented in Table 1, the odds of depression in the control group were 55% higher than in the intervention group. The odds of depression also decreased over time: the odds of depression at time 1 were 2.3 times those at time 2. Among types of violence, financial violence increased the odds of depression the most: the odds of depression increased by 2.37 times with an increase in financial violence. Next was sexual violence, with which the odds of depression increased by 90%. With an increase in physical violence, the odds of depression increased by 47%. Finally, with an increase in psychological violence, the odds of depression increased by 17%. All of these factors were significant (p-value < 0.0001).
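The percentages above follow from the usual conversion between a logistic regression coefficient and an odds ratio. As a small worked example, the sketch below starts from the odds ratios implied by the text (for instance, 1.55 for "55% higher") and converts the corresponding log-odds coefficients back to percentage changes. The coefficients of Table 1 are not reproduced here, and the function name `percent_change_in_odds` is hypothetical.

```python
import math

def percent_change_in_odds(coef):
    """Percentage change in the odds for a one-unit increase in a
    predictor with log-odds coefficient coef."""
    return (math.exp(coef) - 1.0) * 100.0

# Odds ratios implied by the percentages reported in the text
reported = {
    "control vs intervention": 1.55,
    "sexual violence": 1.90,
    "physical violence": 1.47,
    "psychological violence": 1.17,
}
for label, odds_ratio in reported.items():
    coef = math.log(odds_ratio)  # coefficient on the log-odds scale
    print(f"{label}: coef={coef:.3f}, change={percent_change_in_odds(coef):.0f}%")
```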

Simulation study
To assess the performance of the proposed RMELM, a simulation study was designed and conducted under different settings. Sample size, degree of collinearity between predictors, and correlation between responses were the factors considered in the simulation. Here, $\eta_{ij} = x_{ij}^T \boldsymbol{\beta} + z_{ij}^T b_i$ was generated with true values $\boldsymbol{\beta}^T = (0.2, 0.4, -0.3)$. Because there must be collinearity between predictor variables, the correlation between these variables was set to $\rho = (0.7, 0.8, 0.9, 0.95)$. The predictor variables were generated through $x_{ijk} = (1-\rho)^{1/2} a_{ijk} + \rho^{1/2} a_{ijk}$, where $i = 1, 2, \ldots, n$, $j = 1, 2$, $k = 1, 2$, and the $a_{ijk}$ were generated from the standard normal distribution. To investigate the effect of correlation between responses, the intraclass correlation coefficient (ICC) was set to ICC = (0.2, 0.5, 0.8). RMELM and MELM were compared. MELM was fitted with glmer in the lme4 package [27] in R. To assess the performance of these models, relative bias, mean square error (MSE), and empirical power for fixed effects, and the variance of random effects were used.
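One way to generate data of this kind is sketched below. Several choices are assumptions that the text does not settle: the second term of the predictor formula is taken to be a draw shared by all predictors of an observation (which gives pairwise correlation $\rho$), the three elements of $\boldsymbol{\beta}$ are treated as slopes of three predictors with no intercept, the random effect is a random intercept, and its variance is derived from the ICC through the latent-scale relation ICC $= \sigma^2 / (\sigma^2 + \pi^2/3)$. All function names are hypothetical.

```python
import numpy as np

def simulate_dataset(n, rho, icc, beta, rng, n_times=2):
    """Simulate one data set with collinear predictors and a random intercept."""
    beta = np.asarray(beta, dtype=float)
    p = beta.size
    a = rng.standard_normal((n, n_times, p))       # predictor-specific draws
    shared = rng.standard_normal((n, n_times, 1))  # shared draw (assumption)
    x = np.sqrt(1.0 - rho) * a + np.sqrt(rho) * shared
    # Random-intercept variance implied by the latent-scale ICC (assumption)
    sigma2 = icc / (1.0 - icc) * np.pi ** 2 / 3.0
    b = rng.normal(0.0, np.sqrt(sigma2), size=(n, 1))
    eta = x @ beta + b                             # linear predictor, shape (n, n_times)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return x, y

def mse(estimates, truth):
    """Mean square error of replicated estimates around the true value."""
    return float(np.mean((np.asarray(estimates) - truth) ** 2))

def relative_bias(estimates, truth):
    """Bias of the mean estimate, relative to the true value."""
    return float((np.mean(estimates) - truth) / truth)

def empirical_power(p_values, alpha=0.05):
    """Share of replications in which the null hypothesis is rejected."""
    return float(np.mean(np.asarray(p_values) < alpha))

rng = np.random.default_rng(2024)
x, y = simulate_dataset(n=50, rho=0.9, icc=0.5, beta=[0.2, 0.4, -0.3], rng=rng)
print(x.shape, y.shape)
```

In a full simulation, each replicated data set would be fitted with both models and the three summary functions applied to the collected estimates and p-values.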

Discussion and conclusions
In this study, RMELM was introduced for correlated binary responses and compared with MELM. Table 2 shows the comparative results of MELM and RMELM in terms of MSE and relative bias. For $\beta_1$, at $n = 30$ and ICC = 0.2, the MSE of the fixed effect estimator in MELM increased with increasing correlation: this value increased by 2.24 times at a correlation level of 0.95 compared to 0.7. At $n = 50$, the MSE of the fixed effect estimator in MELM was relatively smaller than at the smaller sample size, and for $n = 100$ this value decreased further. With the increase in ICC, MSE for fixed effect

MSE of the fixed effect estimator for $\beta_3$ in RMELM was less than that of $\beta_1$ and $\beta_2$. This value was smaller than that of MELM. The median relative bias was 50% for the fixed effect estimator in RMELM. Table 3 shows the empirical power for these estimators. The empirical power of MELM was very small for $\beta_1$, and

Table 4 provides estimates of the variance of random effects for MELM and RMELM. As ICC increased, the variance of random effects increased in both models. With the increase in sample size, the variance decreased for MELM.

In this study, two methods were used to investigate the effect of different types of violence on depression in pregnant women under IPV. Due to collinearity between types of violence, for MELM, none of the predictor variables was significant at the 5% significance level and only one predictor variable was significant at the 10% significance level. Using the new method, the effect of all types of violence (financial, sexual, physical, and psychological) on depression was significant. These findings illustrate how collinearity influences the results of longitudinal studies with binary responses. The results obtained with the new estimator were consistent with previous studies in this area. It has been demonstrated that financial violence influenced depression in Brazilian pregnant women [28]. Physical violence has also been shown to affect depression in married women in Korea [29]. Results of a study conducted in Tanzania revealed that emotional, physical, and sexual violence affected women's depression [30].
The results of the simulation study showed that the new model has a lower MSE for fixed effects than the MELM. The new model also increased the empirical power. In the numerical study, inflation in the variance of the fixed effects was observed in the MELM, and a better estimation was obtained using RMELM.