Skip to main content

A simulation study for evaluating the performance of clustering measures in multilevel logistic regression

Abstract

Background

Multilevel logistic regression models are widely used in health sciences research to account for clustering in multilevel data when estimating effects on subject binary outcomes of individual-level and cluster-level covariates. Several measures for quantifying between-cluster heterogeneity have been proposed. This study compared the performance of between-cluster variance based heterogeneity measures (the Intra-class Correlation Coefficient (ICC) and the Median Odds Ratio (MOR)), and cluster-level covariate based heterogeneity measures (the 80% Interval Odds Ratio (IOR-80) and the Sorting Out Index (SOI)).

Methods

We used several simulation datasets of a two-level logistic regression model to assess the performance of the four clustering measures for a multilevel logistic regression model. We also empirically compared the four measures of cluster variation with an analysis of childhood anemia to investigate the importance of unexplained heterogeneity between communities and community geographic type (rural vs urban) effect in Malawi.

Results

Our findings showed that the estimates of SOI and ICC were generally unbiased with at least 10 clusters and a cluster size of at least 20. On the other hand, estimates of MOR and IOR-80 were less accurate with 50 or fewer clusters regardless of the cluster size. The performance of the four clustering measures improved with increased clusters and cluster size at all cluster variances. In the analysis of childhood anemia, the estimate of the between-community variance was 0.455, and the effect of community geographic type (rural vs urban) had an odds ratio (OR)=1.21 (95% CI: 0.97, 1.52). The resulting estimates of ICC, MOR, IOR-80 and SOI were 0.122 (indicative of low homogeneity of childhood anemia in the same community); 1.898 (indicative of large unexplained heterogeneity); 0.345-3.978 and 56.7% (implying that the between community heterogeneity was more significant in explaining the variations in childhood anemia than the estimated effect of community geographic type (rural vs urban)), respectively.

Conclusion

At least 300 clusters with sizes of at least 50 would be adequate to estimate the strength of clustering in multilevel logistic regression with negligible bias. We recommend using the SOI to assess unexplained heterogeneity between clusters when the interest also involves the effect of cluster-level covariates, otherwise, the usual intra-cluster correlation coefficient would suffice in multilevel logistic regression analyses.

Peer Review reports

Background

Biomedical research studies often collect binary outcome data that have a multilevel structure. For example, in estimating the efficacy of a new drug on the treatment of patients with a viral infection (recovered or not), patients could be nested within the admitting hospitals. Similarly, for a public health specialist assessing the effect of a feeding intervention on the growth of a child (stunted or not), children could be nested within communities which are also nested within districts. In both examples, patients or children (level-1 units) may have little variation in background characteristics within each hospital or community (level-2 units). On the other hand, hospitals or communities may have patients or children drawn from a wide range of background characteristics, which may affect the within-hospital or within-community homogeneity. A standard logistic regression model, which assumes that the individual outcomes are independent will not consider the hierarchical structure of the data within hospitals or communities, and will consequently fail to distinguish between these two manifestations in each example. Ignoring the dependence between the outcomes may result in an exaggerated number of independent patients or children within hospitals or communities. For the strong homogeneity between patients in hospitals or children in communities, the effect would be an underestimation of the standard errors for the covariate effects, which may lead to incomplete and misleading conclusions on the association between the treatment and the viral infection or the feeding intervention and child growth [13].

Multilevel logistic regression models [4], also called hierarchical logistic models [5], are appropriate statistical techniques for analyzing clustered binary outcome data. The use of multilevel models for the analysis of multilevel data structures has exponentiated since the 1990s due to the rise in computational power [2]. The analysis of multilevel data is undertaken mostly to serve two purposes: to provide correct inferences by accounting for the variation that may exist between level-2 units, such as hospitals or communities, and to assess the effects of level-2 risk factors and the unexplained between-cluster (level-2 unit) variation in explaining differences between individuals’ (level-1 units) proneness to the binary outcome [1, 6, 7]. Several techniques in multilevel logistic regression have been proposed to explore whether the individual binary outcomes are explained by cluster-level covariates and the unexplained between-cluster heterogeneity. Because of its simplicity in interpretation, especially in linear mixed models, the Intra-class correlation coefficient (ICC) (also known as the variance partition coefficient (VPC)) is commonly used to measure the unobserved heterogeneity between clusters in multilevel models [8, 9]. In multilevel logistic regression, the ICC loses its natural interpretation which is easily understood and interpreted in the linear mixed model. Thus, several approaches have been developed to provide interpretable and meaningful information regarding the between-cluster heterogeneity and the effect of cluster-level covariates. These methods include the Median Odds Ratio (MOR), the 80% Interval Odds Ratio (IOR-80) and the Sorting Out Index (SOI) [2, 10, 11]. Though they are rarely used, they have nonetheless offered explanations for the unobserved variations between clusters and the contribution of cluster-level predictors on the individual binary outcomes [3, 12].

This paper assessed the performance of the ICC, MOR, IOR-80 and SOI in measuring the importance of cluster-level covariates and unexplained between-cluster heterogeneity in the analysis of a two-level logistic regression model. We compared the performance using simulation studies as well as an empirical application to childhood anemia data from Malawi.

Two-level logistic regression model

Suppose there are H clusters and each cluster has Nh individuals. Let Yhi be the binary outcome taking a value of 1 or 0 for each individual i in cluster h. Also, let us assume that the probability of Yhi=1 is πhi, and πhi/1−πhi is the odds of Yhi=1,(i=1,2,…,Nh;h=1,2,…,H). For each individual hi, there is a vector Xhi containing values of p predictors, including cluster-level covariates. The equation of a two-level logistic regression model is given as

$$ log\left(\frac{\pi_{{hi}}}{1-\pi_{{hi}}}\right) =\beta_{0} + \beta_{1}X_{{hi}} + u_{h} $$
(1)

Equation (1) is the conditional logistic model where πhi is the probability that each individual i in cluster h takes on the value Yhi=1 and depends on the value of the cluster random effect, uh, which captures the effect of unmeasured cluster-level variability that affects the outcome propensity. They are usually assumed to be normally distributed with a mean of 0 and variance of \(\sigma ^{2}_{h}\), and are uncorrelated with the individual and cluster-level predictor variables. In this way, heterogeneity of the binary outcome (Yhi=1) propensity between the clusters would have been accounted for. For example, increasing values of uh indicate that the associated clusters have higher propensities on the binary outcome (Yhi=1) and greater heterogeneity between the outcomes is implied by higher values of \(\sigma ^{2}_{h}\).

Intra-class correlation coefficient (ICC)

In biomedical research studies, the intra-class correlation coefficient (ICC) statistic is used in multilevel analysis to measure the extent of within-cluster homogeneity, which is caused by the multilevel structure of the data [13, 14]. The statistic is obtained by calculating the ratio of the amount of between-cluster variance to the sum of between- and within-cluster error variances, and it is computed as follows:

$$ \rho = \frac{\sigma^{2}_{u}}{\sigma^{2}_{u}+\sigma^{2}_{e}} $$
(2)

where \(\sigma ^{2}_{u}\) and \(\sigma ^{2}_{e}\) are the variances of the between-cluster and the within-cluster errors, respectively. Since adding the two variances give the total variance of the outcome variable, ρ is often interpreted as the proportion of the variance explained by clustering, or as a measure of within-cluster homogeneity. If the value of ICC is 0 then the observed outcomes are completely independent, and if the value is 1 then the outcomes are completely dependent on the cluster they belong to. Thus, the ICC ranges from 0 to 1, and using cut-off points in Koo et al. [13] and Merlo et al. [15], values of ICC of less than 0.50, between 0.50 and 0.75, between 0.76 and 0.90 and greater than 0.90 indicate low, moderate, high and very high correlations, respectively. For the linear mixed model, both the random effect and individual error terms are on the same linear scale, which makes it easier to compute the total variance in the outcome variable. However, for the multilevel logistic regression model, the cluster-level variance is on the logistic scale which cannot be directly understood in terms of variations between clusters in the prevalence of the binary outcome. Furthermore, the individual-level variance is taken from an underlying standard logistic distribution. The value of the variance for this underlying distribution is \(\sigma ^{2}_{e}=\pi ^{2}/3=3.29\). Thus, as pointed out in Merlo et al. [12] and Sanagou et al. [3], the usefulness of ICC is somehow limited, largely due to its dependence on linear mixed model concepts regarding partitioning of the total variance, its association with the prevalence of the outcome variable, and the different scales between cluster-level and individual-level error variances. Thus, for public health and epidemiological usages, explaining the partitioning of total variance and ICC for dichotomous outcomes is most challenging.

The median odds ratio (MOR)

As an alternative to the ICC-based clustering measure in multilevel logistic regression, Larsen and Merlo [10] proposed expressing the between-cluster variance on the odds ratio (OR) scale, which could then be easily interpreted as it is on the same scale on which the effects of predictor variables are measured. For the calculation of MOR, one considers a set of all odds ratios that could be obtained by comparing two individuals with identical covariates but from two different clusters, where one has higher random effect and the other with a lower random effect. The two random effect values are ordered so that the odds ratio is always at least one. The median of these odds ratios provides the Median Odds Ratio (MOR) statistic and, assuming that the random effects are normally distributed, the MOR is given as [10]:

$$ {\begin{aligned} \text{MOR}= \exp\left(\sqrt{2\sigma^{2}_{u}} \times \Phi^{-1}(0.75) \right) = \exp\left(\sqrt{2\sigma^{2}_{u}} \times 0.6745\right)\\ = \exp\left(0.95 \times \sqrt{\sigma^{2}_{u}}\right). \end{aligned}} $$
(3)

where Φ−1(0.75) is the 75th percentile of a standard normal distribution. If \(\sigma ^{2}_{u}=0\), then MOR=1, which is indicative of the absence of between-cluster heterogeneity, while if \(\sigma ^{2}_{u}\) is large (i.e. considerable between-cluster heterogeneity), the MOR will be much larger than 1 [3, 16]. Thus, the MOR measures heterogeneity between individuals belonging to different clusters.

Interval odds ratio - 80% (IOR-80)

The interval odds ratio (IOR), as developed in Larsen and Merlo [10] assesses the impact of cluster-level predictor variables on the individual binary outcomes. One starts by considering two different clusters that differ by one unit on a given cluster-level predictor variable. Two individuals with the same values for the remaining cluster-level predictor variables and all the individual-level covariates are chosen, one from each of the two clusters. Then the odds ratio (OR) for the predictor is computed, which often measures the effect of moving to a higher value of the given cluster-level predictor (for a binary cluster-level predictor variable, the OR is just the odds of it being 1 versus 0). Considering all possible pairs of individuals and their ORs, the 80% interval around the median of the distribution of the OR values provides the 80% interval odds ratio (IOR-80). Under the assumption of a normally distributed random effect, the IOR-80 has a lower limit of \(\text {IOR}_{L}=\exp (\alpha +\sqrt {(2\times \sigma ^{2}_{u})}\times (- 1.2816))\) and an upper limit of \(\text {IOR}_{U}=\exp (\alpha +\sqrt {(2\times \sigma ^{2}_{u})}\times (+ 1.2816))\) [10, 17], and values -1.2816 and +1.2816 represent the 10th and 90th percentiles of the standard normal distribution, respectively. Thus, the limits of the IOR-80 are given as:

$$\begin{array}{@{}rcl@{}} \text{IOR}_{L} &=& \exp\left(\alpha - \sqrt{\left(2 \times \sigma^{2}_{u}\right)} \times 1.2816\right), \end{array} $$
(4)
$$\begin{array}{@{}rcl@{}} \text{IOR}_{U} &=& \exp\left(\alpha + \sqrt{\left(2 \times \sigma^{2}_{u}\right)} \times 1.2816\right), \end{array} $$
(5)

where α is the regression coefficient for the cluster-level predictor of interest. If the between-cluster variance is small, then the IOR-80% interval will be narrow, and if the between-cluster variance is large, then the IOR-80% interval will be wider. IOR-80% will contain a 1 if the cluster heterogeneity is large compared to the effect of the cluster-level predictor. If the interval will not contain 1, then the effect of the cluster-level predictor (positively or negatively) is larger compared to the unexplained between-cluster variation [10]. Thus, the addition of the cluster-level predictor effect helps to compare the importance of between-cluster heterogeneity and cluster-level predictors in understanding the differences in individual proneness to the outcome, Yhi=1. However, the choice of the 80% interval width is rather arbitrary. Longer or shorter interval widths may or may not contain a 1 when the 80% width actually contains a 1.

Sorting out index (SOI)

To remedy the problem of IOR depending on the choice of the interval width, Chaix et al. [18] suggested using the percentage in the distribution of the odds ratios as described under the IOR for which the value is greater than 1. Thus, if the cluster residual variance is high, and the effect of the cluster-level predictor is not able to distinguish high-risk clusters from low-risk clusters, then the odds ratios between clusters for the cluster effect would be greater than 1 in half of the cases. On the other hand, if the cluster-level predictor has a huge positive effect compared to the between-cluster heterogeneity, then the odds ratios would be greater than 1 in almost all the pairwise cases of individuals. The percentage range of 50-100% is often used as a sorting out index (SOI) to assess and measure the extent to which a given cluster-level covariate is important compared to the between-cluster heterogeneity in explaining the individual proneness to Yhi=1.

The cluster-level covariate could be assumed to be negatively associated with the outcomes, in which case one would formulate the SOI differently. It will still be in half of the cases if it is very weak, otherwise it will be in none of the cases where the odds ratio is greater than 1; then the SOI will range between 0 to 50%. Thus, the percentage range of the SOI can be between 0% and 100% [11, 19]. Assuming the random effect is normally distributed, the sorting out index (SOI) is calculated as:

$$ \text{SOI} = \Phi\left(\frac{\alpha}{\sqrt{2\sigma^{2}_{u}}}\right) $$
(6)

The value of Φ for \(\left (\frac {\alpha }{\sqrt {2\sigma ^{2}_{u}}}\right)\) can be evaluated from a standard normal distribution table and is expressed as a percentage. This information indicates the degree to which the cluster-level predictor is of importance as compared to the unexplained between-cluster heterogeneity. An SOI of near 0% or 100% implies an absence of between-cluster variability and a high negative or positive effect of the cluster-level predictor, respectively. The SOI of 50% implies that the effect of cluster-level predictor is low [19], as compared to the unexplained between-cluster heterogeneity.

Simulation protocol

We compared the performance of ICC, MOR, IOR-80, and SOI on simulated two-level binary datasets with individual and cluster-level covariates. We then fitted two-level logistic regression models to the datasets and compared the resulting estimates to the true values by the root mean square errors.

For the simulation, the response variable, Yhi, was distributed as a Bernoulli random variable. The following logit model on the probability, πhi=Pr(Yhi=1), was used to simulate the binary responses:

$$ \text{logit}(\pi_{{hi}})= \beta_{0}+\beta_{1} X_{hi1} +\beta_{2} X_{hi2} + \alpha X_{h} +u_{h} $$
(7)

We assumed the cluster-specific random effect, uh, followed a normal distribution with a mean of 0 and a variance of \(\sigma ^{2}_{u}\) (i.e. \(u_{h}\sim N(0, \sigma ^{2}_{u})\)). The fixed effect parameters were fixed at β0=0.5,β1=−1.5,β2=0.3,α=−0.3, and both level-1 covariates XhpN(0,1) for p=1,2, and level-2 covariate XhBernoulli(0.5). For example, to generate 10 clusters with each cluster having 5 individuals with the outcome Yhi=1 or 0, the following steps were implemented;

  1. a)

    We drew 10 random effects, uh, for the 10 clusters distributed from a normal, \(u_{h}\sim N(0, \sigma ^{2}_{u})\), for example a value of \(\sigma ^{2}_{u}=0.2.\)

  2. b)

    Take the first value of the generated random effect, say uh=1.78.

    Repeat the following steps (c-f) 5 times for cluster 1 with the random effect value of 1.78.

  3. c)

    Draw Xhi1 from a N(0,1) and Xhi2 from a N(0,1) and Xh from Bernoulli(0.5).

  4. d)

    Substitute the generated values into the equation: logit(πhi)=0.5−1.5Xhi1+0.3Xhi2−0.3Xh+1.78

  5. e)

    We then solve for \(\pi _{{hi}}= \frac {1}{1+ \exp {(-(\beta _{0}} + \beta _{1} X_{hi1} + \beta _{2} X_{hi2} + \alpha X_{h}+ u_{h}))}\)

  6. f)

    We then draw Yhi from a Bernoulli with parameter πhi, which results in either 1 or 0 as a binary outcome for subject 1 in the cluster 1.

  7. g)

    Return to step b) and pick the next value, for example -0.568, as the random effect value for cluster 2. This is repeated 10 times to generate 10 clusters with known random effect values, each cluster having 5 individuals.

Here, we fixed the values for \(\sigma ^{2}_{u}\) at 0.2, 0.5, 1.0 and 1.5. The resulting true values of ICC, MOR, SOI and IOR-80 are shown in Table 1.

Table 1 True values of fixed effects and measures of heterogeneity for the two-level logistic regression, under various random effects variance values

The process was repeated for the varied number of clusters (i.e., H = 10, 50, 100, 300, 500 clusters), number of observations within each cluster (i.e., Nh = 5, 20, 50, 100, 250). For each combination of the number of clusters, cluster size, and between cluster variance, 1000 datasets were simulated. The root mean square error (RMSE) \(\sqrt {\frac {(E_{i}-O_{i})^{2}}{N}}\), where Ei was the estimated value and Oi the true value as shown in Table 1, were computed to determine their accuracies. We used the open software R (R Core Team, 2020) and packages rms (regression modeling strategies) [20], lmerTest and pbkrtest in simulating and analyzing the data.

Simulation results

Tables 2 and 3 present the accuracies of estimates of the fixed effects and measures of heterogeneity for the two-level logistic regression model for a varied number of clusters, cluster sizes and cluster variances. Tables 4 and 5 present results of estimated values of the fixed effects and measures of heterogeneity for the 2-level logistic regression model, under various numbers of clusters, cluster sizes, and cluster variances.

Table 2 Average Root Mean Square Errors for the estimated fixed effects, cluster effect, ICC, MOR, IOR-80 and SOI at cluster variances of 0.2 and 0.5
Table 3 Average root mean square errors for the estimated fixed effects, cluster effect, ICC, MOR, IOR-80 and SOI at cluster variances of 1.0 and 1.5
Table 4 Estimated values of fixed effects and measures of heterogeneity for the two-level logistic regression, under random effects variance values 0.2 and 0.5
Table 5 Estimated values of fixed effects and measures of heterogeneity for the two-level logistic regression, under random effects variance values 1.0 and 1.5

Accuracy of estimated parameters

We observe that the estimated parameters for the fixed effects were accurate for all cluster-levels and cluster variances, particularly for large clusters and cluster sizes. Lower root mean square errors were particularly observed for larger clusters, providing evidence of unbiasedness for the estimates \(\hat {\beta }\)’s and \(\hat {\alpha }\) for increased clusters and cluster sizes as shown in Tables 2 and 3. The RMSEs for \(\hat {\beta }_{0}\) were slightly biased for 10 clusters with a cluster size of 5. However, the estimates were unbiased for at least 10 clusters with a cluster size of at least 20 for all cluster-level variances. At a cluster variance of 1.0, the estimates for \(\hat {\beta }_{0}\) were unbiased for at least 100 clusters with a cluster size of at least 5. Lower RMSEs for \(\hat {\beta _{1}}\) were observed for at least 10 clusters with a cluster size of at least 5 for all cluster variances. Bias for \(\hat {\beta _{1}}\) was negligible for 100 clusters with cluster size of at least 50 at a cluster variance of 0.5. At a cluster variance of 0.2, \(\hat {\alpha }\) was slightly biased for 10 clusters with a cluster size of 5. However, we observe unbiased estimates of α for 100 clusters with cluster sizes of 50, and 500 clusters with a cluster size of at least 5. The bias of \(\hat {\alpha }\) improved with the increase in clusters and cluster size for all cluster variances.

Performance of estimated heterogeneity measures

Intra-class correlation coefficient (ICC)

Tables 2 and 3 show that the estimates for the ICC were less biased when the cluster variance was 0.2 and the estimates of the ICC were unbiased with 100 clusters and clusters sizes of 20 or more. At the cluster variance of 0.5, the ICC was estimated with less bias for at least 100 clusters of sizes 50 or larger. At level-2 variance of 1.0, the ICC was slightly biased for 10 clusters with cluster size of 5. However, the bias for estimating the ICC decreased for at least 50 clusters with cluster size of at least 20. At a level-2 variance of 1.5, the lowest RMSEs were observed for the estimation of ICC with at least 10 clusters each of cluster size of at least 20. Increasing the number of clusters (at least 100) and cluster size (at least 20) gave negligible bias for the estimation of ICC as its RMSEs decreased to almost zero, for all level-2 variances.

Median odds ratio (MOR)

As observed from Tables 2 and 3, the estimates of the MOR were largely biased for 50 clusters with cluster size of 5, for all cluster variances. at a cluster variance of 0.2, the estimates of the MOR were less biased for at least 50 clusters with cluster sizes of at least 20. The estimates of the MOR improved when cluster sizes were 50 or more for all cluster-level variances. The lowest RMSEs were observed for at least 300 clusters and a cluster size of at least 50 in estimating the MOR.

80% interval odds ratio (IOR-80)

The estimates for the IOR-80 were less accurate for small clusters and cluster sizes when the cluster variance was large as observed from Tables 2 and 3. At a cluster variance of 0.2, we observe low RMSEs when estimating the IOR-80 lower and upper limits for at least 10 clusters with a cluster size of at least 20. At a cluster variance of 0.5, the estimates of the IOR-80 lower limit were accurate for at least 10 clusters with a cluster size of at least 20. However, the estimates for the IOR-80 upper limit were largely biased for at least 10 clusters with a cluster size of at least 5 for cluster variances of 1.0 and 1.5. On the contrary, the estimates for the IOR-80 lower limit at these cluster variances (1.0 and 1.5) were accurate for at least 10 clusters with a cluster size of at least 20. The accuracy of the IOR-80 estimates were less biased with increased clusters and cluster sizes. At a cluster variance 0.2, unbiased estimates of the IOR-80 lower and upper limits were observed for 100 clusters with cluster size of at least 50. Overall, the bias for the estimates of the I0R-80 decreased for at least 300 clusters with a cluster size of at least 100 at all cluster variances.

Sorting out index (SOI)

As seen from Tables 2 and 5, the SOI estimates were biased for 10, 50, and 100 clusters with a cluster size of 5. However, the estimates of SOI were less biased for at least 10 clusters with cluster sizes of at least 20 at cluster variances of 0.2 and 0.5. Furthermore, the estimates for SOI were slightly biased for 10 clusters with a cluster size of 5. The bias in estimating the SOI decreased with increased clusters (at least 300) and a cluster size (at least 50) at cluster-level variances of 0.2 and 0.5. In addition, for large cluster variances of 1.0 and 1.5, the estimates of the SOI were unbiased with increased clusters (at least 100) and cluster size (at least 50).

Application to childhood anemia in Malawi

For the application, we analyzed childhood anemia among under-five children in Malawi using the 2015-16 Malawi Demographic and Health Survey (MDHS) data by fitting a two-level logistic regression model taking the community as a level-two unit. Childhood anemia is a major public health concern in Sub-Saharan Africa (SSA), particularly in Malawi [21]. We categorized anemia among children aged 6 to 59 months who were born five years before the survey as: no anemia = 0 and had anemia = 1.

There were 4676 children aged 6 to 59 months distributed across 850 enumeration areas (EA) which we regarded as communities. The overall prevalence of anemia was 63.1% (95% CI: 61.7, 64.5), with an average prevalence of childhood anemia of 62.2%, with a range of 0.0% to 100% and mode of 100%. The median anemia prevalence across the communities was 66.7%. The average number of children per community was 5.5, with a range of 1 to 15 and mode of 6. The median number of children per community was 5.

There are several factors that have been found to be associated with childhood anemia. These include individual-level factors: age, gender of a child, mother’s age, birth rank, wealth index, mother’s education level, father’s education level, household size, household hunger status, sanitation services, water and electricity supply, and community-level factors: community geographic type and community development index [2125]. For our application, we used mother’s education, mother’s age, child gender, household wealth quantile, and community geographic type (urban/rural). For the unadjusted model, the estimate of the between-community variance was 0.497, translating to an ICC of 0.131 and an MOR of 1.954. Adjusting for the child-level and the community-level covariates had the following odds ratios on child anemia: being female (OR= 0.92, 95% CI: 0.80, 1.04), being in a household with medium (OR= 0.71, 95% CI: 0.59, 0.84) and rich (OR= 0.78, 95% CI: 0.66, 0.93) wealth, having mothers with primary (OR= 0.69, 95% CI: 0.56, 0.86) and post-secondary (OR= 0.64, 95% CI: 0.49, 0.82) education, and residing in a rural community (OR= 1.21, 95% CI: 0.97, 1.52). For the covariate-adjusted model, the estimate of the between-community variance was 0.455. This translates to estimated values of ICC and MOR to be 0.122 and 1.898, respectively. The estimate of the ICC show that the unexplained heterogeneity between communities (i.e. low homogeneity in childhood anemia in a community) was low. While that of the MOR shows a high level of unexplained community heterogeneity (MOR »1). The estimates of the SOI was 0.567 (translated as 56.7%) and of the IOR-80 was (0.345, 3.978), implying that the unexplained heterogeneity between communities was more significant in explaining a child’s propensity to be anemic than the effect of the community geographic type (rural vs urban).

Discussion

Multilevel logistic regression models, which recognize the fact that level-1 units (individuals) are nested within higher-level units (clusters) when estimating the effect of covariates at different levels on individual binary outcomes, are widely implemented in most standard statistical software packages. They measure and evaluate the relative variation in the outcome measures, between individuals within and between clusters. However, measuring residual cluster-level variation and covariate effect heterogeneity or homogeneity (clustering) of binary outcomes within high-level clusters is more challenging in multilevel logistic regression than for linear mixed models. We set out to compare the performance of four measures of clustering in a two-level nested logistic regression model through a simulation study. The four measures were ICC, MOR, SOI and IOR-80. We applied the four measures to an analysis of childhood anemia in Malawi using a two-level logistic regression model, where the level-2 units were communities.

The accuracy of the estimated coefficients for the two-level logistic regression model improved for 100 to 500 clusters with cluster sizes of 50 to 250 for all cluster variances. These findings are similar to results from previous studies [26, 27], which showed that as the number of clusters and cluster sizes increased, the accuracy of the regression coefficients had a negligible bias. The estimates of ICC were more accurate than MOR, SOI and IOR-80 when the cluster number was small (at least 10) and when the cluster size was small (at least 20), and the bias for the ICC was negligible when the number of clusters and the cluster sizes increased. This is similar to a study by Moineddin et al. [26] which found that increasing the number of clusters and cluster sizes improved the convergence rate for the ICC.

The estimates of the SOI were more accurate than the estimates for the MOR and the IOR-80 for smaller number of clusters (at least 10) and a cluster size of at least 20 regardless of the cluster variance. However, the estimates of all the measures were more accurate when the number of clusters increased (from 100 to 500) regardless of cluster size and cluster variance. The estimates for MOR and IOR-80 performed better for large clusters (at least 300) and large cluster size (at least 20) at larger cluster variance (1.0 and 1.5). Overall the ICC and SOI estimates were better than the estimates of the MOR and the IOR-80 and the estimation of the heterogeneity between clusters using ICC and SOI was more accurate. However, the ICCs usefulness is limited as it does not account for covariates at the cluster level. Hence, the SOI would be the best statistic to measure the strength of clustering as compared to the ICC, MOR and IOR-80, especially when between cluster-level covariate heterogeneity is also of interest.

In the application to childhood anemia in Malawi, accounting for the purported risk factors, the estimated values of the ICC, and MOR were 0.122 and 1.898, respectively, which were indicative of low similarity between the status of anemia in children within a community, and high variability in unexplained heterogeneity between the communities. The estimates of SOI and IOR-80 were 56.7% and (0.345, 3.978), respectively, which showed that unexplained heterogeneity between communities was of great significance in the understanding of anemia in the children than the child and community-level predictors, namely mother’s education, mother’s age, child sex, community geographic type and household wealth quantile, used in the model, and community-level geographic type (rural/urban). [10]. Mother’s age, wealth quintile and mother’s education were significantly associated with childhood anemia, similar to findings from previous studies [21, 23, 24]. However, the effect of community geographic type on childhood anemia was not significant. With 850 communities, an average of 5 children per community and the simulation study, we recommend using the ICC and SOI to measure the strength of heterogeneity between the communities. However, since the ICC does not take into account the community-level covariate (community geographic type), we recommend using the SOI to measure the strength of variation between communities for this application.

Conclusion

In conclusion, measures of the between-cluster heterogeneity and effects of cluster-level covariates in multilevel logistic regression models would be estimated with negligible bias when the data has at least 300 clusters with a cluster size of at least 50. The estimation of the intra-class correlation coefficient and the sorting out index are accurate with 10 clusters and a cluster size of 20. We recommend using the sorting out index to assess unexplained heterogeneity between clusters as well as the effect of cluster-level covariates. The usual intra-cluster correlation coefficient would suffice when only quantification of the between-cluster variation is the purpose of the multilevel logistic regression analysis.

Availability of data and materials

The dataset used in this study are available from the DHS website https://dhsprogram.com/Data/ upon request from the MEASURE DHS program team.

References

  1. 1

    Guo G, Zhao H. Multilevel modeling for binary data. Annu Rev Sociol. 2000; 26(1):441–62.

    Article  Google Scholar 

  2. 2

    Austin PC, Merlo J. Intermediate and advanced topics in multilevel logistic regression analysis. Stat Med. 2017; 36(20):3257–77. https://doi.org/10.1002/sim.7336.

    Article  Google Scholar 

  3. 3

    Sanagou M, Wolfe R, Forbes A, Reid CM. Hospital-level associations with 30-day patient mortality after cardiac surgery: A tutorial on the application and interpretation of marginal and multilevel logistic regression. BMC Med Res Methodol. 2012; 12:28. https://doi.org/10.1186/1471-2288-12-28.

    Article  Google Scholar 

  4. 4

    Goldstein H, Vol. 922. Multilevel Statistical Models. Oxford: Wiley; 2011.

    Google Scholar 

  5. 5

    Bryk AS, Raudenbush SW. Hierarchical Linear Models: Applications and Data Analysis Methods. New York: Sage Publications, Inc; 1992.

    Google Scholar 

  6. 6

    Skrondal A, Rabe-Hesketh S. Latent variable modelling: A survey. Scand J Stat. 2007; 34(4):712–45. https://doi.org/10.1111/j.1467-9469.2007.00573.x.

    Article  Google Scholar 

  7. 7

    Snijders Roel TAB, Bosker RJ. Multilevel analysis: an introduction to basic and advanced multilevel modeling. London: Sage Publications; 1999.

    Google Scholar 

  8. 8

    Pleil JD, Wallace MAG, Stiegel MA, Funk WE. Human biomarker interpretation: the importance of intra-class correlation coefficients (ICC) and their calculations based on mixed models, ANOVA, and variance estimates. J Toxicol Environ Health B. 2018; 21(3):161–80. https://doi.org/10.1080/10937404.2018.1490128.

    CAS  Article  Google Scholar 

  9. 9

    Kul S, Vanhaecht K, Panella M. Intraclass correlation coefficients for cluster randomized trials in care pathways and usual care: Hospital treatment for heart failure. BMC Health Serv Res. 2014; 14(1):1–7. https://doi.org/10.1186/1472-6963-14-84.

    Article  Google Scholar 

  10. 10

    Larsen K, Merlo J. Appropriate assessment of neighborhood effects on individual health: Integrating random and fixed effects in multilevel logistic regression. Am J Epidemiol. 2005; 161(1):81–8. https://doi.org/10.1093/aje/kwi017.

    Article  Google Scholar 

  11. 11

    Larsen K. New measures for understanding the multilevel logistic regression model. In: Bochum Workshop on “Statistische Methoden fur korrelierte Daten,” Bochum, Germany.2006. http://www.biometrie.uni-heidelberg.de/statmeth-ag/veranstaltungen/bochum06/Vortrag_Larsen.pdf.

  12. 12

    Merlo J, Chaix B, Yang M, Lynch J, Råstam L. A brief conceptual tutorial of multilevel analysis in social epidemiology: Linking the statistical concept of clustering to the idea of contextual phenomenon. J Epidemiol Community Health. 2005; 59(6):443–9. https://doi.org/10.1136/jech.2004.023473.

    Article  Google Scholar 

  13. 13

    Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016; 15(2):155–63. https://doi.org/10.1016/j.jcm.2016.02.012.

    Article  Google Scholar 

  14. 14

    Thompson DM, Fernald DH, Mold JW. Intraclass correlation coefficients typical of cluster-randomized studies: Estimates from the robert wood johnson prescription for health projects. Ann Fam Med. 2012; 10(3):235–40. https://doi.org/10.1370/afm.1347.

    Article  Google Scholar 

  15. 15

    Merlo J, Wagner P, Austin PC, Subramanian SV, Leckie G. General and specific contextual effects in multilevel regression analyses and their paradoxical relationship: A conceptual tutorial. SSM - Popul Health. 2018; 5:33–7. https://doi.org/10.1016/j.ssmph.2018.05.006.

    Article  Google Scholar 

  16. 16

    Stapleton LM, McNeish DM, Yang JS. Multilevel and Single-Level Models for Measured and Latent Variables When Data Are Clustered. Educ Psychol. 2016; 51(3-4):317–30. https://doi.org/10.1080/00461520.2016.1207178.

    Article  Google Scholar 

  17. 17

    Larsen K, Petersen JH, Budtz-Jørgensen E, Endahl L. Interpreting parameters in the logistic regression model with random effects. Biometrics. 2000; 56(3):909–14.

    CAS  Article  Google Scholar 

  18. 18

    Chaix B, Merlo J, Gaignard J, Lithman T, Boalt A, Chauvin P. The social and spatial distribution of mental and behavioral disorders related to psychoactive substance use in the city of Malmö, Sweden, 2001. New York: Nova Science Publishers; 2005.

    Google Scholar 

  19. 19

    Merlo J, Wagner P, Ghith N, Leckie G. An original stepwise multilevel logistic regression analysis of discriminatory accuracy: the case of neighbourhoods and health. PloS ONE. 2016; 11(4):0153778.

    Article  Google Scholar 

  20. 20

    Harrell F. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Chapter 5: Resampling, Validating, and Simplifying the Model. 2001; 3:88–103.

    Google Scholar 

  21. 21

    Ngwira A, Kazembe L. Analysis of severity of childhood anemia in Malawi: a Bayesian ordered categories model. Open Access Med Stat. 2016; 6:9. https://doi.org/10.2147/OAMS.S95159.

    Article  Google Scholar 

  22. 22

    Ngnie-Teta I, Receveur O, Kuate-Defo B. Risk factors for moderate to severe anemia among children in Benin and Mali: Insights from a multilevel analysis. Food Nutr Bull. 2007; 28(1):76–89. https://doi.org/10.1177/156482650702800109.

    Article  Google Scholar 

  23. 23

    Adedokun ST, Adekanmbi VT, Uthman OA, Lilford RJ. Contextual factors associated with health care service utilization for children with acute childhood illnesses in Nigeria. PLoS ONE. 2017; 12(3):e0173578. https://doi.org/10.1371/journal.pone.0173578.

    Article  Google Scholar 

  24. 24

    Ncogo P, Romay-Barja M, Benito A, Aparicio P, Nseng G, Berzosa P, Santana-Morales MA, Riloha M, Valladares B, Herrador Z. Prevalence of anemia and associated factors in children living in urban and rural settings from Bata District, Equatorial Guinea, 2013. PLoS ONE. 2017; 12(5):1–14. https://doi.org/10.1371/journal.pone.0176613.

    Article  Google Scholar 

  25. 25

    Legason ID, Atiku A, Ssenyonga R, Olupot-Olupot P, Barugahare JB. Prevalence of Anaemia and Associated Risk Factors among Children in North-western Uganda: A Cross Sectional Study. BMC Hematology. 2017; 17(10):1–9. https://doi.org/10.1186/s12878-017-0081-0.

    Google Scholar 

  26. 26

    Moineddin R, Matheson FI, Glazier RH. A simulation study of sample size for multilevel logistic regression models. BMC Med Res Methodol. 2007; 7(1):34. https://doi.org/10.1186/1471-2288-7-34.

    Article  Google Scholar 

  27. 27

    Hox J. Multilevel Modeling: When and Why. 1998:147–54.

Download references

Acknowledgments

HT greatly thanks the Biostatistics Research Unit at the South African Medical Research Council (SAMRC) for hosting her. She also thanks the L’Oreal UNESCO for Women in Science for the support through the 2020 Young Talents award.

Funding

SOMM was funded by the South Africa Medical Research Council (SAMRC). The funders had no role in the design, analysis, interpretation or writing of the manuscript.

Author information

Affiliations

Authors

Contributions

NA carried out the simulation studies and wrote the initial draft of the paper. SOMM conceived the research ideas including the simulation plans and extensively revised the manuscript for intellectual content. HT helped with checking the R code for the simulation studies and the revisions of the manuscript. All authors have approved the final version for submission. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Halima S. Twabi.

Ethics declarations

Ethics approval and consent to participate

For the Malawi Demographic and Health Survey (MDHS) data, ethical approval was not required as the National Statistics Office (NSO) of Malawi that conducted the survey was responsible for seeking ethical clearance on the questionnaires and protocol. The ethical clearance was granted by the Malawi Health Research Committee, Institutional Review Board of ICF Macro, Centre for Disease and Control (CDC) in Atlanta, GA, USA and Prevention IRB. Additionally, informed consent and anonymity do not apply to our study as these were done before and after data collection by the responsible implementing institution.

Consent for publication

Not applicable

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Samuel O.M. Manda is the senior author.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Adam, N.S., Twabi, H.S. & Manda, S.O. A simulation study for evaluating the performance of clustering measures in multilevel logistic regression. BMC Med Res Methodol 21, 245 (2021). https://doi.org/10.1186/s12874-021-01417-4

Download citation

Keywords

  • Heterogeneity measures
  • Clusters
  • Multilevel logistic regression
  • Childhood anemia
  • Binary outcomes