LEVEL (Logical Explanations & Visualizations of Estimates in Linear mixed models): recommendations for reporting multilevel data and analyses


Background
Researchers have been utilizing linear mixed models (LMMs) for different hierarchical study designs and under different names, which emphasizes the need for a standard in reporting such models [1, 2]. Mixed effects models, multilevel data, contextual analysis, hierarchical studies, longitudinal studies, panel data and repeated-measures designs are some of the different names used when referring to study designs and/or analytical tools for correlated data. In addition, there is usually no distinction made between having a data structure that is multilevel, and having a research question that requires a multilevel analysis. There are multiple excellent tutorials on multilevel analyses [3-5]. However, there is inconsistency in how the results of LMMs are reported in the literature [6]. Casals et al. conducted a systematic review of how various LMMs were reported in the medical literature, and found that important aspects were not reported in most cases [6].
As an example, a cohort study of children that selects a sample of schools, then selects students within schools, and conducts multiple measurements over time in the same students, would be a 3-level dataset: with school as the highest level (Level 3), student as a lower level (Level 2), and time-point as the lowest level (Level 1). Repeated measurements of a variable over time within a student are likely to be similar, i.e. positively correlated. Also, values of a variable measured on students of a particular school may be more similar to each other than to the values of the same variable measured on students from different schools, i.e. they are also likely to be positively correlated. These within-level correlations reduce the overall information in the data. Considering the correlations typically leads to larger estimates of variances and consequently lower power if sample sizes are not increased at the design stage. At the analysis stage, incorporating random effects into a regression model is one way to acknowledge the variation among upper-level units. Random intercepts and random slopes help to attribute the variation in values of the outcome variable to the relevant levels and independent variables.
A standardized checklist for the reporting of multilevel data and the presentation of linear mixed models will promote adequate reporting of correlated data analyses. In this manuscript, we propose LEVEL (Logical Explanations & Visualizations of Estimates in Linear mixed models), a systematic approach for the presentation of studies with correlated data from multilevel study designs, with an accompanying checklist for standardizing the reporting of results from linear mixed models. These models are quite complex, and the intention of this manuscript is not to be a statistical tutorial, but to mention aspects of the study design and analysis methods that we propose should be addressed in a publication. We present the basics of a linear mixed model simply to introduce the terminology and to help understand the proposed reporting recommendations.
A basic 2-level random-intercept model with no independent variables can be written as

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij},$$

where $i = 1, \ldots, m$ indexes the upper-level units, $j = 1, \ldots, n_i$ indexes the base-level units in the $i$-th upper-level unit, $\mu$ denotes the overall mean of the dependent random variable $Y$, $\tau_i$ is the random intercept effect of the $i$-th upper-level unit, and $\varepsilon_{ij}$ is the random error of the $j$-th lower-level unit in the $i$-th upper-level unit. We assume Normal distributions for the random effects, such that $\tau_i \sim N(0, \sigma_I^2)$ and $\varepsilon_{ij} \sim N(0, \sigma_E^2)$, where $\sigma_I^2$ is the component of variation due to variability among upper-level units, and $\sigma_E^2$ is the residual component of variation due to variability among lower-level units. We assume that these two random effects are independent of each other.
By acknowledging multiple sources of variability and then attributing the variation to the appropriate level, the multilevel model can more accurately and precisely estimate the effects of all variables included in the model [7]. Variance components are used to calculate the "intra-level" or intraclass correlation coefficient (ICC), a statistic that quantifies the degree to which data at the lower level are correlated. The ICC, also referred to as the variance partition coefficient (VPC), is calculated by the following proportion,

$$\mathrm{VPC} = \frac{\sigma_I^2}{\sigma_I^2 + \sigma_E^2},$$

which helps answer the question: of the total variation in the outcome variable, how much is accounted for by the variation among the upper-level units? As the term ICC is often mistaken for an estimate of a correlation coefficient, we will use the more appropriate term VPC. A VPC close to 0 suggests that little to no variation in the outcome is attributable to variation among upper-level units, so most of the variation in the outcome is among the lower-level units and thus there is little correlation among them. On the other hand, a VPC close to 1 suggests that most of the variation in the outcome is attributable to variation among upper-level units, so little variation is to be found among the lower-level units; thus, there is high correlation among them. Calculating the VPC can help determine the presence of correlation at the lower level and the need to account for it in the analyses. Interpretation of the magnitude of the ICC/VPC is context dependent.
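As a rough illustration of how variance components feed into the VPC, the sketch below simulates data from a balanced 2-level random-intercept model (outcome = overall mean + upper-level effect + residual) and recovers the two variance components with the classical one-way ANOVA (method-of-moments) estimator. This estimator is a stand-in chosen for brevity, not the estimation method of this manuscript; in practice the components would usually come from (restricted) maximum likelihood in mixed-model software. The function names, the simulated values, and the balanced design are all assumptions made for the example.

```python
import random
from statistics import mean


def simulate_two_level(m, n, mu, sd_between, sd_within, seed=0):
    """Simulate m upper-level units with n lower-level units each (balanced)."""
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        tau = rng.gauss(0.0, sd_between)  # random intercept of this unit
        data.append([mu + tau + rng.gauss(0.0, sd_within) for _ in range(n)])
    return data


def anova_variance_components(data):
    """Method-of-moments estimates (between, within) for a balanced design."""
    m, n = len(data), len(data[0])
    unit_means = [mean(unit) for unit in data]
    grand_mean = mean(unit_means)
    # Mean square between upper-level units; expectation n*var_between + var_within
    msb = n * sum((um - grand_mean) ** 2 for um in unit_means) / (m - 1)
    # Mean square within upper-level units; expectation var_within
    msw = sum(
        (y - um) ** 2 for unit, um in zip(data, unit_means) for y in unit
    ) / (m * (n - 1))
    var_between = max(0.0, (msb - msw) / n)  # truncate negative estimates at 0
    return var_between, msw


def vpc(var_between, var_within):
    """Share of total variance attributable to the upper level."""
    return var_between / (var_between + var_within)


if __name__ == "__main__":
    # Hypothetical values: true between-unit variance 4, residual variance 9
    sim = simulate_two_level(m=300, n=20, mu=10.0, sd_between=2.0, sd_within=3.0)
    vb, vw = anova_variance_components(sim)
    print(f"between={vb:.2f} within={vw:.2f} VPC={vpc(vb, vw):.3f}")
```

With the simulated values above, the true VPC is 4 / (4 + 9), so the estimate should land in that neighbourhood, subject to sampling error.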
In hierarchical data structures with more than 2 levels (see multilevel diagram in Fig. 1), the VPC can be calculated for outcomes measured on units of each lower level, with the numerator as the variation in outcome between units on all levels above [8]. For the example in Fig. 1, if we have the following 'null' model for the observation at time $t$ on the $j$-th pupil from the $i$-th school,

$$Y_{ijt} = \mu + \tau_i + \tau_{j(i)} + \varepsilon_{ijt},$$

with school effect $\tau_i$, pupil-within-school effect $\tau_{j(i)}$ and residual $\varepsilon_{ijt}$, then $\mathrm{VPC}_1$ quantifies the correlation among all the values between and within pupils nested within schools and is given by

$$\mathrm{VPC}_1 = \frac{\sigma_I^2}{\sigma_I^2 + \sigma_J^2 + \sigma_E^2},$$

while $\mathrm{VPC}_2$ quantifies the correlation among the repeated measurements within pupils nested within schools and is given by

$$\mathrm{VPC}_2 = \frac{\sigma_I^2 + \sigma_J^2}{\sigma_I^2 + \sigma_J^2 + \sigma_E^2},$$

where $\sigma_I^2$ is the component of variation due to variability among schools, $\sigma_J^2$ is the component of variation due to variability among pupils nested within schools, and $\sigma_E^2$ is the component of residual variation due to variability in the repeated measurements within pupils.
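A small sketch of the 3-level calculation may help. It assumes, following the description above, that the numerator of each VPC is the sum of the variance components of all levels above the units in question: schools only for VPC 1, schools plus pupils for VPC 2. The function name and the numerical values are hypothetical.

```python
def vpc_three_level(var_school, var_pupil, var_resid):
    """Return (VPC_1, VPC_2) for repeated measures within pupils within schools.

    VPC_1: share of total variance due to schools.
    VPC_2: share of total variance due to schools and pupils combined,
           i.e. the correlation among repeated measurements on one pupil.
    """
    components = (var_school, var_pupil, var_resid)
    if any(c < 0 for c in components):
        raise ValueError("variance components must be non-negative")
    total = sum(components)
    if total == 0:
        raise ValueError("total variance must be positive")
    vpc_1 = var_school / total
    vpc_2 = (var_school + var_pupil) / total
    return vpc_1, vpc_2


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only
    v1, v2 = vpc_three_level(var_school=1.0, var_pupil=2.0, var_resid=5.0)
    print(f"VPC_1={v1:.3f} VPC_2={v2:.3f}")
```

By construction VPC 2 is never smaller than VPC 1, since measurements on the same pupil share both the school and the pupil effects.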
Understanding the implications that correlations among observations may have on the design and analyses of research studies is essential. At the design stage, if the contribution to the VPC for a particular level (the variance component) is small, it implies that there is little variation among units at that level; it is therefore more advantageous to sample more units from higher levels from an efficiency and power standpoint. These important statistical considerations in planning sample sizes at the different levels are accounted for with the variance inflation factor (VIF), also called the 'design effect'. For a given level $k$, the VIF is

$$\mathrm{VIF}_k = 1 + (m_k - 1)\,\mathrm{VPC}_k,$$

where $m_k$ is the average number of units in a member of the $k$-th level.
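To make the design-effect formula concrete, the following sketch computes the VIF as 1 + (m - 1) x VPC and the corresponding "effective" sample size, defined here as the total sample size divided by the VIF. The effective sample size is a common companion quantity but is not defined in this manuscript, so it is an assumption of the example, as are the function names and the numbers used.

```python
def design_effect(avg_units_per_member, vpc):
    """Variance inflation factor: 1 + (m - 1) * VPC."""
    if avg_units_per_member < 1:
        raise ValueError("average number of units per member must be >= 1")
    if not 0.0 <= vpc <= 1.0:
        raise ValueError("VPC must lie between 0 and 1")
    return 1.0 + (avg_units_per_member - 1.0) * vpc


def effective_sample_size(n_total, avg_units_per_member, vpc):
    """Number of independent observations carrying the same information."""
    return n_total / design_effect(avg_units_per_member, vpc)


if __name__ == "__main__":
    # Hypothetical design: 50 clusters of 20 subjects each, VPC of 0.05
    vif = design_effect(20, 0.05)
    n_eff = effective_sample_size(1000, 20, 0.05)
    print(f"VIF={vif:.2f} effective n={n_eff:.0f}")
```

Even a modest VPC can inflate variances substantially when members contain many units, which is why the VIF belongs in the sample size planning at each level.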
At the analysis stage, depending on the study design, linear mixed models can include random effects to account for correlation in space or in a social group (clustering), time (repeated-measures), or both. Table 1 presents example linear mixed models with dependent variable Y in hypothetical 2-level and 3-level study designs, with a single independent variable X. If the data were from a 1-level study design, the model would have no random effects (except the residual error!):

$$Y_j = \beta_0 + \beta_1 X_j + \varepsilon_j,$$

with intercept $\beta_0$, slope $\beta_1$ and residual error $\varepsilon_j$ for subject $j$.

The random effects applied in the simple linear mixed models in Table 1 are assumed to have Normal distributions and to be independent from the error distribution. If there is more than one random effect, one must also specify whether they are independent amongst themselves and, if not, specify the covariance structure amongst the random effects.
The statistical literature is confusing and contradictory as to whether to consider effects as fixed or as random [9]. Many textbooks state that level effects must be considered as fixed effects if all possible members of that level were studied, and as random effects if members of that level are a sample from some population. Others state that fixed effects are to be used if the specific member effects are of interest, and random effects otherwise. The Hausman test for the difference between the within-level and between-level regression coefficients is sometimes used as a test for deciding whether to use a random or fixed coefficient model [10]. We are not stating a position on this argument, but insist that one must acknowledge the hierarchical study design, not ignore the correlations, and justify the random intercepts and random slopes used.

Multilevel data versus multilevel research question
The first step in analyzing multilevel data is to decide if the research question is a multilevel question. The design of a study may be hierarchical and thus have correlated data, but the research question may be one that does not require multilevel analyses. For example, in a clustered study design, research questions where the dependent variable is at the highest level will not require multilevel analyses since the members of the highest level are uncorrelated. In this case, the variation amongst members of the upper level is the only variance component, and a fixed effects model analyzed by ordinary least squares (OLS) is appropriate [10]. In a repeated measures study design, if the dependent variable in the research question is at a single time point, it is not a multilevel question as there are no repeated measures. Also, if the dependent variable is the time to the occurrence of an event (survival data), the research question is no longer multilevel, unless there is additional hierarchical structure, as in 'frailty' models. In a 2 or more level hierarchical clustered study, any research question using as dependent variable any lower level variable will require a multilevel analysis.

Table 1 Example simple linear mixed models in 2-level and 3-level study designs. Columns: Nature of design; Random intercept effects only; Random intercept effects and random slope effects. Note: Clusters are indexed by i, subjects are indexed by j, and time points are indexed by t.

The next step is to consider using the multilevel diagram [8] as presented in Fig. 1. The multilevel diagram allows visualizing the levels of a study, the structure of the levels, and the variables collected at each level. Variables collected at higher levels than the dependent variable are usually called contextual variables. The diagram readily allows one to see if the dependent variable for a particular research question requires a multilevel analysis.
Another important consideration is 'aggregated' or 'collapsed' variables, which are variables derived by summarizing the values of observations from lower levels. For example, if years of education is available at the individual level for each adult in a household, the variable 'highest education level of the household' is an aggregated variable at the household level. If we have the sex and the grade-points for each student in multiple schools, the proportion of boys per school and the school-wide average grade-point are school-level aggregated variables.
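The school example can be sketched in a few lines. The records, field names and values below are invented purely for illustration; the point is only that student-level rows are collapsed into one row of school-level aggregated variables per school.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student-level records (lower level)
students = [
    {"school": "A", "sex": "M", "grade": 5.0},
    {"school": "A", "sex": "F", "grade": 6.0},
    {"school": "A", "sex": "M", "grade": 4.0},
    {"school": "B", "sex": "F", "grade": 6.5},
    {"school": "B", "sex": "F", "grade": 5.5},
]


def aggregate_by_school(rows):
    """Collapse student-level rows into school-level aggregated variables."""
    by_school = defaultdict(list)
    for row in rows:
        by_school[row["school"]].append(row)
    return {
        school: {
            # Proportion of boys among the school's students
            "prop_boys": mean([1.0 if r["sex"] == "M" else 0.0 for r in members]),
            # School-wide average grade-point
            "mean_grade": mean([r["grade"] for r in members]),
            "n_students": len(members),
        }
        for school, members in by_school.items()
    }


if __name__ == "__main__":
    for school, summary in aggregate_by_school(students).items():
        print(school, summary)
```

The resulting variables live at the school level and can then be used as contextual variables for a student-level dependent variable.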
Note that for a research question to be multilevel, the crucial decision is whether the dependent variable is at a lower level. One can have independent variables at a different (lower) level, but if the dependent variable is at the highest level, it is not a multilevel research question. For example, in a repeated measures design, the outcome at the end of treatment for a given person (e.g. treatment success) is measured only once, but may depend on values of a variable measured at different time points (e.g. hypertension at baseline and at times t1 and t2 prior to end of treatment).

How to report descriptive analyses
With a hierarchical study design, a correct multilevel descriptive analysis should include analyses of the outcomes of interest at all relevant levels and the distribution of the variables in all levels. This step will also help the researcher uncover irregularities in the data, such as unusual patterns of missingness, heteroscedasticity, or unusual shapes of distributions. It is also helpful in understanding which variables are correlated and how to possibly consider them in the modeling.
The choice of summary statistics to use, as with non-multilevel descriptive statistical analysis, will depend on the type of variable. When presenting summary statistics (e.g. means for continuous variables, proportions for categorical variables) of variables collected at lower levels, measures of variability and confidence intervals must account for the variance inflation factors (VIFs).
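One simple way to let a confidence interval for a mean reflect the VIF is to inflate the naive standard error by the square root of the VIF. The sketch below does this under several assumptions that are not part of this manuscript: a single level of clustering, a normal approximation (z = 1.96), and a VPC known from elsewhere. The function name and the input values are hypothetical; survey or mixed-model software would normally provide such intervals directly.

```python
import math


def vif_adjusted_ci(mean_value, sd, n, avg_cluster_size, vpc, z=1.96):
    """Approximate CI for a mean, with the standard error inflated by sqrt(VIF)."""
    vif = 1.0 + (avg_cluster_size - 1.0) * vpc
    naive_se = sd / math.sqrt(n)          # assumes independent observations
    adjusted_se = math.sqrt(vif) * naive_se
    return mean_value - z * adjusted_se, mean_value + z * adjusted_se


if __name__ == "__main__":
    # Hypothetical summary: mean 10, SD 2, 400 subjects in clusters of 20
    naive = vif_adjusted_ci(10.0, 2.0, 400, avg_cluster_size=1, vpc=0.0)
    adjusted = vif_adjusted_ci(10.0, 2.0, 400, avg_cluster_size=20, vpc=0.05)
    print("ignoring clustering:", naive)
    print("accounting for VIF: ", adjusted)
```

The adjusted interval is wider than the naive one by a factor of the square root of the VIF, which is the loss of information that the descriptive tables should convey.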
When presenting plots, univariate and bivariate graphs should allow comparison of variables measured at the same level. With clustered data, plots of lower level variables should identify membership in upper level groups. With longitudinal data, plots of repeated measurements over time should identify points that come from the same subject (e.g. 'spaghetti plots') rather than summaries over time that obscure the fact that some of the same subjects are included across the summaries [11].

How to report modeling analyses
Descriptive bivariate analyses that assess significance of correlation and association measures should adjust for the correlation in the observations. Once the focus shifts to the dependent variable of interest, the correlation among the observations of the dependent variable of interest at each level must be studied and presented. Variance decomposition must be performed and the VPCs or ICCs should be reported. An initial 'null' multilevel model with no independent variables is strongly encouraged.
The modeling, variable selection, and arriving at a 'final' model, is a process that every investigator can follow according to their choice, and is therefore not addressed. Note that adding dummy (indicator) variables as fixed effects for members of a higher level is not exactly equivalent to adding random intercept effects for members of a higher level. While both approaches do have the effect of explaining some of the variability in the outcome, only the latter decomposes the residual variance into components.
For the 'final' model, in addition to reporting the results for the fixed effects, one must report either the variance components or the VPCs or ICCs. It may be of special interest to report these for the 'null' model (i.e. with no independent variables), as well as for the final model (and other 'intermediate' models), so that the reader may understand the impact of explanatory variables on the variance components. Note also that if random intercepts and random slopes are included in the models, the estimated correlation structure among the random effects should also be presented. Finally, measures of model fit, such as either the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC), or the area under the receiver operating characteristic (ROC) curve (AUC) for logistic regression models, may be useful for readers.
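For readers who want to check the fit criteria reported by software, AIC and BIC follow directly from the maximized log-likelihood: AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of estimated parameters (fixed effects plus variance parameters). The sketch below is only that arithmetic; the function names and input values are hypothetical, and which n to use for BIC in a multilevel model (observations or upper-level units) is a choice that differs across software and should be stated.

```python
import math


def aic(log_lik, n_params):
    """Akaike Information Criterion from a maximized log-likelihood."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion; n_obs must be chosen and reported."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


if __name__ == "__main__":
    # Hypothetical fits of a null model and a model with covariates
    for label, ll, k in [("null", -1250.0, 3), ("with covariates", -1190.0, 9)]:
        print(f"{label}: AIC={aic(ll, k):.1f} BIC={bic(ll, k, n_obs=2000):.1f}")
```

Lower values indicate better fit after penalizing complexity, and comparisons are meaningful only between models fitted to the same data by the same likelihood.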

Example of reporting multilevel data structure and analyses: the Chilean dental study
We use a 3-level study that measured presence of caries in temporary dentition in 2275 children from 40 preschools in 13 districts (comunas) of the Metropolitan Region (around the capital of Santiago) in Chile, to illustrate what and how to present results from a multilevel analysis. All the districts in the Metropolitan Region were classified according to the United Nations Development Program (UNDP) Human Development Index (HDI) [12], then stratified into 5 groups: Very High, High, Middle, Low, and Very Low. Estimation of the necessary number of children and pre-schools to include took into account the expected ICCs and VIFs based on the literature. Thirteen districts were randomly selected across the strata. Within a district, educational establishments (pre-schools) were categorized into private (paid), private (subsidized) or public, and approached for participation. All selected districts participated, but the private pre-schools of the highest HDI district refused to participate; thus that district only had public (municipal) pre-schools participating. All eligible children of a school were invited to participate, and refusal (by parents) rates were less than 1%. The study was approved by the Comité de Ética de Investigación en Seres Humanos (ethics committee) of the Facultad de Medicina of the Universidad de Chile. Table 2 displays the multilevel diagram for this study. The research question was: 'Which factors are related to the presence of caries in temporary dentition in children of different districts of the Metropolitan Region?' The prevalence of caries in temporary dentition in a group can be calculated from the presence of caries in temporary teeth at the individual level. We note that our dependent variable is at a lower level, while the independent variables of interest are from various levels.
Table 3 presents the results of three different random-intercept logistic regression models: the 'null' model, an 'intermediate' model, and a 'final' model, fitted using maximum likelihood. Usually only a final model is presented, but we illustrate how the other models can help in understanding changes in the VPC when one introduces independent variables from different levels in multilevel models. See model equations in Additional file 1.
The effect estimates and 95% confidence intervals (CIs) do account for the correlation among the observations; at the bottom of the table of results, one presents the corresponding intraclass correlation coefficients and the model fit criteria.
We first note that in the intermediate model, which only includes district-level and school-level covariates, the district-level variables of HDI and rurality, and the type of school, are statistically significant: the higher the human development index of the district, the lower the probability of caries among the children, while children in private (paid) pre-schools have a lower probability of caries. In the final model, which now includes child-level covariates, the odds ratios (ORs) for school type and rural location are no longer significant. The sex and age of the child are significant, while family income and access to health care are not significantly associated with caries presence. Secondary school education of the main caretaker is associated with a higher likelihood of caries. It could be that district-level factors like HDI account for the effect of child-level socioeconomic factors.
From the 'null' model, we note that the correlation of the presence of caries of children from the same district is not negligible (ICC = 0.0495), but also that this correlation is more than doubled (ICC = 0.1278) among children within the same school. When we consider district-level and school-level covariates, the ICCs for district and for school within district are reduced. The ICC for district is not reduced further when we add child-level covariates in the 'final' model. However, the correlation in the presence of caries among children within the same school is reduced when child-level covariates are included in the model. The final model, as expected, has a much better fit than the intermediate model (much lower AIC), since it incorporates child-level covariates, which explain well the child-level variable of presence of caries.

Discussion
The objective of this manuscript is to recommend how to report and present multilevel data and the results of linear mixed models. The need for such a checklist has been previously established by Casals et al. [6], who conducted a systematic review of the quality of the presentation of results and information from LMMs in the field of clinical medicine. Their extensive and systematic review of indexed medical journals included longitudinal studies, repeated measurements and multilevel design studies, from various medical disciplines. They found that "most of the useful information about generalized linear mixed models was not reported in most cases." [6] Less than 10% reported the variance estimates of random effects. Aspects that apply to all modeling, such as covariate selection, estimation method, and goodness of fit, were also not universally reported. They conclude that "it is important to consider the use of minimal rules as standardized guidelines when presenting generalized linear mixed model results in medical journals." [6]
This manuscript is limited since it is not intended to be a tutorial on statistical methods for analyzing correlated data. Many such tutorials do exist. We do not review the complex statistical considerations behind all the aspects that are important in LMMs. We provided a real-data example using a mixed effects logistic regression analysis of a 3-level study to illustrate how such analyses could be reported following our recommendations.
Table 4 presents a checklist of items that we recommend for reporting multilevel data and modelling results, where items are either suggested (S), expected (E) or necessary (N). The checklist was developed by the authors based on their experience in conducting and presenting multilevel data analyses. We thus welcome comments from users of the proposed checklist and from journal editors. We would also welcome extending our recommended checklist to other multilevel models.
Checklists such as the PRISMA [13], STROBE [14], CONSORT [15] and others have improved the quality of reporting of scientific medical research studies in abstracts and full manuscripts [16]. More recently, reporting guidelines for models have been proposed [17,18]. The proposed LEVEL checklist is modeled on STROBE guidelines, modified for multilevel studies.

Conclusions
A standardized checklist for the reporting of multilevel data and the presentation of linear mixed models will promote adequate reporting of correlated data analyses, and ensure that appropriate statistics are contained and explained thoroughly in manuscripts. We hope that implementing our checklist of items to report when presenting results of a multilevel analysis will increase the transparency, completeness, and quality of reporting.