 Research article
 Open Access
 Published:
A graphical approach to assess the goodnessoffit of randomeffects linear models when the goal is to measure individual benefits of medical treatments in severely ill patients
BMC Medical Research Methodology volume 20, Article number: 193 (2020)
Abstract
Background
Twodimensional personalized medicine (2PM) models are tools for measuring individual benefits of medical treatments for chronic diseases which have potential applications in personalized medicine. These models assume normality for the distribution of random effects. It is necessary to examine the appropriateness of this assumption. Here, we propose a graphical approach to assessing the goodnessoffit of 2PM models with continuous responses.
Methods
We propose benefit quantilequantile (BQQ) plots which compare the empirical quantiles of individual benefits from a patient sample predicted through an empirical Bayes (EB) approach versus the quantiles of the theoretical distribution of individual benefits derived from the assumption of normality for the random effects. We examine the performance of the approach by conducting a simulation study that compared 2PM models with nonnormal distributions for the random effects versus models with comparable normal distributions. Cramervon Mises discrepancies were used to quantify the performance of the approach. The approach was illustrated with data from a clinical trial of imipramine for patients with depression.
Results
Simulations showed that BQQ plots were able to capture deviations from the normality assumption for the random effects and did not show any asymmetric deviations from the y = x line when the random effects were normally distributed. For the depression data, the points of the BQQ plot were scattered around closely to the y = x line, without presenting any asymmetric deviations. This implied the adequacy of the normality assumption for the random effects and the goodnessoffit of the 2PM model for the imipramine data.
Conclusion
BQQ plots are sensitive to violations of the normality assumption for the random effects, suggesting that the approach is a useful tool for examining the goodnessoffit of randomeffects linear models when the goal is to measure individual treatment benefits.
Background
Twodimensional personalized medicine (2PM) models are tools for measuring the severity of a patient’s chronic disease and the individual benefits of medical or behavioral treatments [1, 2]. The patient’s disease severity at a specific point in time is defined as the probability of missing the therapeutic target, and the individual’s benefit is therefore measured as the reduction in disease severity produced by the treatment. When the disease severity before treatment is close to 1, the patient is regarded as severely ill. The severity and individual benefits are functions of known and unknown patient’s characteristics. In practice, 2PM models are built using linear regression models with random effects that are assumed to be normally distributed, and severities and benefits are calculated with both the fixed effects and the random effects of the model [1, 2]. In addition to being useful for measuring the individual benefits achieved by the patients of a clinical trial, the fitted 2PM model can be used to measure individual benefits in potentially new patients [1, 2].
Given the potential applications of 2PM models in personalized medicine, it is necessary to develop methods for examining their goodnessoffit with a focus on their ability to measure individual benefits. In this article, we propose a graphical approach to assessing the goodnessoffit of 2PM models for continuous responses of severely ill patients. The approach compares the quantiles of the empirical Bayes (EB) predictors of individual treatment benefits of the patient sample against the theoretical quantiles of the distribution of individual benefits that are derived from the normality assumption for the random effects. We conducted a Monte Carlo simulation study that showed that the approach is sensitive to deviations from the normality assumption for the random effects. Specifically, the graphical approach captures the discrepancy between multivariate nonnormal distributions for the random effects and normal distributions with the same mean and variancecovariance matrix. Since the main purpose of a 2PM model is to measure the individual benefits of a medical or behavioral treatment in the patients of a clinical study or in potentially new patients [1, 2], the shape of our proposed goodnessoffit plot will depend on the clinician’s therapeutic target. Therefore, conclusions on the degree of adequacy of the model will generally depend on the target.
Randomeffects linear models (RELMs) are efficient tools for building 2PM models of continuous responses. They are commonly used to understand individuals’ time trajectories of treatment effects [1,2,3,4,5,6,7,8,9,10,11]. In RELMs, the distributional assumption of the unobserved random effects is important for estimation and inference since the marginal likelihood function, obtained by integrating out the random effects, depends on the assumed distribution for the random effects. An EB approach is typically used to predict the random effects [1,2,3, 12,13,14]. In RELMs, the EB predictors of the random effects are estimates of the best linear unbiased predictors (BLUPs), which have optimality properties that do not require the normality assumption for the random effects [13]. Thus, a BLUP is robust to violations of the normality assumption and the EB predictor inherit some of its robustness [15,16,17]. The prediction accuracy of the BLUPs for random effects is not substantially affected by distribution misspecifications, as shown by both theoretical and numerical studies [16]. Traditionally and for ease of computation, in RELMs researchers assume that the unobserved random effects follow normal distributions. Violations of this normality assumption are possible. For instance, omission of patient level categorical covariates may produce multimodal distributions for the random intercept [18]. Although violations of the normality assumption have smalltomild effects on the maximum likelihood estimates (MLE) of the fixed effects, they may affect the prediction of the random effects by increasing the bias of the variance components estimates, especially in generalized linear mixed effects models [12, 18,19,20,21,22]. Therefore, assessing the goodnessoffit of 2PM models with respect to the normality assumption for the random effects is crucial.
Several graphical methods have been proposed for examining the goodnessoffit of randomeffects models for longitudinal data. The most well known graphical approach is based on conditional residuals which are computed with EB predictors of the random effects [12, 23,24,25,26]. However, although conditional residuals may detect deviations from the assumption of linearity, are useful for detecting outliers and allow examining the normality assumption for the error term of the model, they do not allow examining the normality of the random effects. Normal quantilequantile (QQ) plots are also commonly used in model development to examine the assumption of normality for the random effects. In normal QQ plots, the sample quantiles of the individual EB predictors of a random effect are plotted against the quantiles of a normal distribution whose mean and variance are the sample mean and variance of the EB predictors [24, 26, 27]. A limitation of normal QQ plots is that the data analyst must conduct a separate analysis for each random effect in the model. Moreover, there is some evidence that the EB predictors of a random effect tend to have a unimodal distribution, even in situations in which the true distribution of the random effect has two or more modes [15, 18, 19], which may artificially straighten the cloud of points of a normal QQ plot. Verbeke and Molenberghs [28] proposed a graphical diagnostic tool using gradient functions that checks the appropriateness of the random effects distribution assumption. Similar to normal QQ plots, a gradient function needs to be plotted for each random effect in the model. Pan and Lin [29] proposed graphical and numerical techniques based on cumulative sums of residuals for checking the link function and functional forms of covariates in generalized linear mixed models; their approach, however, does not address the assumption of normality for the random effects. Grady and Helms [30] assessed the fit of the assumed covariance structure by plotting lagged sample covariances or correlations. Diaz et al. [31] assessed the goodnessoffit of a random intercept model by plotting randomeffectadjusted observations based on EB predictors of the random intercept versus expected observations. Although this approach is useful for examining the linearity assumption of RELMs and detecting outliers, it does not allow assessing the normality assumption for the random effects. Others have proposed formal statistical tests [32,33,34,35]. To check the normality assumption for the random effects, Efendi et al. [32] use a bootstrap test based on gradient functions. Drikvandi et al. [33] propose a diagnostic test based on Cramervon Mises discrepancies. Alonso et al. [34] propose tests that use the eigenvalues of the variancecovariance matrices of fixed effects estimates obtained from robust inference methods. Similarly, Abad et al. [35] use information matrices to propose diagnostic tests for generalized linear mixed effects models.
This paper is organized as follows. First, we present a review of 2PM models for continuous responses and the calculation of individual treatment benefits for severely ill patients. Then we present a motivation for the graphical approach and describe it in detail. Next, the approach is illustrated using data from a clinical trial of the antidepressant imipramine. Then we describe the simulation scenarios used to evaluate the performance of the proposed graphical approach. We also describe how Cramervon Mises discrepancies are used to quantify deviations from the normality assumption for the random effects. The paper ends with a discussion and conclusions.
Methods
Individual severity and treatment benefits using time dependent 2PM models
Time dependent 2PM models allow understanding the evolution of individual treatment benefits over time [1]. Let Y be a continuous measure reflecting the patient’s disease. Before a treatment Q is initiated, the responses for patient ω are measured k_{0,ω} times and modeled by
After the treatment is initiated, the responses are measured k_{1,ω} times and modeled by
where Λ_{ω} = α_{ω} + λ^{T}X_{ω} and \( {\beta}_{Q,\omega, j}={\theta}_{1,\omega }{t}_{\omega, j}+{\theta}_{2,\omega }{t}_{\omega, j}^2+\dots +{\theta}_{d,\omega }{t}_{\omega, j}^d \). Here, X_{ω} is a vector of patient (subject) characteristics that do not change during the trial. For patient ω, Λ_{ω} is a constant number that reflects the patient’s disease state before treatment initiation and β_{Q,ω,j} is the individual’s (timedependent) treatment effect after t_{ω,j} time units of treatment. We write β_{Q,ω}(t) in place of β_{Q,ω,j} to express the treatment effect at a generic point in time t. We view Λ_{ω} and β_{Q,ω}(t) as individual realizations of populationlevel random variables Λ^{∗} and \( {\beta}_Q^{\ast }(t) \), respectively [1]. Also, ε_{0,ω,j} and \( {\varepsilon}_{\omega, j}^{\prime } \) represent measurement errors or withinpatient variability due to the patient’s internal or external factors, which we assume to be \( N\left(0,{\sigma}_{\varepsilon}^2\right) \) and \( N\left(0,{\sigma_{\varepsilon}^{\prime}}^2\right) \), respectively.
Here, we assume that the therapeutic target is to achieve Y ≤ y, where y is a value prespecified by the clinician. The patient’s basal disease severity is defined as the probability that the patient does not satisfy the therapeutic target before treatment initiation. Thus, a patient ω has basal severity [1, 2]
where Φ is the cumulative distribution function of the standard normal distribution. The patient’s disease severity after a treatment duration t is
That is, s_{2,ω}(t) is the probability that the target has not been achieved at time t. The patient’s individual benefit from the medical treatment at the point in time t is the reduction in disease severity
By definition, a patient is severely ill if the patients’ basal severity is approximately 1. Here, we assume that all patients are severely ill, that is, s_{0,ω} ≈ 1 for all ω. It is shown in Diaz [1] that under the reasonable assumption that \( {\sigma}_{\varepsilon}^{\prime}\ge {\sigma}_{\varepsilon } \), if the patient is severely ill, the patient’s benefit can be computed as
In the following, we assume \( {\sigma}_{\varepsilon}^{\prime }={\sigma}_{\varepsilon } \) which is usually clinically reasonable. Here, α_{ω} and θ_{1,ω}, …, θ_{d,ω} are characteristic constants of patient ω that are viewed as realizations of random coefficients α^{∗} and \( {\theta}_1^{\ast },\dots, {\theta}_d^{\ast } \) that do not necessarily have mean 0. In the terminology of mixed effects models, E(α^{∗}), λ and \( E\left({\theta}_1^{\ast}\right),\dots, E\left({\theta}_d^{\ast}\right) \) are the fixed effects, and α^{∗} − E(α^{∗}) and \( {\theta}_i^{\ast }E\left({\theta}_i^{\ast}\right) \), i = 1, …, d, are the random effects which are usually assumed to be jointly normally distributed. Here, we propose a graphical method to examine the assumption of normality.
Quantiles of individual benefits under the normality assumption
Under the assumption of normality for the random effects, since the patients are severely ill, the cumulative distribution function of individual benefits for patients with covariate value X = x at time t is [1]
where \( \mu =\mu \left(\boldsymbol{x},t\right)=\frac{yE\left({\varLambda}^{\ast }+{\beta}_Q^{\ast }(t)\right)}{\sigma_{\varepsilon}^{\prime }} \) and \( {\gamma}^2={\gamma}^2(t)=\frac{\mathrm{Var}\left({\varLambda}^{\ast }+{\beta}_Q^{\ast }(t)\right)}{{\sigma_{\varepsilon}^{\prime}}^2} \). Further, the pth quantile of the probability distribution function of individual treatment benefits is [1]
The quantities in (2) and (3) are functions of treatment duration t, since μ and γ are. They also vary with the fixed effects and variance components (i.e., the variances and covariances of the random effects and the error variance).
A motivation for the proposed graphical approach
Here, we estimate (predict) the individual treatment benefits using the EB approach described in Diaz [1, 2]. The EB predictors of individual treatment benefits are obtained by replacing the fixed effects, error variance and individual random effects in Eq. (1) with their estimates or predictors. The fixed effects and variance components are usually estimated through maximum or restricted maximum likelihood [4,5,6, 12, 13, 36]. We predict the random effects following an EB approach [1, 2, 9, 12, 13, 19, 36,37,38]. Importantly, the EB predictors of random effects are estimates of the best linear unbiased predictors (BLUPs) and the BLUPs do not assume normality for the random effects [13]. Moreover, the EB predictors of random effects are relatively robust to violations of the normality assumption [13, 15, 16, 38]. Because of this, one can view the sample quantiles of the EBpredicted individual benefits as robust estimates of the quantiles of the probability distribution of individual benefits. Alternatively, we can directly estimate the quantiles by replacing the fixed effects and variance components in Eq. (3) with their corresponding estimates. Therefore, if the normality assumption is violated, we expect the quantiles estimated with Eq. (3) to be substantially different from the sample quantiles based on the BLUPs because Eq. (3) is derived from the assumption of normality. Thus, we propose to compare the sample quantiles based on the BLUPs with the quantiles calculated with Eq. (3) in order to evaluate the assumption of normality for the random effects.
Goodnessoffit plot
Suppose the sample of patients can be divided into G subgroups. This is possible, for instance, when the patient characteristics are categorical or when a characteristic is continuous and it is split into categories based on published cutoff values or percentiles. Therefore, we assume that X_{ω} includes only binary (dummy) covariates and that X_{ω} has G distinct possible values x_{1}, …, x_{G}. Let N_{g} be the number of patients in the subpopulation of patients for whom X_{ω} = x_{g}, and let \( N=\sum \limits_{g=1}^G{N}_g \) be the total number of patients. For a particular time t, let \( {\hat{b}}_{g,t,1},\dots, {\hat{b}}_{g,t,{N}_g} \) be the EBpredicted individual benefits for the N_{g} patients in group g, and \( {\hat{b}}_{g,t,(1)}<{\hat{b}}_{g,t,(2)}<\dots <{\hat{b}}_{g,t,\left({N}_g\right)} \) be the corresponding order statistics. A benefit quantilequantile (BQQ) plot consists of plotting in an xy plane the N points
where \( \hat{B} \) is obtained by replacing fixed effects and variance components in Eq. (3) with their maximum likelihood or restricted maximum likelihood estimates (RMLEs). Thus, a BQQ plot compares the sample quantiles of individual benefits predicted with the EB approach versus estimates of the theoretical quantiles derived from the normality assumption for the random effects. In practice, we use the maximum point in time available in the dataset as a value for t.
If the points on the BQQ plot do not deviate asymmetrically much about the y = x line, then we conclude that the normality assumption for the random effects of the 2PM model is appropriate and, therefore, that we can have reasonable confidence in the EB predictors of the individual benefits achieved by the patient sample. Note that, by Eqs. (1) and (3), the shape of the BQQ plot depends on the prespecified therapeutic target and, therefore, conclusions on the adequacy of the 2PM model are applicable only to the specific target used.
Application to depression study
As an illustration, we use clinical trial data from 66 patients under imipramine treatment with two types of depression diagnosis [39]. The diagnoses were endogenous (N_{1} = 37) or nonendogenous (N_{2} = 29). The data is available in Hedeker and Gibbons [6] and is also analyzed by Diaz [1]. The response variable, the Hamilton Rating Scale (HRS) for depression, was recorded at the beginning and end of the week before imipramine treatment initiation and at the end of each of the next 4 weeks during treatment. Diaz [1] fitted a randomeffects linear model of the HRS scores in order to predict individual treatment benefits but did not provide evidence for the model’s goodnessoffit which we examine here. As covariates, the model included diagnosis (1 = endogenous, 0 = nonendogenous) as well as time and timesquare, where time is the number of weeks on treatment. Gender was not significant after adjusting for diagnosis and was therefore not included as a covariate. The intercept and the linear and quadratic terms of time had random effects in addition to the fixed effects. We assumed an unstructured covariance matrix for the random effects and homoscedastic independent errors. The SAS procedure MIXED, which assumes normally distributed random effects, was used to obtain the MLEs of the fixed effects and EB predictors for the random effects for all patients (SAS Institute Inc. Cary, NC). All observed baseline HRS scores were ≥11. For the computation of individual imipramine benefits, we assumed that the therapeutic target was to achieve an HRS score ≤7 (i.e., y = 7). Using this target and the model parameter estimates, Diaz [1] calculated that the EB predictors of all patients’ basal severities were approximately 1 and, therefore, concluded that the patients can be assumed to be severely ill.
Results for the application study
Parameter estimates are shown in Table 1 in Diaz [1]. Figure S1 in the Supplementary Information indicates that for the depression data, the model satisfies reasonably well the normality assumption for the residuals.
Figure S2 in the Supplementary Information shows histograms and kernel densities of EB predictors of the random effects. The shapes of the histograms seem to suggest approximate normality for the distribution of the random effects. However, Verbeke and Lesaffre [19] and Mcculloch and Neuhaus [15] have found that the shape of the histograms for EB predictors may be misleading and may not reflect the true distribution of the random effects.
We used Eq. (3) to estimate selected p × 100% percentiles of the individual benefits of imipramine as functions of treatment duration for nonendogenous (Fig. 1a) and endogenous (Fig. 1c) patients. For comparison purposes, we also calculated the corresponding sample percentiles of the EBpredicted individual benefits for nonendogenous (Fig. 1b) and endogenous (Fig. 1d) patients. In general, the sample percentiles and the estimated theoretical percentiles seem to convey similar information. For instance, for fixed values of p, both estimation methods show that the benefit percentiles for nonendogenous patients tended to be higher than the corresponding percentiles for endogenous patients, which reflects the fact that imipramine was more beneficial in nonendogenous than endogenous patients [1].
The BQQ plot in Fig. 2, however, displays a better comparison of the estimated theoretical percentiles and sample percentiles at week 4. The points in the plot are distributed closely about the y = x line, without exhibiting any considerable asymmetric deviations. Figure 2 also exhibits a relatively high concentration of patients at the bottom left corner. The EB quantiles coincided with the theoretical quantiles in that about 59% of the patients achieved relatively small individual imipramine benefits ≤0.13 in a probability scale. Thus, the sample quantiles of the patients’ individual benefits matched closely the theoretical quantiles derived from the normality assumption. This suggests the adequacy of this assumption and the goodnessoffit of the 2PM model for the imipramine data, provided the purpose of the model is to measure individual benefits under the therapeutic target HRS ≤7.
Figure 3 compares the BQQ plot in Fig. 2 with eight BQQ plots that were consecutively simulated with the fitted model of HRS scores, which had normal random effects. The plot in the middle of Fig. 3 is the BQQ plot computed with the real data and the other plots are the simulated ones. The parameters in Table 1 of [1] were used to simulate the patients of the simulated BQQ plots and, to ensure comparability with the real data, 37 simulated endogenous and 29 simulated nonendogenous patients were used in each simulated plot. Note that the plot made with the real data cannot be singled out easily as most different from the other plots and does not seem to have a distinguishing feature compared to the other plots. This reinforces our conclusion that the assumption of normality for the random effects of the 2PM model is reasonable [40, 41].
We also applied the diagnostic tests proposed by Alonso et al. [34] to this dataset. The null hypothesis is that the normality assumption for the random effects is reasonable. The two determinant tests and the determinanttrace test yielded the test statistics δ_{d1} = 2.9, δ_{d2} = 1.19 and δ_{d3} = 1.28, with corresponding pvalues of 0.085, 0.276 and 0.258. All three pvalues were larger than the chosen 0.05 significant level. This is consistent with the conclusions from our proposed graphical approach. We include here these tests only for comparison purposes and do not consider them as a final confirmatory tool (see the Discussion Section).
Simulation study
We conducted a simulation study to assess the performance of BQQ plots under violations of the normality assumption for the random effects. Motivated by the application study in Diaz [1], data from the following two models were simulated:
Model 1: (Random intercept and random slope for time)
such that Λ_{ω} = ψ_{0} + τ_{0,ω} + ψ_{1}x_{ω} and β_{Q,ω}(t) = (ψ_{2} + τ_{2,ω})t + ψ_{3}t^{2}, N is the number of patients and n is the number of observations per patient. Here, ψ = (ψ_{0}, ψ_{1}, ψ_{2}, ψ_{3})^{T} are the fixed effects and τ_{ω}= (τ_{0,ω}, τ_{2,ω})^{T} are the random effects with mean 0. Moreover, x_{ω}~Bernoulli(0.6) represents a patient’s timeindependent characteristic (for instance, gender, smoking, etc.) and the ε_{ω,j} ’s are mutually independent with \( {\varepsilon}_{\omega, j}\sim N\left(0,{\sigma}_{\varepsilon}^2=10\right) \).
Model 2: (Random intercept and random slopes for time and time square)
such that Λ_{ω} = ψ_{0} + τ_{0,ω} + ψ_{1}x_{ω} and β_{Q,ω}(t) = (ψ_{2} + τ_{2,ω})t + (ψ_{3} + τ_{3,ω})t^{2}. In this case, τ_{ω}= (τ_{0,ω}, τ_{2,ω}, τ_{3,ω})^{T} are the random effects with mean 0. We assumed an unstructured variancecovariance matrix for the random effects for both models [5, 6] and no missing responses.
Varying values for N were used and n = 4 or 6. For either model, we simulated 2 baseline measurements and 2 or 4 measurements under medical treatment. Thus, for all models, k_{0,ω} = 2, and t_{ω,1} = t_{ω,2} = 0. When n = 4, k_{1,ω} = 2, t_{ω,3} = 1 and t_{ω,4} = 4; and when n = 6, k_{1,ω} = 4, t_{ω,3} = 1, t_{ω,4} = 2, t_{ω,5} = 3 and t_{ω,6} = 4. For all models, \( {Y}_{0,\omega, j}={Y}_{\omega, j}^{\prime } \) for j = 1, 2, and \( {Y}_{Q,\omega, j}={Y}_{\omega, j+2}^{\prime } \) for j = 1, …, k_{1,ω}.
The therapeutic target was to achieve Y ≤ y with y = 7. The MLEs of ψ and \( {\sigma}_{\varepsilon}^2 \) are denoted by \( \hat{\boldsymbol{\psi}}={\left({\hat{\psi}}_0,{\hat{\psi}}_1,{\hat{\psi}}_2,{\hat{\psi}}_3\right)}^T \) and \( {\hat{\sigma}}_{\varepsilon}^2 \); and the EB predictor of τ_{ω} by \( {\hat{\boldsymbol{\tau}}}_{\omega }={\left({\hat{\tau}}_{0,\omega },{\hat{\tau}}_{2,\omega}\right)}^T \) for Model 1 or \( {\hat{\boldsymbol{\tau}}}_{\omega }={\left({\hat{\tau}}_{0,\omega },{\hat{\tau}}_{2,\omega },{\hat{\tau}}_{3,\omega}\right)}^T \) for Model 2. Here, we investigate BQQ plots computed at the last time point, namely t = 4. We used Eq. (1) to predict the individual benefits after replacing σ_{ε}, Λ_{ω} and β_{Q,ω}(t) with their estimates \( {\hat{\sigma}}_{\varepsilon } \), \( {\hat{\Lambda}}_{\omega }={\hat{\psi}}_0+{\hat{\tau}}_{0,\omega }+{\hat{\psi}}_1{x}_{\omega } \), and \( {\hat{\beta}}_{Q,\omega }(t)=\left({\hat{\psi}}_2+{\hat{\tau}}_{2,\omega}\right)t+{\hat{\psi}}_3{t}^2 \) for Model 1 or \( {\hat{\beta}}_{Q,\omega }(t)=\left({\hat{\psi}}_2+{\hat{\tau}}_{2,\omega}\right)t+\left({\hat{\psi}}_3+{\hat{\tau}}_{3,\omega}\right){t}^2 \) for Model 2.
Let Σ_{i,j} be the (i, j)th entry of the variancecovariance matrix of τ_{ω} and \( {\hat{\Sigma}}_{i,j} \) be its maximum likelihood estimator. Thus, Σ_{i,j} is of dimension 2 × 2 for Model 1, and 3 × 3 for Model 2. The μ in Eq. (3) is estimated with \( \hat{\mu}=\left(y\left({\hat{\psi}}_0+{\hat{\psi}}_1{x}_{\omega }+{\hat{\psi}}_2t+{\hat{\psi}}_3{t}^2\right)\right)/{\hat{\sigma}}_{\varepsilon } \) for both models, whereas γ is estimated with \( {\hat{\gamma}}^2=\left({\hat{\Sigma}}_{1,1}+{t}^2{\hat{\Sigma}}_{2,2}+2t{\hat{\Sigma}}_{1,2}\right)/{\hat{\sigma}}_{\varepsilon}^2 \) for Model 1, and \( {\hat{\gamma}}^2=\left({\hat{\Sigma}}_{1,1}+{t}^2{\hat{\Sigma}}_{2,2}+{t}^4{\hat{\Sigma}}_{3,3}+2t{\hat{\Sigma}}_{1,2}+2{t}^2{\hat{\Sigma}}_{1,3}+2{t}^3{\hat{\Sigma}}_{2,3}\right)/{\hat{\sigma}}_{\varepsilon}^2 \) for Model 2.
Table 1 shows the “true” fixed effects employed in simulations. These were chosen so that most simulated patients were severely ill under all examined nonnormal and normal distributions for the random effects; specifically, P(s_{0,ω} > 0.9) ≥ 0.95.
Simulation of random effects
We implemented four simulation scenarios to represent situations in which the normality assumption for the random effects is violated (Table 1). For comparison purposes, in each scenario, τ_{ω} was simulated from both a nonnormal distribution and a reference normal distribution with the same mean and variancecovariance matrix.
Scenario 1: Model 1 with a symmetric mixture of two bivariate normal distributions
Here, we explore the effect on the BQQ plot of the distance between the means of the two components of a mixture of normal distributions for N ∈ {30, 50, 100, 150, 200, 300, 500}. The true τ_{ω} was distributed as
where m_{1} = (0, −1)^{T}, m_{2} = (0, 1)^{T} and \( V=\left[\begin{array}{cc}1& 0.9\\ {}0.9& 1\end{array}\right] \). The distance between the mean vectors is \( w\bullet \sqrt{{\left({\boldsymbol{m}}_1{\boldsymbol{m}}_2\right)}^T\left({\boldsymbol{m}}_1{\boldsymbol{m}}_2\right)}. \) We examined w ∈ {1, 2, 3, 4, 5}. The reference normal distribution with the same mean and variancecovariance matrix was N(m, D^{∗}), where m = (0, 0)^{T} and \( {D}^{\ast }=\frac{1}{2}{\boldsymbol{m}}_1^{\ast }{{\boldsymbol{m}}_1^{\ast}}^T+\frac{1}{2}{\boldsymbol{m}}_2^{\ast }{{\boldsymbol{m}}_2^{\ast}}^T+V \). Here, a greater distance between the means of the two component distributions represents a greater deviation from normality and, therefore, we expect the BQQ plot to show greater departures from the diagonal line (Figs. 4, 8; Table S1).
Scenario 2: Model 1 with an asymmetric mixture of two bivariate normal distributions for the random effects
Here, we explore how the variance of the components of a mixture of normal distributions affects the BQQ plot, for sample sizes N ∈ {20, 60, 100, 160, 200, 300, 500}. The true random effects vector τ_{ω} was distributed as
where m_{1} = (0, −1)^{T}, m_{2} = (0, 3)^{T} and \( V=\left[\begin{array}{cc}{\sigma}_1^2& 0.9\\ {}0.9& {\sigma}_2^2\end{array}\right] \). We examined \( {\sigma}_1^2={\sigma}_2^2\in \left\{1,2,3,4,5\right\} \). In this case, the overall mean and variance are m = (0, 0)^{T} and \( {D}^{\ast }=\frac{3}{4}{\boldsymbol{m}}_1{\boldsymbol{m}}_1^T+\frac{1}{4}{\boldsymbol{m}}_2{\boldsymbol{m}}_2^T+V \). Thus, for comparison purposes, τ_{ω} was also simulated from the reference bivariate N(m, D^{∗}). Here, since the mean vectors are fixed, a greater variance for the components of the mixture implies a “less bimodal” distribution. Therefore, we expect BQQ plots for the nonnormal cases to be more like their corresponding reference normal cases when the variability of the components is larger (Figs. 5, 9; Table S2).
Scenario 3: Model 2 with a trivariate t distribution for the random effects
Here, the true random effects were simulated from a trivariate t distribution with degrees of freedom v ∈ {3, 5, 7, 9, 11, 13}, location parameter m = (0, 0, 0)^{T}, and shape parameter Γ given in Table 1. The purpose was to study how BQQ plots are affected by the heaviness of the tails of the t distribution, using N ∈ {30, 50, 100, 150, 200, 300, 500}. The reference normal distribution with the same mean and variancecovariance matrix was \( N\left(\boldsymbol{m},{D}^{\ast }=\left(\frac{v}{v2}\right)\Gamma \right) \) [42]. In this scenario, smaller degrees of freedom are associated with heavier tails for the distribution of random effects. Thus, we expect BQQ plots for nonnormal cases to resemble more the reference normal plots when v is larger (Figs. 6, 10; Table S3).
Scenario 4: Model 2 with a symmetric mixture of two trivariate normal distributions
This scenario is analogous to Scenario 1, except that we used trivariate normal distributions for the components of the mixture. The goal was also to examine the effect of the distance between the means of the two normal components on BQQ plots. Since a greater distance represents a greater deviation from normality, we expect the BQQ plots to show greater departures from the diagonal line (Figs. 7, 11; Table S4).
Cramervon Mises discrepancy measure
We used Cramervon Mises discrepancy measure (CVM) to quantify the deviation of the BQQ plot from the y = x line under violations of the normality assumption [43,44,45]. Let \( {F}_{N_g}(z)={F}_{N_g}\left(z;t\right) \) be the empirical distribution function of \( {\hat{b}}_{g,t,1},\dots, {\hat{b}}_{g,t,{N}_g} \) and denote \( {U}_{g,t,1}=F\left({\hat{b}}_{g,t,1};{\boldsymbol{x}}_g,t\right),\dots, {U}_{g,t,{N}_g}=F\left({\hat{b}}_{g,t,{N}_g};{\boldsymbol{x}}_g,t\right) \). The CVM discrepancy between \( {F}_{N_g}\left(z;t\right) \) and F(z; x_{g}, t) was computed as [44].
The overall discrepancy was computed as the weighted average
where t is the maximum point in time of the last patients’ visits (t = 4 in the simulations). Larger values of \( \overline{\Omega} \) reflect more severe violations of the normality assumption for the random effects.
We simulated 500 datasets for each combination of values of N, n and randomeffects distribution parameters. For illustration purposes, selected BQQ plots are presented for N = 100 and n = 6 (Figs. 4, 5, 6, 7). These plots correspond to the datasets producing the \( \overline{\Omega} \) closest to \( \overline{\overline{\Omega}} \), where \( \overline{\overline{\Omega}} \) is the average of the 500 values of \( \overline{\Omega} \).
To examine the sensitivity of BQQ plots to detect deviations from normality, each simulated nonnormal case was compared with its corresponding reference normal distribution by using the ratio \( R=\frac{{\overline{\overline{\Omega}}}_{\mathrm{non}\mathrm{normal}}}{{\overline{\overline{\Omega}}}_{\mathrm{normal}}} \) (Figs. 8, 9, 10, 11). On average, we expect the \( \overline{\Omega} \) obtained from a nonnormal case to be larger than that of its reference normal distribution and, therefore, R > 1. This is because Ω_{g,t} measures the discrepancy between the empirical distribution of the sample individual benefits and the theoretical distribution obtained under the normality assumption for the random effects. We expect larger values of R to be associated with greater deviations from normality. The SAS procedures MIXED and IML were used to implement the simulations (SAS Institute Inc. Cary, NC; see additional file SAS CODE.dat).
Results of the simulation study
Scenario 1: symmetric mixtures of two bivariate normal distributions
As expected, larger distances between the two components of the mixture distribution determined more apparent asymmetric departures of the points on the BQQ plot from the y = x line (Fig. 4). By comparison, the BQQ plots for data simulated from the corresponding reference normal distributions tended not to show asymmetric deviations from the diagonal line. Figure 8 shows that the R ratios comparing CVM discrepancies of nonormal versus comparable normal distributions were always >1 and increased with the distance between the components of the mixture. In general, R increased with both the number of patients N and the number of repeated measures n, suggesting that the sample size contributes positively to the sensitivity of BQQ plots. Table S1 in the Supplementary Information shows the average CVM discrepancies \( \overline{\overline{\Omega}} \) for all instances of Scenario 1.
Scenario 2: asymmetric mixtures of two bivariate normal distributions
For the investigated mixtures of normal distributions, quantiles of EB benefits from patient samples tended to be larger than the corresponding theoretical quantiles that assume normality for the random effects (Fig. 5). Moreover, this pattern was more apparent with smaller variances for the components of the mixture. The pattern was not observed in the BQQ plots corresponding to the reference normal distributions. Figure 9 shows that the ratios comparing CVM discrepancies of nonnormal to reference normal distributions decreased with the variance of the mixture components, suggesting that BQQ plots are sensitive to deviations from normality. The ratios also increased with sample size N and the number of repeated measures n, suggesting that larger sample sizes increase the likelihood that BQQ plots capture normality violations. Table S2 in the Supplementary Information shows the average CVM discrepancies for the nonnormal and normal cases.
Scenario 3: trivariate t distribution
As the degrees of freedom increased, the BQQ plots for data simulated with t distributions became more similar to the BQQ plots for data simulated with comparable normal distributions (Fig. 6). The theoretical quantiles of individual benefits under the normality assumption tended to be larger than the quantiles for EB sample benefits when the tails of the t distribution became heavier. Figure 10 shows that the ratios R comparing CVM discrepancies under t distributions versus their reference normal distributions increased as the degrees of freedom decreased, suggesting that BQQ plots can reliably capture the heaviness of the tails of the t distribution. The ratios tended to increase as both N and n increased, implying that the larger the sample size is the more efficient the proposed graphical approach is for detecting tail heaviness. Table S3 in the Supplementary Information shows average CVM discrepancies \( \overline{\overline{\Omega}} \) for the nonnormal and normal cases.
Scenario 4: symmetric mixture of two trivariate normal distributions
Analogous to scenario 1, marked departures in the appearance of BQQ plots from what is expected under comparable normal distributions are observed when the random effects are distributed as a mixture of normal distributions (Fig. 7). Greater distances between the two mean vectors of the mixture components tended to be associated with larger asymmetric deviations from the y = x line. This trend can also be inferred from Fig. 11, which shows that, compared with the reference normal distribution, CVM discrepancies under a mixture of normal distributions increased as the distance between the mixture components increased. Table S4 in Supplementary Information shows average CVM discrepancies for the nonnormal and normal cases.
Discussion
This article proposes a graphical approach to examining the normality assumption of the random effects of 2PM models for severely ill patients. These models are based on RELMs and allow measuring individual benefits of medical or behavioral treatments [1, 2]. It is a common practice to explore the normality assumption for the random effects of RELMs by plotting separate classic normal QQ plots that examine the normality of the EB predictors of the random effects, for each random coefficient in the model [23, 24, 27, 46]. In the context of 2PM models, however, using BQQ plots has two major advantages over normal QQ plots:

(1)
Whereas a goodnessoffit analysis using normal QQ plots requires as many QQ plots as random coefficients are there in the model, only one BQQ plot is needed to examine the adequacy of the 2PM model. Thus, BQQ plots reduce the number of goodnessoffit analyses if the model has two or more random coefficients.

(2)
In contrast with normal QQ plots, BQQ plots examine directly the predictors of the individual benefits achieved by the patient sample. Therefore, BQQ plots contribute to assess whether the 2PM model is appropriate for predicting individual benefits, which is the model’s main purpose. In this regard, note that BQQ plots may not be useful if the goal of the RELM is not to measure individual benefits of medical treatments. In fact, as described above, the shape of a BQQ plot depends on the therapeutic target, which is prespecified in advance by the clinician. Also note that if the 2PM model will be used for measuring individual benefits in new patients, the EB predictors of individual benefits need to be additionally evaluated using the simulation approach described by Diaz [1].
An essential difference between normal QQ plots and BQQ plots is that whereas normal QQ plots represent visually the deviation of an empirical distribution from an estimated normal distribution, BQQ plots represent the deviation of an empirical distribution from an estimate of a nonnormal distribution. This nonnormal distribution is the one given by Eq. (2). This feature of BQQ plots is the reason we chose the CVM discrepancy for evaluating the performance of BQQ plots over the more common ShapiroWilk test statistic [43, 44, 47]. In fact, although the ShapiroWilk statistic has been shown to be more sensitive to deviations from linearity in normal QQ plots than the CVM and other discrepancy measures [48], it measures deviations from linearity by explicitly using the fact that the theoretical quantiles in the x axis of normal QQ plots are those of a normal distribution [41]. Therefore, ShapiroWilk statistic cannot be used to measure departures from linearity in BQQ plots. In contrast, the CVM discrepancy is a direct measure of the discrepancy between the two distributions that are being compared in BQQ plots: the empirical cumulative distribution function of the EB predictors of individual benefits of the patient sample versus an estimate of the nonnormal cumulative distribution function of individual benefits. There are other measures that can potentially be used to assess discrepancies between these two distributions; for instance, the KolmogorovSmirnov distance and the AndersonDarling test statistic [42, 45, 48, 49]. However, by using CVM discrepancies, we were able to show in the current study that BQQ plots are sensitive to deviations from the normality assumption of the random effects (Figs. 4, 5, 6, 7, 8, 9, 10, 11).
In addition to the aforementioned limitations of using classic normal QQ plots that examine the normality of the EB predictors of random effects to assess 2PM models, note that, in normal QQ plots, the EB predictors are used to compute both the sample quantiles represented in the y axis and the mean and variance of the normal distribution used to obtain the theoretical quantiles represented in the x axis. This circularity limits the interpretation of the resultant plot because the EB predictors are estimates themselves. Moreover, research shows that the shape of the empirical distribution of the EB predictors of the random effects does not necessarily reflect the shape of the randomeffects distribution and tends to be unimodal [16, 19], which may make the plot look artificially linear. Thus, QQ plots calculated with only EB predictors of the random effects may be misleading as a tool for examining the normality assumption. In contrast, in our proposed approach, the theoretical quantiles given in Eq. (3) are estimated directly using the MLEs or RMLEs of model parameters without the mediation of the EB predictors of random effects or the EB predictors of individual benefits. In addition, our simulations show that BQQ plots reliably perform as expected under nonnormal and normal distributions. Thus, if the model will be used to make decisions related with personalized medicine, we recommend using BQQ plots as a complementary tool for the exploration of the normality assumption for the random effects.
When the BQQ plot suggests that the assumption of normality for the random effects is not reasonably valid, data modelers can utilize linear mixed models that assume mixtures of normals [19], multivariate t distributions [50], multivariate Laplace distributions [51], or skewnormal distributions [52]. Individual benefits can still be predicted by plugging the estimates of the fixed effects and variance components as well as the EB predictors of random effects into Eq. (2). Of these models, those with mixtures of normals or skewnormal distributions are particularly attractive since they have closedform formulas for the EB predictors of the random effects that may facilitate the measurement of individual benefits in new patients [1]. Note that EB predictors of random effects in these models no longer inherit the robustness properties of the BLUPs since they are distribution dependent. Research investigating the accuracy of prospective and retrospective measures of individual benefits in new patients based on these predictors is needed [1].
QQ plots are widely used in statistical practice to examine distributional assumptions for a variety of models, not necessarily models that assume normality [24, 27, 41, 53,54,55,56]. A common criticism of these plots is that they are less objective than classical goodnessoffit tests. However, psychophysical studies have shown that QQ plots, when visually examined in comparison with plots simulated under the null distribution, are more powerful for detecting deviations from the null distribution than classical tests [40, 41, 56]. This is probably because in QQ plots the entire sample is assessed rather than a single test statistic [41]. We believe that this is a strong reason for using BQQ plots in addition to analyses based on classical goodnessoffit tests. Another reason is that classical tests are frequently uninformative when the sample size is very small or very large [57]. It is well known, for instance, that powerful normality tests conducted with sufficiently large samples reject firmly the hypothesis of normality when the deviation from normality is trivial. This is a limitation because useful realworld applications of statistical models assuming normality need only approximate normality, not perfect normality. On the other hand, if a statistical test rejects the null hypothesis of perfect goodnessoffit, a QQ plot allows assessing why there is a lack of fit and whether the lack of fit is negligible compared to plots simulated under the null hypothesis [41]. QQ plots also help in the selection of appropriate transformations of the response or transformations of covariates for model improvement.
Here, we compared the BQQ plot computed with the real data with plots computed with simulated data that assume normality for the random effects in order to gauge more objectively the deviation of the plot from the y = x line (Fig. 3). To further prevent subjectivity in the interpretation of BQQ plots, formal lineup statistical tests may be conducted [40, 41, 56]. In these tests, the BQQ plot for the real data would be lined up randomly and blindly with several BQQ plots computed with simulated data consistent with the null hypothesis of normality for the random effects, and the data analyst would attempt to single out the plot for the real data from the other plots. Formal inferential procedures for computing pvalues for these types of tests have been proposed [40, 41, 56]. Lineup tests, however, require that the analyst does not see the plot for the real data before conducting the test to prevent biases from preconceived conclusions or psychophysical artifacts. Further research is needed to formally compare the statistical power of lineup tests for BQQ plots with the power of statistical tests of normality for random effects linear models based on regular test statistics [32, 34, 35, 58, 59]. Due to the graphical nature of BQQ plots, such comparison requires the conduction of graphical perception experiments in human subjects that implement methods of visual statistical inference [40, 41, 56].
To illustrate our proposed graphical approach, we used data from 66 subjects of an imipramine clinical trial [6, 39]. As discussed by Diaz [1], this sample size was the result of a careful application of strict inclusion and exclusion criteria that ensured that the subjects were at a stable illness state before entering the study. The inclusion criteria included three numerical constraints on the individual items of the HRS that reflected severity of the depression. Eighteen subjects who did not satisfy these constraints at the end of the placebo period were excluded from the study. In addition, subjects who experienced clinically relevant disease or procedural changes during the baseline period or later were also excluded (for instance, electroconvulsive therapy, suicide attempt or evolution to mania [39]). A total of 36 subjects were excluded. This rigorous approach ensured that baseline disease severity was a reliable scientific construct and that the observed reductions in disease severity during imipramine treatment were not the result of the natural history of the disease.
Our simulations showed that the sensitivity of BQQ plots for detecting nonnormality increased with the sample size. They also showed that moderate sample sizes (50 ≤ N ≤ 100), which are common in clinical trials, were enough to detect deviations from normality in many of the investigated cases, as suggested by Figs. 8, 9, 10, 11. Moreover, with smaller sample sizes (N = 30), BQQ plots exhibited some sensitivity to nonnormality under symmetric mixtures of normal distributions, which may occur if there are omitted categorical variables [19] (Figs. 8 and 11).
A limitation of our approach is that it requires sufficiently large N_{g} values to guarantee that sample quantiles are sufficiently close to theoretical quantiles. Moreover, the approach is not applicable when N_{g} = 1 for some g, which may occur if the model has a relatively high number of covariates. Future research must investigate how to overcome these limitations. A possible solution is to treat X as a stochastic covariate vector and to work with the marginal cumulative distribution function of the individual benefits F_{m}(z) = F_{m}(z; t) = E[F(z; X, t)], where the expected value is taken with respect to the joint probability mass function of X. A reasonable estimate of F_{m}(z) is \( {\hat{F}}_m(z)={N}^{1}{\sum}_{g=1}^G{N}_g\hat{F}\left(z;{\boldsymbol{x}}_g,t\right) \), where \( \hat{F} \) is obtained by replacing fixed effects and variance components in Eq. (2) with their corresponding estimates. An estimate of the marginal pth quantile function is, therefore, \( {\hat{B}}_m(p)={\hat{B}}_m\left(p;t\right)={\hat{F}}_m^{1}(p) \), which can be obtained through numerical inversion. Thus, the BQQ plot can be computed as the points \( \left({\hat{B}}_m\left(\frac{i0.5}{N};t\right),{\hat{b}}_{t,(i)}\right) \), i = 1, …, N, where \( {\hat{b}}_{t,(1)}<{\hat{b}}_{t,(2)}<\dots <{\hat{b}}_{t,(N)} \) are the order statistics of the combined sample of EBpredicted individual benefits. Future research must examine the extent to which this plot is sensitive to deviations from normality.
Another limitation of our approach is that it requires that continuous or ordinal covariates be categorized before implementing Eq. (3). Future research must examine how to incorporate continuous covariates to BQQ plots. The extension of BQQ plots to 2PM models with noncontinuous responses also needs further research. The influence of missing observations on the performance of BQQ plots must also be investigated.
Diaz [1] showed that when the patients are severely ill, a closedform formula for the probability distribution of individual benefits can be obtained. This facilitates both the statistical analyses of individual benefits and the evaluation of the overall performance of the predictors of individual benefits in new patients. Some clinical studies, however, may include patients who are not severely ill. Further research must investigate how to extend the definition of BQQ plots to these other studies. Since there is no closedform formula in this case, additional numerical or Monte Carlo integration may be needed.
Conclusions
This paper proposes a graphical approach to evaluate the goodnessoffit of random effects models for continuous responses when the purpose of the model is to estimate individual benefits of medical treatments. In our approach, empirical quantiles of individual benefits estimated with an empirical Bayes approach are plotted against the quantiles of the distribution of individual benefits calculated under a normality assumption for the random effects. The rationale underlying our approach is that EB predictors of the random effects are robust to violations of the normality assumption [15,16,17]. In fact, EB predictors are also estimates of the BLUPs, whose optimality properties do not require the normality assumption [13].
If the normality assumption is valid, we expect empirical quantiles to be close to the theoretical quantiles. Thus, we can infer the goodnessoffit of the 2PM model if the BQQ plot does not show obvious asymmetric deviations from the y = x diagonal line. We evaluated the performance of this approach using the CVM discrepancy which measures the discrepancy between an empirical and a theoretical probability distribution. CVM discrepancies confirmed that our graphical approach captures accurately deviations from the normality assumption. Importantly, we found that the ratios R of average CVM discrepancies (\( \overline{\overline{\Omega}} \)), which compared nonnormal distributions with closely comparable normal distributions, were not smaller than 1 in all simulations. This suggests that BQQ plots are powerful tools for detecting deviations from normality of the distribution of the random effects in 2PM models and for a visual confirmation of the normality assumption as well, when the goal is to evaluate the ability of the model to gauge individual treatment benefits.
Availability of data and materials
The depression trial data are available from the corresponding author upon request. The computer code for this paper is available as an additional file and simulated datasets are available from the corresponding author on request.
Abbreviations
 2PM:

2dimensional personalized medicine
 BLUP:

Best linear unbiased predictor
 BQQ:

Benefit quantilequantile
 CVM:

Cramervon Mises
 EB:

Empirical Bayes
 HRS:

Hamilton Rating Scale
 MLE:

Maximum likelihood estimator
 REML:

Restricted maximum likelihood estimator
 RELM:

Randomeffects linear model
 QQ:

Quantilequantile
References
 1.
Diaz FJ. Estimating individual benefits of medical or behavioral treatments in severely ill patients. Stat Methods Med Res. 2019;28:911–27.
 2.
Diaz FJ. Measuring the individual benefit of a medical or behavioral treatment using generalized linear mixedeffects models. Stat Med. 2016;35:4077–92.
 3.
Laird NM, Ware JH. Randomeffects models for longitudinal data. Biometrics. 1982;38:963–74.
 4.
Fitzmaurice GM, Davidian M, Verbeke G, Molenberghs G. Longitudinal data analysis. Boca Raton: CRC Press; 2009.
 5.
Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. 2nd ed. Hoboken: Wiley; 2011.
 6.
Hedeker D, Gibbons RD. Longitudinal data analysis. Hoboken: WileyInterscience; 2006.
 7.
Diaz FJ, Rivera TE, Josiassen RC, de Leon J. Individualizing drug dosage by using a random intercept linear model. Stat Med. 2007;26:2052–73.
 8.
Diaz FJ, Cogollo MR, Spina E, Santoro V, Rendon DM, de Leon J. Drug dosage individualization based on a randomeffects linear model. J Biopharm Stat. 2012;22:463–84.
 9.
Diaz FJ, Yeh HW, de Leon J. Role of statistical randomeffects linear models in personalized medicine. Curr Pharmacogenomics Person Med. 2012;10:22–32.
 10.
Diaz FJ, de Leon J. The mathematics of drug dose individualization should be built with randomeffects linear models. Ther Drug Monit. 2013;35:276–7.
 11.
Diaz FJ. Construction of the design matrix for generalized linear mixedeffects models in the context of clinical trials of treatment sequences. Rev Colomb Estadística. 2018;41:191–233.
 12.
Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 2000.
 13.
Frees EW. Longitudinal and panel data : analysis and applications in the social sciences. Cambridge; New York: Cambridge University Press; 2004.
 14.
Nasserinejad K, De Kort W, Baart M, Komárek A, Van Rosmalen J, Lesaffre E. Predicting hemoglobin levels in whole blood donors using transition models and mixed effects models. BMC Med Res Methodol. 2013;13:62.
 15.
McCulloch CE, Neuhaus JM. Misspecifying the shape of a random effects distribution: why getting it wrong may not matter. Stat Sci. 2011;26:388–402.
 16.
Mcculloch CE, Neuhaus JM. Prediction of random effects in linear and generalized linear models under model misspecification. Biometrics. 2011;67:270–9.
 17.
Wang Z. Prediction of random effects in mixed effects models under violations of the normality assumption for the random effects and a graphical approach to detect violations. Doctoral dissertation. Kansas: University of Kansas; 2019.
 18.
Verbeke G, Lesaffre E. The effect of misspecifying the randomeffects distribution in linear mixed models for longitudinal data. Comput Stat Data Anal. 1997;23:541–56.
 19.
Verbeke G, Lesaffre E. A linear mixedeffects model with heterogeneity in the randomeffects population. J Am Stat Assoc. 1996;91:217–21.
 20.
Agresti A, Caffo B, OhmanStrickland P. Examples in which misspecification of a random effects distribution reduces efficiency, and possible remedies. Comput Stat Data Anal. 2004;47:639–53.
 21.
Litière S, Alonso A, Molenberghs G. Type I and type II error unde randomeffects misspecification in generalized linear mixed models. Biometrics. 2007;63:1038–44.
 22.
Litière S, Alonso A, Molenberghs G. The impact of a misspecified randomeffects distribution on the estimation and the performance of inferential procedures in generalized linear mixed models. Stat Med. 2008;27:3125–44.
 23.
Nobre JS, Da Motta Singer J. Residual analysis for linear mixed models. Biom J. 2007;49:863–75.
 24.
Bates DM, Pinheiro JC. Mixedeffects models in S and SPLUS. 1st ed. New York: SpringerVerlag; 2000.
 25.
Gregoire TG, Schabenberger O, Barrett JP. Linear modelling of irregularly spaced, unbalanced, longitudinal data from permanentplot measurements. Can J For Res. 1995;25:137–56.
 26.
Waternaux C, Laird NM, Ware JH. Methods for analysis of longitudinal data: bloodlead concentrations and cognitive development. J Am Stat Assoc. 1989;84:33–41.
 27.
Wilk MB, Gnanadesikan R. Probability plotting methods for the analysis of data. Biometrika. 1968;55:1–17.
 28.
Verbeke G, Molenberghs G. The gradient function as an exploratory goodnessoffit assessment of the randomeffects distribution in mixed models. Biostatistics. 2013;14:477–90.
 29.
Pan ZY, Lin DY. Goodnessoffit methods for generalized linear mixed models. Biometrics. 2005;61:1000–9.
 30.
Grady JJ, Helms RW. Model selection techniques for the covariance matrix for incomplete longitudinal data. Stat Med. 1995;14:1397–416.
 31.
Diaz FJ, Santoro V, Spina E, Cogollo M, Rivera TE, Botts S, et al. Estimating the size of the effects of comedications on plasma clozapine concentrations using a model that controls for clozapine doses and confounding variables. Pharmacopsychiatry. 2008;41:81–91.
 32.
Efendi A, Drikvandi R, Verbeke G, Molenberghs G. A goodnessoffit test for the randomeffects distribution in mixed models. Stat Methods Med Res. 2017;26:970–83.
 33.
Drikvandi R, Verbeke G, Molenberghs G. Diagnosing misspecification of the randomeffects distribution in mixed models. Biometrics. 2017;73:63–71.
 34.
Alonso A, Litière S, Molenberghs G. A family of tests to detect misspecifications in the randomeffects structure of generalized linear mixed models. Comput Stat Data Anal. 2008;52:4474–86.
 35.
Abad AA, Litière S, Molenberghs G. Testing for misspecification in generalized linear mixed models. Biostatistics. 2010;11:771–86.
 36.
McCulloch C, Searle S, Neuhaus J. Generalized, linear, and mixed models. 2nd ed. Hoboken: Wiley Series in Probability and Statistics; 2008.
 37.
Ten Have TR, Localio AR. Empirical Bayes estimation of random effects parameters in mixed effects logistic regression models. Biometrics. 1999;55:1022–9.
 38.
Robinson GK. That BLUP is a good thing: the estimation of random effects. Stat Sci. 1991;6:15–32.
 39.
Reisby N, Gram LF, Bech P, Nagy A, Petersen GO, Ortmann J, et al. Imipramine: clinical effects and pharmacokinetic variability. Psychopharmacology. 1977;54:263–72.
 40.
Buja A, Cook D, Hofmann H, Lawrence M, Lee EK, Swayne DF, et al. Statistical inference for exploratory data analysis and model diagnostics. Philos Trans R Soc A Math Phys Eng Sci. 2009;367:4361–83.
 41.
Loy A, Follett L, Hofmann H. Variations of Q–Q plots: the power of our eyes! Am Stat. 2016;70:202–14.
 42.
Anderson TW. An introduction to multivariate statistical analysis. 3rd ed. New York: WileyInterscience; 2003.
 43.
Anderson TW. On the distribution of the twosample Cramervon Mises criterion. Ann Math Stat. 1962;33:1148–59.
 44.
CSöRgő S, Faraway JJ. The exact and asymptotic distributions of CramérVon Mises statistics. J R Stat Soc Ser B. 1996;58:221–34.
 45.
Darling DA. The KolmogorovSmirnov, Cramervon Mises Tests. Ann Math Stat. 1957;28:823–38.
 46.
Lange N, Ryan L. Assessing normality in random effects models. Ann Stat. 1989;17:624–42.
 47.
Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965;52:591–611.
 48.
Razali NM, Wah YB. Power comparisons of ShapiroWilk , KolmogorovSmirnov, Lilliefors and AndersonDarling tests. J Stat Model Anal. 2011;2:21–33.
 49.
Anderson TW, Darling DA. Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann Math Stat. 1952;23:193–212.
 50.
Pinheiro JC, Liu C, Nianwu Y. Efficient algorithms for robust estimation in linear mixedeffects models using the multivariatetdistribution. J Comput Graph Stat. 2001;10:249–76.
 51.
Yavuz FG, Arslan O. Linear mixed model with Laplace distribution (LLMM). Stat Pap. 2018;59:271–89.
 52.
Lin TI, Lee JC. Estimation and prediction in linear mixed models with skewnormal random effects for longitudinal data. Stat Med. 2008;27:1490–507.
 53.
García Ben M, Yohai VJ. Quantilequantile plot for deviance residuals in the generalized linear model. J Comput Graph Stat. 2004;13:36–47.
 54.
AldorNoiman S, Brown LD, Buja A, Rolke W, Stine RA. The power to see: a new graphical test of normality. Am Stat. 2013;67:249–60.
 55.
Stine RA. Explaining normal quantilequantile plots through animation: the waterfilling analogy. Am Stat. 2017;71:145–7.
 56.
Loy A, Hofmann H, Cook D. Model choice and diagnostics for linear mixedeffects models using statistics on street corners. J Comput Graph Stat. 2017;26:478–92.
 57.
Johnson RA, Wichern DW. Applied multivariate statistical analysis. 6th ed. New Jersey: Pearson; 2008.
 58.
Waagepetersen R. A simulationbased goodnessoffit test for random effects in generalized linear mixed models. Scand J Stat. 2006;33:721–31.
 59.
Tchetgen EJ, Coull BA. A diagnostic test for the mixing distribution in a generalised linear mixed model. Biometrika. 2006;93:1003–10.
Acknowledgements
The authors thank Drs. Jo Wick, Jianghua He, John Keighley and Babalola Faseru for useful comments and suggestions. The authors also thank the two reviewers for comments and suggestions that contributed to substantial improvements to the manuscript, and Mr. Douglas Eikermann for editorial assistance.
Funding
The depression trial was supported by the P. Carl Petersens Fond (Grants Nos. B.868 and B.975) and the Danish Medical Research Council (grants Nos 512–2537, 512–3653, 512–4145, 512–5045, and 512–5719). None of the authors was involved in the planning or conduction of this trial and have never received any payment from the trial’s associated grant contracts or its investigators. Thus, the above funding institutions did not have any role in the development and writing of the ideas presented in this paper. FJD was supported by an Institutional Clinical and Translational Science Award from the National Institutes of Health (NIH), NIH/NCATS Grant Number UL1TR002366 (KL2/TL1), awarded to the University of Kansas Medical Center. The role of NIH was to partially support FJD’s salary. ZW was supported by the Biostatistics and Informatics Shared Resource of the University of Kansas Cancer Center, funded by the Cancer Center Support Grant P30 CA168524 of the National Cancer Institute; the role of this funder was to partially support ZW salary. This manuscript may not reflect the opinions or views of the NIH.
Author information
Affiliations
Contributions
FJD proposed the overall concept of the graphical approach. ZW and FJD developed and formalized the approach and designed the simulation studies. ZW wrote the computer code under FJD direction and supervision. Both authors conducted the data analysis, interpreted the simulations and wrote and approved the manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
The authors of the depression study reported that the study was approved by ethical committees of the Lillhagen Hospital, Gothenburg, Sweden; the Municipal Hospital, Copenhagen, Denmark; Rigshospitalet, Copenhagen, Denmark; and State Mental Hospital, Glostrup, Denmark. They also report that patient written consents were obtained.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1.
Supplementary Information. Additional figures and tables.
Additional file 2.
SAS/IML code. Computer code implementing the graphical approach and calculating Cramervon Mises discrepancies.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wang, Z., Diaz, F.J. A graphical approach to assess the goodnessoffit of randomeffects linear models when the goal is to measure individual benefits of medical treatments in severely ill patients. BMC Med Res Methodol 20, 193 (2020). https://doi.org/10.1186/s12874020010543
Received:
Accepted:
Published:
Keywords
 Chronic diseases
 Cramervon Mises discrepancy
 Disease severity
 Empirical Bayes
 Goodnessoffit
 Individual treatment benefits
 Normality assumption
 Personalized medicine models