Skip to main content
  • Research article
  • Open access
  • Published:

A semi-parametric mixed models for longitudinally measured fasting blood sugar level of adult diabetic patients

Abstract

Background

At the diabetic clinic of Jimma University Specialized Hospital, health professionals provide regular follow-up to help people with diabetes live long and relatively healthy lives. Based on patient condition, they also provide interventions in the form of counselling to promote a healthy diet and physical activity and prescribing medicines. The main purpose of this study is to estimate the rate of change of fasting blood sugar (FBS) profile experienced by patients over time. The change may help to assess the effectiveness of interventions taken by the clinic to regulate FBS level, where rates of change close to zero over time may indicate the interventions are good regulating the level.

Methods

In the analysis of longitudinal data, the mean profile is often estimated by parametric linear mixed effects model. However, the individual and mean profile plots of FBS level for diabetic patients are nonlinear and imposing parametric models may be too restrictive and yield unsatisfactory results. We propose a semi-parametric mixed model, in particular using spline smoothing to efficiently analyze a longitudinal measured fasting blood sugar level of adult diabetic patients accounting for correlation between observations through random effects.

Results

The semi-parametric mixed models had better fit than the linear mixed models for various variance structures of subject-specific random effects. The study revealed that the rate of change in FBS level in diabetic patients, due to the clinic interventions, does not continue as a steady pace but changes with time and weight of patients.

Conclusions

The proposed method can help a physician in clinical monitoring of diabetic patients and to assess the effect of intervention packages, such as healthy diet, physical activity and prescribed medicines, because individualized curve may be obtained to follow patient-specific FBS level trends.

Peer Review reports

Background

Diabetes mellitus is a metabolic disorder of multiple etiology characterized by chronic hyperglycaemia with disturbances of carbohydrate, fat and protein metabolism resulting from defects in insulin secretion, insulin resistance, or both [1]. The long-term effects of untreated diabetes mellitus might results in health complications, such as visual disability and nerve disease [25], among others. A person is considered to be diabetic if he or she has fasting blood sugar (FBS) level value of greater than or equal to 7.0 mmol/L (126 mg/dL) or 2-h blood sugar level of greater than or equal to 11.1 mmol/L (200 mg/dL) or glycated hemoglobin (HbA1) level of 6.5% or higher [6].

There are three main types of diabetes, namely type 1 diabetes, type 2 diabetes and gestational diabetes. The type 1 diabetes is caused by an auto-immune reaction, in which the patient body defense system attacks the insulin producing beta cells in the pancreas and hence the body can no longer produce the insulin it needs. Whereas in type 2 diabetes, the body is able to produce insulin, however it becomes resistant so that the insulin is ineffective. The type 2 diabetes is characterized by high levels of blood sugar or glucose resulting from defects in insulin production, insulin action, or both. The gestational diabetes is a form of diabetes that appears during pregnancy. It can lead to serious health risks for both the mother and child [7]. The risk factors that are associated with type 1 diabetes include family history of diabetes (diabetes history in one parent or both), infections and other environmental influences such as exposure to a viral illness, the presence of damaging immune system cells, i.e. autoantibodies and dietary factors low vitamin D consumption [8]. Whereas, for type 2 diabetes the risk factors are excess body weight, physical inactivity, poor nutrition, family history of diabetes, past history of gestational diabetes and older age [9]. The risk factors for increase or decrease in fasting blood sugar level of a patient include overweight, family history of diabetes, age, type of diabetes, blood pressure and gender [7]. The focus of this study however is on type 1 and type 2 diabetes.

In year 2015, there were an estimated 415 million adults aged 20–79 years living with diabetes worldwide [10], including 193 million who are undiagnosed. There were approximately 5 million people estimated to have died from diabetes worldwide in the same year, and a majority of these were the result of cardiovascular complications. In Africa Region, the number of adults living with diabetes estimated at 14.2 million whereas in Ethiopia the number is estimated 1 to 10 million in year 2015. The Region has the highest proportion of undiagnosed diabetes, 9.5 million (about 66.7%) of people with diabetes are unaware they have the disease and in Ethiopia there are 500 thousand to 5 million such cases [11, 12].

At the diabetic clinic of Jimma University Specialized Hospital (JUSH), health professionals provide regular follow-up to help people with diabetes live long and relatively healthy lives. Depending on patients conditions, e.g. FBS level, they also provide interventions in the form of counselling to promote a healthy diet and physical activity and prescribing medicines.

The main objective of the current study is to assess the factors that affect the FBS level of adult diabetic patients. In addition to assessing the factors that affect the FBS level over time, we are also interested to estimate the rate of change of FBS profile experienced by patients over time. The change may help to assess the effectiveness of interventions taken by the clinic to regulate FBS level, where rates of change close to zero over time may indicate the interventions are good regulating the level. These changes are determined using first derivatives of penalized regression splines [13, 14].

The FBS level data of diabetic patients in this study are collected repeatedly over time hence the data have longitudinal time series profiles and the data also have continuous nature. For statistical inferences, therefore, it is necessary to capture properly the form of the evolution of profiles over time. In the analysis of longitudinal data, the mean profile is often estimated by parametric linear mixed effects model, for instance recently Mehari [15] analyzed the FBS level profiles of diabetic patients using parametric linear mixed effects model. However, the individual and mean profile plots of FBS level for diabetic patients (see Fig. 1) are nonlinear and imposing parametric models may be too restrictive and yield unsatisfactory results. In the present paper, we propose a semi-parametric mixed model in particular using spline smoothing [16, 17] to efficiently analyze a longitudinal measured fasting blood sugar level of adult diabetic patients accounting for correlation between observations through random effects. The model assumes that the mean of FBS level is an arbitrary smooth function of time and parametric functions of other covariates. The link between mixed model and smoothing provides a flexible framework for estimating the patient profiles in a data driven way [13].

Fig. 1
figure 1

(a) individual profile and (b) mean profile plots for FBS level of diabetes patients in JUSH, September 2011 - June 2014

The rest of the paper is organized as follows. The data, some basic review of variance-covariance structure of the parametric linear mixed model, semi-parametric mixed models and inferences related them are introduced in “Methodology” section. The results from applying these methods on the the study data are discussed in “Results” section. Finally discussion, and conclusions and pointers for future study are given in “Discussion” and “Conclusion” sections respectively.

Methodology

Study data

The fasting blood sugar (FBS) level data used in this paper arises from a retrospective study conducted in Jimma University Specialized Hospital (JUSH) diabetic clinic. The hospital is located in Jimma town 352 km to the Southwest of Addis Ababa, the capital of Ethiopia. It is a teaching hospital and gives service to the southwestern part of Oromia region, some part of southern nations and nationalities and Gamella regions of Ethiopia. All diabetic patients aged 18 years or older, who were coming to JUSH diabetic clinic for their regular follow up during periods September 2011 and June 2014 were eligible for this study. During their follow up, patients FBS level along with other characteristics such as weight are measured and recorded in the individual follow up chart. The data in the chart include time (measured in months, where baseline or initial date was given a value 0), patient gender, age, type of diabetes (Type 1 diabetes or Type 2 diabetes) and family diabetes history. The duration between initial and the last recorded visits ranged from one to 36 months. Patients with at least two observations were included in the analyses leading to a total of 534 patients and 4390 observations. Permission of the study was obtained from Postgraduate research office of Jimma University, College of Natural Sciences and JUSH.

Variance-covariance structures and inference

Variance-covariance structures

The FBS level data of this study fall within the framework of continuous longitudinal data and hence can be modeled by use of a parametric linear mixed model. Let Yij denote the FBS level of the ith patient observed at time tij, i=1,…,n and j=1,…mi. The parametric linear mixed model can be expressed as

$$ Y_{ij} = \sum\limits_{k=0}^{p} \beta_{k}\,t_{ij}^{k} + \sum\limits_{l=1}^{L} \theta_{l}\,x_{ijl} + \sum\limits_{u=0}^{q} b_{u_{i}}\,t_{ij}^{u} + \varepsilon_{ij}. $$
(1)

That is, the population level mean response is modeled as a polynomial function of time, tij, a linear function of covariates xijl, l=1,…,p where some of them may be time-varying covariates or interaction effects each has corresponding regression parameter coefficient θl, a function of subject-specific random coefficient terms and measurement error εij. The coefficients βk, k=1,…,p and θl, l=1,…,L are fixed effect parameters and \(b_{u_{i}}\), u=0,…,q are subject-specific random coefficients. It is assumed that \(b_{u_{i}} \sim \mathcal {N}\left (0, \sigma _{b_{u}}^{2}\right)\), \(\varepsilon _{ij} \sim \mathcal {N}\left (0, \sigma _{e}^{2}\right)\), \(cov\left (b_{u_{i}}, b_{u_{i'}}\right) = \sigma _{b_{u} b_{u}'}\) and \(cov\left (b_{u_{i}}, \varepsilon _{ij}\right) = 0\). We have examined models for p=2 which represents quadratic polynomial and \(b_{u_{i}}\) with u=0,1,2 represent a subject-specific random intercept, slope and quadratic coefficients, respectively for selection of a variance-covariance structure (see Table 1). The variance profile plot of FBS level shows (for the sake of brevity this plot is not reported) the variance changes overtime, therefore to allow for more flexibility to estimate between subject variability we have considered the above three variance-covariance structures.

Table 1 Linear mixed models for selection of variance-covariance structure for FBS level, JUSH, September 2011 - June 2014

In Table 1, for instance the subject-specific random intercept \(b_{0_{i}}\) in the quadratic random effects model (M3) is considered to capture correlation of the FBS level measurements over time within the patient and it is assumed that subject-specific random slopes for linear as well as for quadratic time effects to capture different evolution of FBS level over time. Note that these subject-specific random structures are different for each patient.

Tests for zero variance components

Adequate variance-covariance structure is essential to obtain valid model based inferences for the fixed effects or for parameters in the mean structure of the model [18]. Over-parametrization of the variance-covariance structure leads to inefficient estimation and potentially poor assessment of standard errors for the estimation of the mean structure, i.e. fixed effects, whereas a too restrictive specification invalidates inferences about the mean response profile when the assumed structure does not hold.

The likelihood ratio test for testing, for example \(H_{0}: \sigma ^{2}_{b_{0}} = 0\) versus \(H_{1}: \sigma ^{2}_{b_{0}} > 0\) for model M1, has an asymptotic \(0.5\,\chi ^{2}_{0} + 0.5\,\chi ^{2}_{1}\) mixture distribution under H0 [19], if the vector of FBS level can be divided into a large number of independent and identically distributed sub-vectors both under H0 and H1. However, this assumption usually does not hold, for example in linear mixed models or for unbalanced data [2022]. Note that the FBS level data are unbalanced in the sense that all patients do not have equal number of measurements, hence the independent and identically distributed assumption can be violated in the linear mixed models used in this paper. Therefore, we used the exact finite sample null distribution of the restricted likelihood ratio test (RLRT) statistic derived by Crainiceanu and Ruppert [22] to test a zero random effect variance in M1. However, since models M2 and M3 contain more than one random effect, the tests for a zero random effect variance in these models were done using the exact finite sample null distribution of the RLRT statistic derived by Greven et al. [21].

Semi-parametric mixed effects model

Given the mean profile plots over time in Fig. 1b, imposing parametric functions to describe the mean FBS level evolution may not be easy and also too restrictive [17]. As an alternative, we can model the mean profiles over time with a semi-parametric smooth function, f(tij). Using the pth degree truncated power basis, f(tij) can be written as

$$ f\left(t_{ij}\right) = \beta_{0} + \beta_{1}\,t_{ij} + \beta_{2}\,t_{ij}^{2} + \ldots + \beta_{p}\,t_{ij}^{p} + \sum\limits_{l=1}^{K} b_{l}\,\left(t_{ij} - \kappa_{l}\right)_{+}^{p}, $$
(2)

here z+=max{0,z}. The function f(tij) is a combination of fixed effects parameters β0,β1,…,βp and pth degree splines evaluated at time tij with knots at distinct locations κ1,κ2,…,κK in the range of tij and corresponding coefficients b1,b2,…,bK. The function f(tij) can be estimated among others, with penalized splines. The coefficients of spline basis functions bl are assumed to follow a Gaussian distribution such that \(b_{l} \sim \mathcal {N}\left (0, \sigma ^{2}_{b}\right)\), where \(\sigma ^{2}_{b}\) is a variance component controlling the smoothness of f(tij). Then, incorporating f(tij) in model (1), the general semi-parametric mixed effects model can be expressed as

$$ Y_{ij} = f\left(t_{ij}\right) + \sum\limits_{l=1}^{L} \theta_{l}\,x_{ijl} + \sum\limits_{u=0}^{q} b_{u_{i}}\,t_{ij}^{u} + \varepsilon_{ij}. $$
(3)

Estimation of parameters

Let \(\mathbf {y}_{i} = \left (y_{i1}, y_{i2}, \ldots, y_{{im}_{i}}\right)'\) be the mi×1 vector of responses for the ith patient, i=1,…,n. Under the linear mixed model formulation, model (3) with subject-specific quadratic random effects can be expressed succinctly in matrix form as

$$ \mathbf{ y}_{i} = \mathbf{X}_{i}\,\mathbf{\beta} + \mathbf{Z}_{i(f)}\,\mathbf{v} + \mathbf {Z}_{i(u)}\,\mathbf {u}_{i} + \mathbf{e}_{i} $$
(4)

where β=(β0,β1,…,βp,θ1,…,θL) is a (p+L+1)×1 vector of fixed effects which is common to the n individuals, Xi is an mi×(p+L+1) design matrix associating β to yi, v=(b1,b2,…,bK) is a K-dimensional vector of random coefficients in the summand in Eq. (2), Zi(f) is the mi×K matrix for the pth-degree spline basis functions, \(\textbf {u}_{i} = \left (b_{0_{i}}, b_{1_{i}}, b_{2_{i}}\right)'\) is subject-specific vector of random effects, Zi(u) is an mi×3 design matrix which relates ui to the response yi and \(\textbf {e}_{i} = \left (e_{1i}, e_{2i}, \ldots, e_{{im}_{i}}\right)'\) is an mi-dimensional vector of within-individual errors. Furthermore, it is assumed that \(\textbf {v} \sim \mathcal {N}\left (\textbf {0}, \sigma _{b}^{2}\,\textbf {I}_{K}\right)\), \(\textbf {u}_{i} \sim \mathcal {N}(\textbf {0}, \textbf {G})\), \(\textbf {e}_{i} \sim \mathcal {N}\left (\textbf {0}, \textbf {R}_{i}\right)\), v, ui and ei are assumed to be pairwise independent with and between subjects for i=1,2,…,n. Note that G and Ri are 3×3 and mi×mi variance-covariance matrices, respectively.

The overall model for n individuals has the form

$$\mathbf {y} = \mathbf{ X}\,\mathbf{\beta} + \mathbf {Z}\,\mathbf {b} + \mathbf{ e} $$

where

$$\begin{array}{*{20}l} &\textbf{y} = \left (\begin{array}{c} \textbf{y}_{1} \\ \textbf{y}_{2} \\ \vdots \\ \textbf{y}_{n} \end{array} \right) ~~ \textbf{X} = \left (\begin{array}{c} \textbf{X}_{1} \\ \textbf{X}_{2} \\ \vdots \\ \textbf{X}_{n} \end{array} \right),\\ ~~ &\textbf{X}_{i} = \left (\begin{array}{cccccccc} 1 & t_{i1} & t_{i1}^{2} & \ldots & t_{i1}^{p} & x_{i11} & \ldots & x_{i1L} \\ 1 & t_{i2} & t_{i2}^{2} & \ldots & t_{i2}^{p} & x_{i21} & \ldots & x_{i2L} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 1 & t_{{im}_{i}} & t_{{im}_{i}}^{2} & \ldots & t_{{im}_{i}}^{p} & x_{{im}_{i}1} & \ldots & x_{{im}_{i}L} \end{array} \right), \end{array} $$
$$\begin{array}{*{20}l} &\textbf{Z} = \left (\begin{array}{ccccc} \textbf{Z}_{1(f)} & \textbf{Z}_{1(u)} & \textbf{0} & \ldots & \textbf{0} \\ \textbf{Z}_{2(f)} & \textbf{0} & \textbf{Z}_{1(u)} & \ldots & \textbf{0} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \textbf{Z}_{n(f)} & \textbf{0} & \textbf{0} & \ldots & \textbf{Z}_{n(u)} \end{array} \right),\\ ~~ &\textbf{Z}_{i(u)} = \left (\begin{array}{cccc} 1 & t_{i1} & t_{i1}^{2} \\ 1 & t_{i2} & t_{i2}^{2} \\ \vdots & \vdots & \vdots \\ 1 & t_{{im}_{i}} & t_{{im}_{i}}^{2} \end{array} \right), \end{array} $$
$$\begin{array}{*{20}l} &\textbf{Z}_{i(f)} = \left (\begin{array}{cccc} (t_{i1} - \kappa_{1})_{+}^{p} & (t_{i1} - \kappa_{2})_{+}^{p} & \ldots & (t_{i1} - \kappa_{K})_{+}^{p} \\ (t_{i2} - \kappa_{1})_{+}^{p} & (t_{i2} - \kappa_{2})_{+}^{p} & \ldots & (t_{i2} - \kappa_{K})_{+}^{p} \\ \vdots & \vdots & \ddots & \vdots \\ (t_{{im}_{i}} - \kappa_{1})_{+}^{p} & (t_{{im}_{i}} - \kappa_{2})_{+}^{p} & \ldots & (t_{{im}_{i}} - \kappa_{K})_{+}^{p} \end{array} \right),\\ ~~ &\textbf{e} = \left (\begin{array}{c} \textbf{e}_{1} \\ \textbf{e}_{2} \\ \vdots \\ \textbf{e}_{n} \end{array} \right) \end{array} $$

and \(\phantom {\dot {i}\!}\textbf {b} = (b_{1}, b_{2}, \ldots, b_{k}, b_{0_{1}}, b_{1_{1}}, b_{2_{1}}, \ldots, b_{0_{n}}, b_{1_{n}}, b_{2_{n}})'\). Estimation of the coefficients of penalized and unpenalized terms in model (4) was done using a penalized iteratively reweighted least squares (P-IRLS) based on 20 equidistant knots in the range of FBS level and a smoothing parameter selection was done by REML [23].

The correspondence between the penalized spline smoother and the optimal predictor in a mixed model framework enables us to take advantage of the existing methodology for mixed model analysis and the use of mixed model software, such as the function gamm in mgcv R package, for fitting the penalized spline model and the MIXED and GLIMMIX procedures in SAS [24]. This implementation of penalized smoothing in the linear mixed model framework also provides an automated approach to obtain a smoothing parameter and flexibility to extend the models [17].

In this paper, parameters in the fitted models are estimated by restricted maximum likelihood (REML) method because the statistical hypotheses that were considered have the same mean structures between models under the null and alternative hypotheses. Furthermore, maximum likelihood estimators of variance components are biased downward as they do not take into account the degrees of freedom lost in the estimation of fixed effects (e.g. see Ruppert et al. [16]).

Model selection and inference

The model building process of this work includes selection of suitable variance-covariance structure for random effects, testing whether the inclusion of spline effects in the parametric model improves model fit or not and also selection of covariates. The linear mixed model framework provides a unified approach to do all these [25]. In the parametric cases, the best fitting model can be selected by employing a commonly used selection criteria, Akaike’s Information Criterion (AIC) and Bayesian Information Criterion (BIC) or by a likelihood ratio test. However, since the semi-parametric mixed models that we considered here are differ in both the fixed effects and the nonparametric part, model selection is done via adjusted Akaike’s information criterion, abbreviated AIC adj, using the effective number of parameters in the model [16, 26]. Let C= [X Zf] be the design matrix with appropriate fixed effects components and the corresponding smoothing matrix, \(\textbf {B} = \left (\begin {array}{cc} \textbf {0} & \textbf {0} \\ \textbf {0} & \textbf {G}^{-1} \end {array} \right)\)where G is the variance-covariance matrix of random effects used in the model and R=diag{R1,R2,…,Rn}, i.e. R is the block diagonal variance-covariance matrix of error terms with blocks Ri on the main diagonal and zeros elsewhere. Then the effective number of parameters and AIC adj may be computed as

$$E_{p} = trace\left\{\left(\textbf{C}'\,\textbf{R}^{-1}\,\textbf{C}\right)^{-1}\textbf{C}'\,\textbf{R}^{-1}\,\textbf{C} \right\} $$

and AIC adj=−2 log(Lik)+2 Ep, respectively. Unlike the marginal AIC which penalizes only for the number of parameters in fixed effects vector and variance components, the penalty of AIC adj takes into account the additional parameters introduced into a model via f(tij) or smoothing by including the design matrix Zf in C [17]. Like the marginal AIC, the smaller the AIC adj value, the better the model.

Testing whether the inclusion of spline effects in the parametric model improves model fit or not is equivalent to testing \(H_{0}: \sigma ^{2}_{b} = 0\) versus \(H_{1}: \sigma ^{2}_{b} > 0\). In this paper, due to the second objective of the study, a quadratic penalized spline was added in Eq. (1), therefore neither of the two methods discussed in “Variance-covariance structures and inference” section can be used to test \(H_{0}: \sigma ^{2}_{b} = 0\) [27] instead an approximate F-test of Hastie and Tibshi [28] was applied. For Hastie and Tibshi approximate F-test, residual degrees of freedom for the null and alternative model fits are used in the place of the number of parameters in each model.

Rate of change over time and simultaneous confidence bands

The change in smoothing function f(t) overtime, for selected semi-parametric mixed model, can be estimated by taking the derivative of f(t) with respect to time t. For example, let f(t) be a quadratic penalized spline, that is

$$f(t) = \beta_{0} + \beta_{1}\,t + \beta_{2}\,t^{2} + \sum\limits_{l=1}^{K} b_{l}\,\left(t_{ij} - \kappa_{l}\right)_{+}^{2}. $$

Taking the first derivative with respect to time t yields

$$f'(t) = \beta_{1} + 2\,\beta_{2}\,t + 2\,\sum\limits_{l=1}^{K} b_{l}\,\left(t_{ij} - \kappa_{l}\right)_{+}. $$

An estimate of f(t), denoted \(\hat {f}'(t)\), is obtained by substituting the quadratic fit parameter estimates \(\hat {\beta }_{1}, \hat {\beta }_{2}\), and \(\hat {b}_{1}, \hat {b}_{2}, \ldots, \hat {b}_{K}\). However, the construction of simultaneous confidence bands requires the variance-covariance matrix for the vector of contrasts between the estimated and true parameters for the fixed and random effects. Let C= [X Zf] be a design matrix containing quadratic time effects and a truncated quadratic basis, B is a matrix constructed from variance components corresponding to smoothing, i.e. Var(v) in model (4). Then, a variance-covariance matrix for the vector of contrasts is given by

$$Var\left (\left[ \begin{array}{c} \hat{\beta} - \beta \\ \hat{\textbf{v}} - \textbf{v} \end{array} \right]\right) \simeq \left(\textbf{C}'\textbf{R}^{-1}\,\textbf{C} + \textbf{B} \right)^{-1} $$

Ruppert et al. [16], where R is the block diagonal variance-covariance matrix of error terms defined in “Semi-parametric mixed effects model” section. Let g=(g1,g2,…,gT) be a grid of equally spaced time points. Define

$$\hat{\textbf{f}}_{g} - \textbf{f}_{g} = \textbf{C}_{g}\,\left(\begin{array}{c} \hat{\beta} - \beta \\ \hat{\textbf{v}} - \textbf{v} \end{array} \right) $$

where Cg is C with design matrices X and Zf are evaluated over g. Assuming the vector of contrasts have approximately multivariate distribution with mean vector 0 and variance-covariance matrix (CR−1 C+B)−1 [16, 29], i.e.

$$ \left(\begin{array}{c} \hat{\beta} - {\beta} \\ \hat{\textbf{v}} - \textbf{v} \end{array} \right) \sim \mathcal{N}\left(\textbf{0}, \left(\textbf{C}'\textbf{R}^{-1}\,\textbf{C} + \textbf{B} \right)^{-1}\right) $$
(5)

a 100 (1−α)% simultaneous confidence bands for fg is given by

$$ \hat{\textbf{f}}_{g} \pm h_{(1-\alpha)}\,\textbf{s}_{g} $$
(6)

where \(\textbf {s}_{g} \,=\, \left (\widehat {SD}\left (\hat {f}_{g_{1}} \,-\, f_{g_{1}}\right), \widehat {SD}(\hat {f}_{g_{2}} \,-\, f_{g_{2}}), \ldots, \widehat {SD}\left (\hat {f}_{g_{T}} \,-\, f_{g_{T}}\right)\right)'\) with

$${\begin{aligned} \widehat{SD}\left(\hat{f}_{g_{m}} - f_{g_{m}}\right) = \sqrt{\text{the} ~~(m,m)th ~~ \text{diagonal element of}~~ Var\left(\hat{\textbf{f}}_{g} - \textbf{f}_{g}\right)} \end{aligned}} $$

and \(Var\left (\hat {\textbf {f}}_{g} - \textbf {f}_{g}\right) = \textbf {C}_{g}\,\left (\textbf {C}'\textbf {R}^{-1}\,\textbf {C} + \textbf {B} \right)^{-1}\,\textbf {C}'_{g}\), and h(1−α) is the (1−α) quantile of

$$ \sup\left| \frac{\hat{f}(t) - f(t)}{\widehat{SD}\left\{\hat{f}(t) - f(t)\right\}} \right| \approx \max_{1 \le m\le T} \left| \frac{\left (\textbf{C}_{g}\,\left[ \begin{array}{c} \hat{\beta} - {\beta} \\ \hat{\textbf{v}} - \textbf{v} \end{array} \right]\right)}{\widehat{SD}\left\{\hat{f}(g_{m}) - f(g_{m})\right\} }\right|. $$
(7)

The quantile h(1−α) can be approximated using simulations. First we simulate from realization of (5) and computation of (7) can be repeated for a large number of times, say N times, to obtain \(\tilde {h}^{1}_{1-\alpha }, \tilde {h}^{2}_{1-\alpha }, \ldots, \tilde {h}^{N}_{1-\alpha }\). The value with rank N×(1−α) is used as h1−α.

The proposed semi-parametric mixed models were fitted with the the gamm function available in R package mgcv [29] and the linear mixed models using the lme function available in R package nlme.

Results

Patients baseline characteristics

A total of 534 adult diabetic patients were in the study, of which 342 (64.04%) were male, 399 (74.72%) were Type 2 diabetic patients and 417 (78.09%) didn’t have family history of diabetes. The patients mean (SD) age at the first visit (or baseline) was 45.40 (14.62) years and ranges between 18 and 93 years, weight was 62.83 (13.36) kgs and FBS level was 164.72 (86.20) mg/L. There were significant differences of these means between Type 1 and Type 2 diabetic groups (Table 2).The results in Table 2 also show that, at baseline there was a significant association between family history of diabetes and type of diabetes (p-value <0.0001). However, the association between patient gender and type of diabetes was nonsignificant (p-value = 0.9935). The median (first quartile - third quartile) time between first and last clinic visits of patients was 15.25 (7.25 - 24.75) months and ranged from as few as 0.5 month between visits to as much as 6 months between visits.

Table 2 Baseline characteristics of adult diabetic patients in JUSH, September 2011 - June 2014

Parametric mixed models

Mean structure

The main interest of this study is to apply semi-parametric mixed models, however for comparison purpose here we start the analysis by fitting parametric mixed models. Scatter plot smoothing was used to examine changes in FBS level over time and also to asses the interactions of each categorical covariate with time [30, 31]. The smoothing plots suggest the changes in FBS can be described by quadratic trend. Furthermore, due to the non-crisscrossing of trends representing Type 1 and Type 2 diabetes groups, and with family history and no family history groups Type × time and Family history × time were not included in the mixed models. However, the trend representing male and female crossing at one time point. Therefore, we start with very general model that includes time (in quadratic form), other fixed effects and the necessary interactions, that is

$$ {\begin{aligned} \begin{array}{cc} E(Y_{ij}) & = \beta_{0} + \beta_{1}\,time + \beta_{2}\,time^{2} + \beta_{3}\,Age + \beta_{4}\,Gender + \beta_{5}\,Gender \times time\\ &+\beta_{6}\,Type + \beta_{7}\,F.History + \beta_{8}\,Weight + \beta_{9}\,Weight \times time, \end{array} \end{aligned}} $$
(8)

where Type and F.History represent diabetes type and family history of diabetes, respectively.

Variance-covariance structure for random effects

The above mean structure fitted with subject-specific random intercepts, linear random time effects and quadratic random time effects. For each of the models, the independent error structure is assumed and the results are given in Table 3.

Table 3 Parameter estimates (standard errors, s.e.), p-values for associated t-tests and model fit criteria, FBS level of diabetes patients in JUSH, September 2011 - June 2014

The fixed effect estimates were consistent in sign but have slight differences in magnitude across the three different variance-covariance structures. The variables age, gender, diabetes type, family history, and time by weight and gender by time interactions were statistically nonsignificant in all models, except for time by weight interaction where its p-value marginally significant for subject-specific random intercept and slope model (i.e. linear random effects model). The covariates that were statistically significant at 5% level, i.e. Time, Time2 and weight and the time by weight interaction were retained for the subsequent analysis.

The Crainiceanu and Ruppert [22] RLRT statistic for testing \(H_{0}: \sigma ^{2}_{b_{0}}=0\) against \(H_{1}: \sigma ^{2}_{b_{0}} > 0\) in model M1 takes the value RLRT = 738.24 with p-value <0.0001. The large value of the test statistic or a very small p-value strongly suggests a rejection of the null hypothesis (i.e. \(H_{0}: \sigma ^{2}_{b_{0}}=0\)) that no subject-specific random effects should be included in the model. Similar tests were conducted using the exact finite sample null distribution of the RLRT statistic of Greven et al. [21] to test \(H_{0}: \sigma ^{2}_{b_{1}}=0\) against \(H_{1}: \sigma ^{2}_{b_{1}} > 0\) and \(H_{0}: \sigma ^{2}_{b_{2}}=0\) against \(H_{1}: \sigma ^{2}_{b_{2}} > 0\) in models M2 and M3, respectively. The RLRT statistic is 3.944 with p-value = 0.0207 for \(H_{0}: \sigma ^{2}_{b_{1}}=0\) indicating rejection of the null hypothesis which implies the need for subject-specific random slopes. Whereas the RLRT statistic for \(H_{0}: \sigma ^{2}_{b_{2}}=0\) is 0.639 with p-value = 0.1859 suggesting a non-rejection of the null hypothesis \(H_{0}: \sigma ^{2}_{b_{2}}=0\) which implies no quadratic random effect should be included in the model. Therefore, in the subsequent analysis we use the following parametric linear mixed model, called M4:

$$ {\begin{aligned} y_{ij} = \beta_{0} + \beta_{1}\,weight + \beta_{2}\,t_{ij} + \beta_{3}\,t_{ij} \times weight + \beta_{4}\,t_{ij}^{2} + b_{0_{i}} + b_{1_{i}}\,t_{ij} + e_{ij}. \end{aligned}} $$
(9)

The analysis results for model M4 are presented in Table 4. Except the time × weight interaction effect, which is marginally non-significant at 5% level, all the fixed effects are highly significant.

Table 4 Parameter estimates (standard errors, s.e.) and p-values for associated t-tests for model M4, FBS level of diabetes patients in JUSH, September 2011 - June 2014

Semi-parametric mixed model

The observed mean FBS level profile of patients is shown in Fig. 1b. The plot in this figure shows that the linearity assumption is not reasonable. Therefore, the analysis had to account for the longitudinal data structure and the observed nonlinearity of FBS level estimated with smooth effects in the mixed model framework. Given our specific interest in rate of change in FBS level due to clinic interventions, its functional form (over time) can affect the rate of change. Since the rate of change involves taking derivatives of the smooth function f(tij), we choose to use quadratic penalized spline to model the FBS level mean response [31]. Following results from the previous section, we propose the following semi-parametric mixed model with linear random effects structure, called M5

$$ \begin{aligned} &y_{ij} = \beta_{0} + \beta_{1}\,weight + \beta_{2}\,t_{ij} + \beta_{3}\,t_{ij} \times weight + \beta_{4}\,t_{ij}^{2}\\ &\qquad+ \sum_{l=1}^{K} b_{l}\,(t_{ij} - \kappa_{l})_{+}^{2}+ \sum_{u=0}^{1} b_{u_{i}}\,t_{ij}^{u} + e_{ij}. \end{aligned} $$
(10)

Using appropriately constructed matrices this model can be represented using a matrix notation of “Tests for zero variance components” section. This model is fitted using the random intercept and linear random effects variance structures of the previous section and the results are displayed in Table 5.

Table 5 Parameter estimates (standard errors, s.e.), p-values for associated t-tests and variance components estimates of semi-parametric models under various variance structures, FBS level of diabetes patients in JUSH, September 2011 - June 2014

The results in Table 5 show that the fixed effects estimates were consistent in sign but have slight difference in magnitude in both semi-parametric and parametric mixed models (see Table 4), except for the effect of time where both the sign and magnitude of its coefficient estimates were different in the two models and the effect of "time square" was nonsignificant in the semi-parametric mixed models. Further, the interaction of weight with time was not statistically significant in any of the semi-parametric mixed model. Except for the subject-specific random slope variance component, there is a slight decrease in subject-specific random intercept and residual variance components in the semi-parametric model compared to variance components in the linear mixed model M4 (see Table 4).

To compare the two variance structures under the semi-parametric mixed model given in Eq. (10), we computed AIC, BIC and adjusted AIC (see Table 6). Adjusted AIC shows that the semi-parametric mixed model with subject-specific intercepts as well as slopes (or random linear effects) value is smaller than that of the random intercept. Therefore, the semi-parametric model with random linear effects is the preferred model.

Table 6 Fit statistics for model M5 and M4, FBS level of diabetes patients in JUSH, September 2011 - June 2014

Model selection

In this section we are focusing on assessing whether the inclusion of spline effects improves model fit compared to parametric counterpart. This is equivalent to testing \(H_{0}: \sigma ^{2}_{b} = 0\) versus \(H_{1}: \sigma ^{2}_{b} > 0\) in model M5, where \(\sigma ^{2}_{b}\) is a variance component controlling the smoothness of

$$f(t_{ij}) = \beta_{0} + \beta_{1}\,t_{ij} + \beta_{2}\,t_{ij}^{2} + \sum_{l=1}^{K} b_{l}\,(t_{ij} - \kappa_{l})_{+}^{2}. $$

The approximate F-test statistic for testing the above hypotheses, i.e. quadratic form of f(tij) against a quadratic penalized splines, is 83.63 with p-value <0.0001. This strongly suggests a rejection of the null hypothesis \(H_{0}: \sigma ^{2}_{b} = 0\). Thus, the shape of the function f(tij) is statistically different from a quadratic trend.

Furthermore, consider the semi-parametric mixed model M5 in Eq. (10) with random linear effects variance-covariance structure and the linear mixed model M4 in Eq. 9. The fit statistics from fitting these two models are displayed in Table 6. The −2 log(Lik), AIC and BIC values indicate a substantial improvement in the fit of M5 compared to M4, implying model with penalized spline representation of FBS level was preferred over its parametric counterpart.

The overall results show that, out of the models evaluated, FBS level of diabetes patients at the JUSH diabetic clinic during the study period best characterized by a penalized spline model with truncated quadratic basis, with subject-specific random intercept and slope effects and with linear function of weight and time, called the final model, M6.

Simultaneous confidence band

The first derivative of mean response function, i.e. \(\hat {f}'(.)\), with respect to time was estimated for the final model, M6 holding weight constant. The rate of change in mean response of FBS level then investigated using the 95% simultaneous confidence bands for the model. The confidence bands were constructed following the discussion in “Estimation of parameters” section. A gride g of time points (0, 35) were defined by increments of one month such that there are T=36 equally spaced time points. The resulting simultaneous confidence bands displayed in Fig. 2 where the solid line and shaded region represent, respectively, the mean predicted FBS level and the confidence bands. Visual inspection indicates that on average diabetes patients were able to decrease or control their FBS level, due to JUSH clinic interventions, in the first five months period after their initial visit. However, after month 5, the slope of the curve starts changing it signs, this might imply that patients do actually not follow-up the intervention packages properly or not come to the clinic for treatment due to some unknown reason.

Fig. 2
figure 2

95% simultaneous confidence bands for FBS level of diabetes patients in JUSH, September 2011 - June 2014

The confidence bands become noticeably wider after 27 months of follow-up period, demonstrating the increased variability. This increase may be due to a smaller number of FBS level recordings being observed at the later period of the study or a potential artifact induced by the spline smoothing [32]. In practice spline smoothing creates a challenge in semi-parametric regression settings through the inherent bias from using truncated basis functions. The confidence bands obtained for FBS level does not account for this function bias. However, this bias could be corrected, e.g. using bootstrapping methods [33].

Discussion

This study focused on longitudinal data analysis of fasting sugar level of adult diabetic patients at Jimma University Specialized Hospital diabetic clinic using an application of semi-parametric mixed model. The study revealed that the rate of change in FBS level in diabetic patients, due to the clinic interventions, does not continue as a steady pace but changes with time and weight of patients. Furthermore, it clarified the associations between FBS level and some characteristics of adult diabetic patients that weight of a diabetes patient has a significant negative effect whereas patient gender, age, type of diabetes and family history of diabetes did not have a significant effect on the change of FBS level. The result on gender agrees with the findings of [34] where the gender effect on fasting blood glucose level of type 2 diabetes was statistically nonsignificant.

Under the two variance-covariance structures of subject-specific random effects, the semi-parametric mixed models had better fit than their parametric counterparts. This was likely due to the localized splines which captured more variability in FBS level than the linear mixed models. The methodology used in the analysis has implications for clinical monitoring in regular followup of diabetic patients and to assess the effect of intervention packages, such as healthy diet, physical activity and prescribed medicines, because individualized curve may be obtained to follow patient-specific FBS level trends [31].

The main limitation of the study is the limited information on important predictors such as type of interventions including treatment types and nutritional status of a patient that may have influenced the rate of change in FBS level. Due to lack of data on these potential predictors for most of the patients involved in the study, we were unable to include them in the analyses. Therefore, more public health and epidemiology researches are needed to examine the impact of treatments and interventions on population health in general and in particular, people living with diabetes to avoid its complications over time and to identify new risk factors for diabetes.

Conclusion

In this paper, we demonstrate the use of semiparametric mixed effect model for estimation of the rate of change of fasting blood sugar (FBS) level experienced by patients over time. The proposed method can help a physician in clinical monitoring of diabetic patients and to assess the effect of intervention packages such as healthy diet, physical activity.

Abbreviations

AIC:

Akaike’s information criteria

BIC:

Bayesian information criteria

FBS:

Fasting blood sugar level

IDF:

International diabetes federation

JUSH:

Jimma University specialized hospital

P-IRLS:

Penalized iteratively reweighted least squares

REML:

Restricted maximum likelihood

RLRT:

Restricted likelihood ratio tests

SD:

Standard deviation

References

  1. World Health Organization and others: Definition, Diagnosis and Classification of Diabetes Mellitus and Its Complications: Report of a WHO Consultation. Part 1, Diagnosis and Classification of Diabetes Mellitus. 1999. World Health Organization and others.

  2. Campos C. Chronic hyperglycemia and glucose toxicity: pathology and clinical sequelae. Postgrad Med. 2012; 124(6):90–7.

    Article  Google Scholar 

  3. Rahman S, Rahman T, Ismail AA-S, Rashid ARA. Diabetes-associated macrovasculopathy: pathophysiology and pathogenesis. Diabetes Obes Metab. 2007; 9(6):767–80.

    Article  CAS  Google Scholar 

  4. Lu C-C, Jiann B-P, Sun C-C, Lam H-C, Chu C-H, Lee J-K. Association of glycemic control with risk of erectile dysfunction in men with type 2 diabetes. J Sex Med. 2009; 6(6):1719–28.

    Article  CAS  Google Scholar 

  5. Enzlin P, Rosen R, Wiegel M, Brown J, Wessells H, Gatcomb P, Rutledge B, Chan K-L, Cleary PA, et al. Sexual dysfunction in women with type 1 diabetes: long-term findings from the dcct/edic study cohort. Diabetes Care. 2009; 32(5):780–5.

    Article  Google Scholar 

  6. World Health Organization. Use of glycated haemoglobin (HbA1c) in diagnosis of diabetes mellitus. Abbreviated report of a WHO consultation. Geneva: World Health Organization; 2011, p. 25. Available from: (http://www.ncbi.nlm.nih.gov/books/NBK304267/).

  7. World Health Organization. Definition and diagnosis of diabetes mellitus and intermediate hyperglycaemia: Report of a WHO/IDF consultation (PDF). Geneva: World Health Organization; 2006, p. 21. ISBN: 978-92-4-159493-6.

    Google Scholar 

  8. Smith DE. Type 1 Diabetes Risk Factor. 2016. https://www.endocrineweb.com/conditions/type-1-diabetes/type-1-diabetes-risk-factors.

  9. Leontis LM, Hess-Fischl AM. Type 2 Diabetes Risk Factor. 2014. https://www.endocrineweb.com/conditions/type-2a-diabetes/type-2-diabetes-risk-factors.

  10. Atlas IDF. International Diabetes Federation 7th Edition, p. 2015. http://www.diabetesatlas.org.

  11. Worku D, Hamza L, Woldemichael K. Patterns of diabetic complications at jimma university specialized hospital, southwest ethiopia. Ethiop J Health Sci. 2010; 20(1):33–39.

    PubMed  PubMed Central  Google Scholar 

  12. SM A, Y B, A W, S A. Increasing trends of diabetes mellitus and body weight: A ten year observation at gondar university teaching referral hospital, northwest ethiopia. PLoS ONE. 2013; 8(3):60081.

    Article  Google Scholar 

  13. Thilakarathne PJ, Clement L, Lin D, Shkedy Z, Kasim A, Talloen W, Versele M, Verbeke G. The use of semiparametric mixed models to analyze pamchippeptide array data: an application to an oncology experiment. Bioinformatics. 2011; 27(20):2859–65.

  14. Szczesniak RD, Li D, Amin RS. Semiparametric mixed models for nested repeated measures applied to ambulatory blood pressure monitoring data. J Mod Appl Stat Methods. 2016; 15(1):255–75.

    Article  Google Scholar 

  15. Mehari Z. Application of Longitudinal Data Analysis on FBS of Adult Diabetes: Statistical Analysis of Fasting Blood Sugar Level of Adult Diabetic Patients in Jimma University Specialized Hospital. Saarbrücken, Germany: LAP Lambert Academic Publishing 2014; 2014.

    Google Scholar 

  16. Ruppert D, Wand M, Carroll R. Semiparametric regression. Cambridge, UK: cambridge University Press; 2003.

    Book  Google Scholar 

  17. Maringwa JT, Geys H, Shkedy Z, Faes C, Molenberghs G, Aerts M, Ammel KV, Teisman A, Bijnens L. Application of semiparametric mixed models and simultaneous confidence bands in a cardiovascular safety experiment with longitudinal data. J Biopharm Stat. 2008; 18(6):1043–62.

    Article  Google Scholar 

  18. Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. New York: Springer; 2000.

    Google Scholar 

  19. Stram D, Lee J. Variance components testing in the longitudinal mixed effects model. Biometrics. 1994; 50(3):1171–7.

    Article  CAS  Google Scholar 

  20. Scheipl F, Greven S, Küuchenhoff H. Size and power of tests for a zero random effect variance or polynomial regression in additive and linear mixed models. Comput Stat Data Anal. 2008; 52(7):3283–3299.

    Article  Google Scholar 

  21. Greven S, Crainiceanu C, Küuchenhoff H, Peters A. Restricted likelihood ratio testing for zero variance components in linear mixed models. J Comput Graph Stat. 2008; 17(4):870–91.

    Article  Google Scholar 

  22. Crainiceanu CM, Ruppert D. Likelihood ratio tests in linear mixed models with one variance component. J R Stat Soc Ser B (Statistical Methodology). 2004; 66(1):165–185.

    Article  Google Scholar 

  23. Wang Y. Smoothing spline models correlated random errors. J Am Stat Assoc. 1998; 93:341–8.

    Article  Google Scholar 

  24. Durbán M, Harezlak J, Wand M, Carroll R. Simple fitting of subject-specific curves for longitudinal data. Stat Med. 2005; 24(8):1153–67.

    Article  Google Scholar 

  25. Ni X, Zhang D, Zhang HH. Variable selection for semiparametric mixed models in longitudinal studies. Biometrics. 2010; 66(1):79–88.

    Article  Google Scholar 

  26. Wager C, Vaida F, Kauermann G. Model selection for penalized spline smoothing using akaike information criteria. Aust N Z J Stat. 2007; 49(2):173–90.

    Article  Google Scholar 

  27. Crainiceanu CM, Ruppert D, Claeskens G, MP W. Exact likelihood ratio tests for penalized splines. Biometrika. 2005; 92(1):91–103.

    Article  Google Scholar 

  28. Hastie T, Tibshirani R. Generalized Additive Models. New York: Wiley Online Library; 1990.

    Google Scholar 

  29. Wood S. Generalized Additive Models: an Introduction with R. Boca Raton, Florida: CRC press; 2006.

    Book  Google Scholar 

  30. Hastie T, Tibshirani R. Generalized additive models for medical research. Stat Methods Med Res. 1995; 4(3):187–96.

    Article  CAS  Google Scholar 

  31. Szczesniak RD, McPhail GL, Duan LL, Macaluso M, Amin RS, Clancy JP. A semiparametric approach to estimate rapid lung function decline in cystic fibrosis. Ann Epidemiol. 2013; 23(12):771–7.

    Article  Google Scholar 

  32. Szczesniak RD, Li D, Raouf AS. Semiparametric mixed models for medical monitoring data: An overview. J Biom Biostat. 2015; 6(2):234. https://doi.org/10.4172/2155-6180:1000234.

  33. Hardle W, Bowman AW. Bootstrapping in nonparametric regression: Local adaptive smoothing and confidence bands. J Am Stat Assoc. 1988; 83(12):102–110.

    Google Scholar 

  34. Khan HA, Sobki SH, Alhomida AS. Regression analysis for testing association between fasting blood sugar and glycated hemoglobin in diabetic patients. Biomed Res. 2015; 26(3):604–6.

    CAS  Google Scholar 

Download references

Acknowledgements

We thank the Diabetic Clinic of Jimma University Specialized Hospital, Ethiopia for giving access to the data and the staff members of the clinic for their support in extracting the information from patient’s medical card.

Funding

This study project was awarded funding from College of Natural Sciences Research and Postgraduate Office. The second author’s Jimma University visit, when this study started, was financially supported by the Africa Interaction, Knowledge, Interchange and Collaboration (KIC) funding (UID: 105298) from the National Research Foundation (NRF) and the University of South Africa under the Foreign general research grant.

Availability of data and materials

The data sets analyzed in this study available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

Authors

Contributions

BBY and WKY contributed to the study concept and design. ZMN participated in the data collection and checked the data. TTA performed the statistical analyses and drafted the manuscript. LKD reviewed the findings of data analyses and was a major contributor in writing the manuscript. All authors approved the final manuscript.

Corresponding author

Correspondence to Tafere Tilahun Aniley.

Ethics declarations

Ethics approval and consent to participate

Ethical approval to conduct the study and human subject research approval for this study was received from Jimma University, College of Natural Sciences, Research Ethics Committee and the medical director of the Hospital. As the study was retrospective, informed consent was not obtained from the study participants, but data were anonymous and kept confidential.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Aniley, T., Debusho, L., Nigusie, Z. et al. A semi-parametric mixed models for longitudinally measured fasting blood sugar level of adult diabetic patients. BMC Med Res Methodol 19, 13 (2019). https://doi.org/10.1186/s12874-018-0648-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12874-018-0648-x

Keywords