A semi-parametric mixed models for longitudinally measured fasting blood sugar level of adult diabetic patients

Background At the diabetic clinic of Jimma University Specialized Hospital, health professionals provide regular follow-up to help people with diabetes live long and relatively healthy lives. Based on patient condition, they also provide interventions in the form of counselling to promote a healthy diet and physical activity and prescribing medicines. The main purpose of this study is to estimate the rate of change of fasting blood sugar (FBS) profile experienced by patients over time. The change may help to assess the effectiveness of interventions taken by the clinic to regulate FBS level, where rates of change close to zero over time may indicate the interventions are good regulating the level. Methods In the analysis of longitudinal data, the mean profile is often estimated by parametric linear mixed effects model. However, the individual and mean profile plots of FBS level for diabetic patients are nonlinear and imposing parametric models may be too restrictive and yield unsatisfactory results. We propose a semi-parametric mixed model, in particular using spline smoothing to efficiently analyze a longitudinal measured fasting blood sugar level of adult diabetic patients accounting for correlation between observations through random effects. Results The semi-parametric mixed models had better fit than the linear mixed models for various variance structures of subject-specific random effects. The study revealed that the rate of change in FBS level in diabetic patients, due to the clinic interventions, does not continue as a steady pace but changes with time and weight of patients. Conclusions The proposed method can help a physician in clinical monitoring of diabetic patients and to assess the effect of intervention packages, such as healthy diet, physical activity and prescribed medicines, because individualized curve may be obtained to follow patient-specific FBS level trends.

is ineffective. The type 2 diabetes is characterized by high levels of blood sugar or glucose resulting from defects in insulin production, insulin action, or both. The gestational diabetes is a form of diabetes that appears during pregnancy. It can lead to serious health risks for both the mother and child [7]. The risk factors that are associated with type 1 diabetes include family history of diabetes (diabetes history in one parent or both), infections and other environmental influences such as exposure to a viral illness, the presence of damaging immune system cells, i.e. autoantibodies and dietary factors low vitamin D consumption [8]. Whereas, for type 2 diabetes the risk factors are excess body weight, physical inactivity, poor nutrition, family history of diabetes, past history of gestational diabetes and older age [9]. The risk factors for increase or decrease in fasting blood sugar level of a patient include overweight, family history of diabetes, age, type of diabetes, blood pressure and gender [7]. The focus of this study however is on type 1 and type 2 diabetes.
In year 2015, there were an estimated 415 million adults aged 20-79 years living with diabetes worldwide [10], including 193 million who are undiagnosed. There were approximately 5 million people estimated to have died from diabetes worldwide in the same year, and a majority of these were the result of cardiovascular complications. In Africa Region, the number of adults living with diabetes estimated at 14.2 million whereas in Ethiopia the number is estimated 1 to 10 million in year 2015. The Region has the highest proportion of undiagnosed diabetes, 9.5 million (about 66.7%) of people with diabetes are unaware they have the disease and in Ethiopia there are 500 thousand to 5 million such cases [11,12].
At the diabetic clinic of Jimma University Specialized Hospital (JUSH), health professionals provide regular follow-up to help people with diabetes live long and relatively healthy lives. Depending on patients conditions, e.g. FBS level, they also provide interventions in the form of counselling to promote a healthy diet and physical activity and prescribing medicines.
The main objective of the current study is to assess the factors that affect the FBS level of adult diabetic patients. In addition to assessing the factors that affect the FBS level over time, we are also interested to estimate the rate of change of FBS profile experienced by patients over time. The change may help to assess the effectiveness of interventions taken by the clinic to regulate FBS level, where rates of change close to zero over time may indicate the interventions are good regulating the level. These changes are determined using first derivatives of penalized regression splines [13,14].
The FBS level data of diabetic patients in this study are collected repeatedly over time hence the data have longitudinal time series profiles and the data also have continuous nature. For statistical inferences, therefore, it is necessary to capture properly the form of the evolution of profiles over time. In the analysis of longitudinal data, the mean profile is often estimated by parametric linear mixed effects model, for instance recently Mehari [15] analyzed the FBS level profiles of diabetic patients using parametric linear mixed effects model. However, the individual and mean profile plots of FBS level for diabetic patients (see Fig. 1) are nonlinear and imposing parametric models may be too restrictive and yield unsatisfactory results. In the present paper, we propose a semi-parametric mixed model in particular using spline smoothing [16,17] to efficiently analyze a longitudinal measured fasting blood sugar level of adult diabetic patients accounting for correlation between observations through random effects. The model assumes that the mean of FBS level is an arbitrary smooth function of time and parametric functions of other covariates. The link between mixed model and smoothing provides a flexible framework for estimating the patient profiles in a data driven way [13]. The rest of the paper is organized as follows. The data, some basic review of variance-covariance structure of the parametric linear mixed model, semi-parametric mixed models and inferences related them are introduced in "Methodology" section. The results from applying these methods on the the study data are discussed in "Results" section. Finally discussion, and conclusions and pointers for future study are given in "Discussion" and "Conclusion" sections respectively.

Study data
The fasting blood sugar (FBS) level data used in this paper arises from a retrospective study conducted in Jimma University Specialized Hospital (JUSH) diabetic clinic. The hospital is located in Jimma town 352 km to the Southwest of Addis Ababa, the capital of Ethiopia. It is a teaching hospital and gives service to the southwestern part of Oromia region, some part of southern nations and nationalities and Gamella regions of Ethiopia. All diabetic patients aged 18 years or older, who were coming to JUSH diabetic clinic for their regular follow up during periods September 2011 and June 2014 were eligible for this study. During their follow up, patients FBS level along with other characteristics such as weight are measured and recorded in the individual follow up chart. The data in the chart include time (measured in months, where baseline or initial date was given a value 0), patient gender, age, type of diabetes ( Type 1 diabetes or Type 2 diabetes) and family diabetes history. The duration between initial and the last recorded visits ranged from one to 36 months. Patients with at least two observations were included in the analyses leading to a total of 534 patients and 4390 observations. Permission of the study was obtained from Postgraduate research office of Jimma University, College of Natural Sciences and JUSH.

Variance-covariance structures
The FBS level data of this study fall within the framework of continuous longitudinal data and hence can be modeled by use of a parametric linear mixed model. Let Y ij denote the FBS level of the ith patient observed at time t ij , i = 1, . . . , n and j = 1, . . . m i . The parametric linear mixed model can be expressed as That is, the population level mean response is modeled as a polynomial function of time, t ij , a linear function of covariates x ijl , l = 1, . . . , p where some of them may be time-varying covariates or interaction effects each has corresponding regression parameter coefficient θ l , a function of subject-specific random coefficient terms and measurement error ε ij . The coefficients β k , k = 1, . . . , p and θ l , l = 1, . . . , L are fixed effect parameters and b u i , u = 0, . . . , q are subject-specific random coefficients.
We have examined models for p = 2 which represents quadratic polynomial and b u i with u = 0, 1, 2 represent a subjectspecific random intercept, slope and quadratic coefficients, respectively for selection of a variance-covariance structure (see Table 1). The variance profile plot of FBS level shows (for the sake of brevity this plot is not reported) the variance changes overtime, therefore to allow for more flexibility to estimate between subject variability we have considered the above three variancecovariance structures.
In Table 1, for instance the subject-specific random intercept b 0 i in the quadratic random effects model (M 3 ) is considered to capture correlation of the FBS level measurements over time within the patient and it is assumed that subject-specific random slopes for linear as well as for quadratic time effects to capture different evolution of FBS level over time. Note that these subject-specific random structures are different for each patient.

Tests for zero variance components
Adequate variance-covariance structure is essential to obtain valid model based inferences for the fixed effects or for parameters in the mean structure of the model [18]. Over-parametrization of the variance-covariance structure leads to inefficient estimation and potentially poor assessment of standard errors for the estimation of the mean structure, i.e. fixed effects, whereas a too restrictive specification invalidates inferences about the mean response profile when the assumed structure does not hold. The likelihood ratio test for testing, for example H 0 : , if the vector of FBS level can be divided into a large number of independent and identically distributed subvectors both under H 0 and H 1 . However, this assumption usually does not hold, for example in linear mixed models or for unbalanced data [20][21][22]. Note that the FBS level data are unbalanced in the sense that all patients do not have equal number of measurements, hence the independent and identically distributed assumption can be violated in the linear mixed models used in this paper. Therefore, we used the exact finite sample null distribution of the restricted likelihood ratio test (RLRT) statistic derived by Crainiceanu and Ruppert [22] to test a zero random effect variance in M 1 . However, since models M 2 and M 3 contain more than one random effect, the tests for a zero random effect variance in these models were done using the exact finite sample null distribution of the RLRT statistic derived by Greven et al. [21].

Semi-parametric mixed effects model
Given the mean profile plots over time in Fig. 1b, imposing parametric functions to describe the mean FBS level evolution may not be easy and also too restrictive [17]. As an alternative, we can model the mean profiles over time with a semi-parametric smooth function, f (t ij ). Using the pth degree truncated power basis, f (t ij ) can be written as here z + = max{0, z}. The function f t ij is a combination of fixed effects parameters β 0 , β 1 , . . . , β p and pth degree splines evaluated at time t ij with knots at distinct locations κ 1 , κ 2 , . . . , κ K in the range of t ij and corresponding coefficients b 1 , b 2 , . . . , b K . The function f t ij can be estimated among others, with penalized splines. The coefficients of spline basis functions b l are assumed to follow a Gaussian distribution such that b l ∼ N 0, Then, incorporating f t ij in model (1), the general semi-parametric mixed effects model can be expressed as

Estimation of parameters
Let y i = y i1 , y i2 , . . . , y im i be the m i × 1 vector of responses for the ith patient, i = 1, . . . , n. Under the linear mixed model formulation, model (3) with subject-specific quadratic random effects can be expressed succinctly in matrix form as , v, u i and e i are assumed to be pairwise independent with and between subjects for i = 1, 2, . . . , n. Note that G and R i are 3 × 3 and m i × m i variance-covariance matrices, respectively.
The overall model for n individuals has the form Estimation of the coefficients of penalized and unpenalized terms in model (4) was done using a penalized iteratively reweighted least squares (P-IRLS) based on 20 equidistant knots in the range of FBS level and a smoothing parameter selection was done by REML [23].
The correspondence between the penalized spline smoother and the optimal predictor in a mixed model framework enables us to take advantage of the existing methodology for mixed model analysis and the use of mixed model software, such as the function gamm in mgcv R package, for fitting the penalized spline model and the MIXED and GLIMMIX procedures in SAS [24]. This implementation of penalized smoothing in the linear mixed model framework also provides an automated approach to obtain a smoothing parameter and flexibility to extend the models [17].
In this paper, parameters in the fitted models are estimated by restricted maximum likelihood (REML) method because the statistical hypotheses that were considered have the same mean structures between models under the null and alternative hypotheses. Furthermore, maximum likelihood estimators of variance components are biased downward as they do not take into account the degrees of freedom lost in the estimation of fixed effects (e.g. see Ruppert et al. [16]).

Model selection and inference
The model building process of this work includes selection of suitable variance-covariance structure for random effects, testing whether the inclusion of spline effects in the parametric model improves model fit or not and also selection of covariates. The linear mixed model framework provides a unified approach to do all these [25]. In the parametric cases, the best fitting model can be selected by employing a commonly used selection criteria, Akaike's Information Criterion (AIC) and Bayesian Information Criterion (BIC) or by a likelihood ratio test. However, since the semi-parametric mixed models that we considered here are differ in both the fixed effects and the nonparametric part, model selection is done via adjusted Akaike's information criterion, abbreviated AIC adj , using the effective number of parameters in the model [16,26]. Let C = [ X Z f ] be the design matrix with appropriate fixed effects components and the corresponding smoothing matrix, B = 0 0 0 G −1 where G is the variance-covariance matrix of random effects used in the model and R = diag{R 1 , R 2 , . . . , R n }, i.e. R is the block diagonal variance-covariance matrix of error terms with blocks R i on the main diagonal and zeros elsewhere. Then the effective number of parameters and AIC adj may be computed as and AIC adj = −2 log(Lik) + 2 E p , respectively. Unlike the marginal AIC which penalizes only for the number of parameters in fixed effects vector and variance components, the penalty of AIC adj takes into account the additional parameters introduced into a model via f t ij or smoothing by including the design matrix Z f in C [17]. Like the marginal AIC, the smaller the AIC adj value, the better the model. Testing whether the inclusion of spline effects in the parametric model improves model fit or not is equivalent to testing H 0 : σ 2 b = 0 versus H 1 : σ 2 b > 0. In this paper, due to the second objective of the study, a quadratic penalized spline was added in Eq. (1), therefore neither of the two methods discussed in "Variance-covariance structures and inference" section can be used to test H 0 : σ 2 b = 0 [27] instead an approximate F-test of Hastie and Tibshi [28] was applied. For Hastie and Tibshi approximate F-test, residual degrees of freedom for the null and alternative model fits are used in the place of the number of parameters in each model.

Rate of change over time and simultaneous confidence bands
The change in smoothing function f (t) overtime, for selected semi-parametric mixed model, can be estimated by taking the derivative of f (t) with respect to time t. For example, let f (t) be a quadratic penalized spline, that is Taking the first derivative with respect to time t yields An estimate of f (t), denotedf (t), is obtained by substituting the quadratic fit parameter estimatesβ 1 ,β 2 , and b 1 ,b 2 , . . . ,b K . However, the construction of simultaneous confidence bands requires the variance-covariance matrix for the vector of contrasts between the estimated and true parameters for the fixed and random effects. Let C = [ X Z f ] be a design matrix containing quadratic time effects and a truncated quadratic basis, B is a matrix constructed from variance components corresponding to smoothing, i.e. Var(v) in model (4). Then, a variance-covariance matrix for the vector of contrasts is given by Ruppert et al. [16], where R is the block diagonal variance-covariance matrix of error terms defined in "Semi-parametric mixed effects model" section. Let g = (g 1 , g 2 , . . . , g T ) be a grid of equally spaced time points. Definê where C g is C with design matrices X and Z f are evaluated over g. Assuming the vector of contrasts have approximately multivariate distribution with mean vector 0 and variance-covariance matrix C R −1 C + B −1 [16,29], i.e.
a 100 (1 − α)% simultaneous confidence bands for f g is given bŷ where The quantile h (1−α) can be approximated using simulations. First we simulate from realization of (5) and computation of (7) can be repeated for a large number of times, say N times, to obtainh 1 The proposed semi-parametric mixed models were fitted with the the gamm function available in R package mgcv [29] and the linear mixed models using the lme function available in R package nlme.

Patients baseline characteristics
A total of 534 adult diabetic patients were in the study, of which 342 (64.04%) were male, 399 (74.72%) were Type 2 diabetic patients and 417 (78.09%) didn't have family history of diabetes. The patients mean (SD) age at the first visit (or baseline) was 45.40 (14.62) years and ranges between 18 and 93 years, weight was 62.83 (13.36) kgs and FBS level was 164.72 (86.20) mg/L. There were significant differences of these means between Type 1 and Type 2 diabetic groups ( Table 2).The results in Table 2 also show that, at baseline there was a significant association between family history of diabetes and type of diabetes (p-value <0.0001). However, the association between patient gender and type of diabetes was nonsignificant (p-value = 0.9935). The median (first quartile -third quartile) time between first and last clinic visits of patients was 15.25 (7.25 -24.75) months and ranged from as few as 0.5 month between visits to as much as 6 months between visits.

Parametric mixed models Mean structure
The main interest of this study is to apply semi-parametric mixed models, however for comparison purpose here we start the analysis by fitting parametric mixed models. Scatter plot smoothing was used to examine changes in FBS level over time and also to asses the interactions of each categorical covariate with time [30,31]. The smoothing plots suggest the changes in FBS can be described by quadratic trend. Furthermore, due to the non-crisscrossing of trends representing Type 1 and Type 2 diabetes groups, and with family history and no family history groups Type × time and Family history × time were not included in the mixed models. However, the trend representing male and female crossing at one time point. Therefore, we start with very general model that includes time (in quadratic form), other fixed effects and the necessary interactions, that is where Type and F.History represent diabetes type and family history of diabetes, respectively.

Variance-covariance structure for random effects
The above mean structure fitted with subject-specific random intercepts, linear random time effects and quadratic random time effects. For each of the models, the independent error structure is assumed and the results are given in Table 3. The fixed effect estimates were consistent in sign but have slight differences in magnitude across the three different variance-covariance structures. The variables age, gender, diabetes type, family history, and time by weight and gender by time interactions were statistically nonsignificant in all models, except for time by weight interaction where its p-value marginally significant for subject-specific random intercept and slope model (i.e. linear random effects model). The covariates that were statistically significant at 5% level, i.e. Time, Time 2 and weight and the time by weight interaction were retained for the subsequent analysis.
The Crainiceanu and Ruppert [22] RLRT statistic for testing H 0 : σ 2 b 0 = 0 against H 1 : σ 2 b 0 > 0 in model M 1 takes the value RLRT = 738.24 with p-value < 0.0001. The large value of the test statistic or a very small p-value strongly suggests a rejection of the null hypothesis (i.e. H 0 : σ 2 b 0 = 0) that no subject-specific random effects should be included in the model. Similar tests were conducted using the exact finite sample null distribution of the RLRT statistic of Greven et al. [21] to test H 0 : The analysis results for model M 4 are presented in Table 4. Except the time × weight interaction effect, which is marginally non-significant at 5% level, all the fixed effects are highly significant.

Semi-parametric mixed model
The observed mean FBS level profile of patients is shown in Fig. 1b. The plot in this figure shows that the linearity assumption is not reasonable. Therefore, the analysis had to account for the longitudinal data structure and the observed nonlinearity of FBS level estimated with smooth effects in the mixed model framework. Given our specific interest in rate of change in FBS level due to clinic interventions, its functional form (over time) can affect the rate of change. Since the rate of change involves taking derivatives of the smooth function f (t ij ), we choose to use quadratic penalized spline to model the FBS level mean response [31]. Following results from the previous section, we propose the following semi-parametric mixed model with linear random effects structure, called M 5 Using appropriately constructed matrices this model can be represented using a matrix notation of "Tests for zero variance components" section. This model is fitted using the random intercept and linear random  Table 5.
The results in Table 5 show that the fixed effects estimates were consistent in sign but have slight difference in magnitude in both semi-parametric and parametric mixed models (see Table 4), except for the effect of time where both the sign and magnitude of its coefficient estimates were different in the two models and the effect of "time square" was nonsignificant in the semi-parametric mixed models. Further, the interaction of weight with time was not statistically significant in any of the semi-parametric mixed model. Except for the subject-specific random slope variance component, there is a slight decrease in subject-specific random intercept and residual variance components in the semi-parametric model compared to variance components in the linear mixed model M 4 (see Table 4).
To compare the two variance structures under the semi-parametric mixed model given in Eq. (10), we computed AIC, BIC and adjusted AIC (see Table 6). Adjusted AIC shows that the semi-parametric mixed model with subject-specific intercepts as well as slopes (or random linear effects) value is smaller than that of the random intercept. Therefore, the semi-parametric model with random linear effects is the preferred model.

Model selection
In this section we are focusing on assessing whether the inclusion of spline effects improves model fit compared to parametric counterpart. This is equivalent to testing H 0 : The approximate F-test statistic for testing the above hypotheses, i.e. quadratic form of f (t ij ) against a quadratic penalized splines, is 83.63 with p-value < 0.0001. This strongly suggests a rejection of the null hypothesis H 0 : σ 2 b = 0. Thus, the shape of the function f (t ij ) is statistically different from a quadratic trend.
Furthermore, consider the semi-parametric mixed model M 5 in Eq. (10) with random linear effects variancecovariance structure and the linear mixed model M 4 in Eq. 9. The fit statistics from fitting these two models are displayed in Table 6. The −2 log(Lik), AIC and BIC values indicate a substantial improvement in the fit of M 5 compared to M 4 , implying model with penalized spline representation of FBS level was preferred over its parametric counterpart.
The overall results show that, out of the models evaluated, FBS level of diabetes patients at the JUSH diabetic clinic during the study period best characterized by a

Simultaneous confidence band
The first derivative of mean response function, i.e.f (.), with respect to time was estimated for the final model, M 6 holding weight constant. The rate of change in mean response of FBS level then investigated using the 95% simultaneous confidence bands for the model. The confidence bands were constructed following the discussion in "Estimation of parameters" section. A gride g of time points (0, 35) were defined by increments of one month such that there are T = 36 equally spaced time points. The resulting simultaneous confidence bands displayed in Fig. 2 where the solid line and shaded region represent, respectively, the mean predicted FBS level and the confidence bands. Visual inspection indicates that on average diabetes patients were able to decrease or control their FBS level, due to JUSH clinic interventions, in the first five months period after their initial visit. However, after month 5, the slope of the curve starts changing it signs, this might imply that patients do actually not follow-up the intervention packages properly or not come to the clinic for treatment due to some unknown reason. The confidence bands become noticeably wider after 27 months of follow-up period, demonstrating the increased variability. This increase may be due to a smaller number of FBS level recordings being observed at the later period of the study or a potential artifact induced by the spline smoothing [32]. In practice spline smoothing creates a challenge in semi-parametric regression settings through the inherent bias from using truncated basis functions. The confidence bands obtained for FBS level does not account for this function bias. However, this bias could be corrected, e.g. using bootstrapping methods [33].

Discussion
This study focused on longitudinal data analysis of fasting sugar level of adult diabetic patients at Jimma University Specialized Hospital diabetic clinic using an application of semi-parametric mixed model. The study revealed that the rate of change in FBS level in diabetic patients, due to the clinic interventions, does not continue as a steady pace but changes with time and weight of patients. Furthermore, it clarified the associations between FBS level and some characteristics of adult diabetic patients that weight of a diabetes patient has a significant negative effect whereas patient gender, age, type of diabetes and family history of diabetes did not have a significant effect on the change of FBS level. The result on gender agrees with the findings of [34] where the gender effect on fasting blood glucose level of type 2 diabetes was statistically nonsignificant. Under the two variance-covariance structures of subject-specific random effects, the semi-parametric mixed models had better fit than their parametric counterparts. This was likely due to the localized splines which captured more variability in FBS level than the linear mixed models. The methodology used in the analysis has implications for clinical monitoring in regular followup of diabetic patients and to assess the effect of intervention packages, such as healthy diet, physical activity and prescribed medicines, because individualized curve may be obtained to follow patient-specific FBS level trends [31].
The main limitation of the study is the limited information on important predictors such as type of interventions including treatment types and nutritional status of a patient that may have influenced the rate of change in FBS level. Due to lack of data on these potential predictors for most of the patients involved in the study, we were unable to include them in the analyses. Therefore, more public health and epidemiology researches are needed to examine the impact of treatments and interventions on population health in general and in particular, people living with diabetes to avoid its complications over time and to identify new risk factors for diabetes.

Conclusion
In this paper, we demonstrate the use of semiparametric mixed effect model for estimation of the rate of change of fasting blood sugar (FBS) level experienced by patients over time. The proposed method can help a physician in clinical monitoring of diabetic patients and to assess the effect of intervention packages such as healthy diet, physical activity.