Use of partitioned GMM marginal regression model with time-dependent covariates: analysis of Chinese Longitudinal Healthy Longevity Study

Background: The health of the elderly population is a major concern for most industrial nations. National health surveys provide a measure of the state of elderly health. One such survey is the Chinese Longitudinal Healthy Longevity Survey. It collects data on risk factors and outcomes among the elderly. We examine these longitudinal survey data to determine changes in health and to identify risk factors that affect health outcomes, including the ability of the elderly to complete a physical check.

Methods: We use a Partitioned GMM logistic regression model to identify risk factors. The model accounts for the correlation between lagged time-dependent covariates and the outcomes. It relates present and past measures of time-dependent covariates to simultaneous outcomes. The partitioning produces additional regression coefficients that identify the immediate effect, the delayed effect (lag-1), the further delayed effect (lag-2), and so on. The model therefore gives decision makers the opportunity to monitor the effect of a covariate over time. This technique is particularly useful in healthcare and health-related research. We use the Chinese Longitudinal Healthy Longevity Survey data to identify those risk factors and to demonstrate the utility of the model.

Results: We found that the ability to make one's own decisions, frequent consumption of vegetables, frequent exercise, the ability to transfer without assistance, visual difficulties and the ability to pick up a book from the floor while standing had effects of varying significance on health and on the ability to complete a physical check as people age.

Conclusions: The partitioning of covariate effects into immediate, delayed and further delayed effects provides important measures in a declining population.


Background
Longitudinal studies in medical research are useful for identifying changes in outcomes as they are affected by certain risk factors. The repeated measurements on subjects generate correlated observations, but the correlations are of different types. There is correlation among the responses, and there is correlation between the time-dependent covariates and the response. These correlations have different impacts on the outcomes. Thus, any model fitted to these data needs to address each of them accordingly.
Modelling time-dependent covariates when analyzing binary outcomes in longitudinal studies has drawn attention. There are methods based on Generalized Estimating Equations (GEE) and others based on the Generalized Method of Moments (GMM) [1][2][3][4][5]. However, these methods do not separate out the impacts of the time-dependent covariates on the outcomes. Instead, they provide estimates that represent an average of those impacts. Obermeier et al. [6] suggested that, when modeling longitudinal data, one cannot assume that the association between a time-dependent covariate and the outcome is only direct and simultaneous, because the outcome might depend on past measurements of the covariate. An alternative approach is therefore to separate the different impacts of the covariate. Heagerty [7] indicated that one way to properly model longitudinal outcomes with time-dependent covariates is to include appropriate lagged values of such covariates. This approach requires an additional regression coefficient for each lag of a time-dependent covariate. These additional coefficients allow the effect of the covariate on the response to be parsed by lag, rather than assuming that the association maintains the same strength and direction over time. The approach provides insight into the effects of time-dependent covariates on present and future values of the outcomes.

Motivating example
The health of the elderly population is a major concern for most industrial nations. National health surveys provide a measure of the state of elderly health. One such survey is the Chinese Longitudinal Healthy Longevity Study (CLHLS) [8]. It collects data on risk factors and outcomes in the elderly population. The CLHLS was designed to identify key factors contributing to healthy longevity among elderly adults in China. The survey has been conducted over many years, but we concentrated on four waves: 2005, 2008, 2011 and 2014. This survey is of particular interest in China, where the annual growth rate of the elderly population is approximately 4.4% and where approximately 20% of the world's oldest population lives [8]. Gu, Zhang and Zeng [9] investigated the impact of adequate access to healthcare. Li, Zhang and Liang [10] used waves 1 and 2 to determine how living arrangements in 1998 impacted self-rated health in 2000. Zheng et al. [11] studied the associations of environmental variables. Wu and Schimmele [12] tested how levels of psychological disposition in 1998 impacted self-rated health in 2000. Wang, Zheng, Kurosawa and Inaba [13] studied gender and age differences in health among elderly Chinese using data collected in 2002. However, all of these studies used only one or two waves of data, so the researchers were only able to determine cross-sectional or lag-1 effects of time-dependent covariates on the outcomes.
In this paper, we made use of four waves to demonstrate the fit of the Partitioned GMM model for two simultaneous binary outcomes: completion of a physical check and health status. These responses were objectively measured by an interviewer. Subjective measures are also available, but we concentrated on the objective measures. We focused our attention on the longitudinal aspect of the data and used all four waves. Using more waves allows us to exploit the longitudinal nature of the data more fully.

Data
The data consisted of elderly people 64 years and older living in 22 of the 31 provinces in China. There were 8084 observations measured on 2021 individuals over the four waves. We fit models to interviewer-rated health and to completion of a physical check; both models included the time-independent covariate gender. These models also included the time-dependent covariates: able to make own decisions, consumed vegetables frequently, exercised, transfer without assistance, visual difficulty and ability to pick up a book from the floor while standing. Descriptive statistics for the outcomes and time-dependent covariates are given in Tables 1 and 2, respectively. Our initial observation suggested a steady decline over time in the percentage of interviewees considered healthy (Table 1).

Methods
We fit a partitioned GMM logistic regression model [14] to the Chinese Longitudinal Healthy Longevity Study data to determine the effects of time-dependent covariates on the binary outcomes. The model measures the impact of time-independent and time-dependent covariates X on the outcome Y measured at four different time points. Relations between X and Y other than the cross-sectional ones must therefore be addressed (Fig. 1). The partitioned GMM logistic regression model [14] provides coefficient estimates for the effect of X on Y when both are measured at the same time, when X is measured one time period before Y, when X is measured two time periods before Y and when X is measured three time periods before Y.
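As an illustration of how the four waves give rise to four sets of coefficients, the following minimal Python sketch enumerates the (covariate wave, outcome wave) pairs that contribute to each lag. The function name `pairs_by_lag` and the variable names are ours, introduced for illustration only; they are not part of the %partitionedGMM macro.

```python
from itertools import combinations_with_replacement

# The four CLHLS waves analyzed in this paper.
WAVES = [2005, 2008, 2011, 2014]


def pairs_by_lag(waves):
    """Group (covariate wave, outcome wave) pairs by their lag in waves.

    Only pairs with the covariate measured at or before the outcome
    (s <= t) are considered, matching the partitioned model.
    """
    groups = {}
    for s_idx, t_idx in combinations_with_replacement(range(len(waves)), 2):
        lag = t_idx - s_idx
        groups.setdefault(lag, []).append((waves[s_idx], waves[t_idx]))
    return groups


LAG_PAIRS = pairs_by_lag(WAVES)
for lag, pairs in sorted(LAG_PAIRS.items()):
    print(f"lag-{lag}: {len(pairs)} pair(s) {pairs}")
```

With four waves, the cross-sectional effect draws on four wave pairs and the lag-3 effect on a single pair (the first wave paired with the last), which is one reason the longest-lag estimates tend to be the least precise.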

Partitioned GMM logistic regression models with time dependent covariates
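Let $y_{it}$ denote the binary observation for individual $i$ at time $t$, and let $x_{it} = (x_{i1t}, \ldots, x_{iJt})$ be a vector of $J$ time-dependent covariates, where $x_{ijt}$ is the $j$th covariate observed at time $t$ for individual $i$. Assume that observations $y_{is}$ and $y_{kt}$ are independent when $i \neq k$ but not necessarily when $i = k$ and $s \neq t$. The Partitioned GMM logistic regression model accounts for the relationships between the outcomes $y_i = (y_{i1}, \ldots, y_{iT})$, with $y_{it}$ observed at time $t$, and the $j$th covariate observed at time $s$, $x_{ijs}$, for $s \le t$. For each subject $i$ and each time-dependent covariate $x_{ijt}$ measured at times $t = 1, 2, \ldots, T$, the data matrix is reconfigured as a lower triangular matrix,

\[
X_{ij} =
\begin{bmatrix}
1 & x_{ij1} & 0 & \cdots & 0 \\
1 & x_{ij2} & x_{ij1} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{ijT} & x_{ij(T-1)} & \cdots & x_{ij1}
\end{bmatrix}
=
\begin{bmatrix}
1 & X_{ij}^{[0]} & X_{ij}^{[1]} & \cdots & X_{ij}^{[T-1]}
\end{bmatrix},
\]

where the superscript denotes the difference $t - s$ in time periods between the response time $t$ and the covariate time $s$. In this matrix, $X_{ij}^{[0]} = (x_{ij1}, \ldots, x_{ijT})'$ holds the covariate values observed in the same period as the response, and $X_{ij}^{[k]}$ holds the values observed $k$ periods before the response, with zeros in the first $k$ rows. Writing $\mu_{it}$ for the mean of $y_{it}$ and $X_{ijt}^{[k]}$ for the $t$th element of $X_{ij}^{[k]}$, the model for a time-dependent covariate is

\[
\operatorname{logit}(\mu_{it}) = \beta_0 + \beta_j^{tt} X_{ijt}^{[0]} + \beta_j^{[1]} X_{ijt}^{[1]} + \cdots + \beta_j^{[T-1]} X_{ijt}^{[T-1]},
\]

while the model for all time periods in matrix form is

\[
\operatorname{logit}(\mu_i) = X_{ij}\,\beta_j, \qquad \beta_j = \left(\beta_0, \beta_j^{tt}, \beta_j^{[1]}, \ldots, \beta_j^{[T-1]}\right)'.
\]

The coefficient $\beta_j^{tt}$ denotes the effect of the covariate $x_{ijt}$ on the response $Y_t$ when both are observed in the same time period, while the vector of coefficients $\beta_F$ denotes the effect of the time-independent covariates $x_F$ on the response $Y_t$; these covariates enter the linear predictor as an additional term. When $s < t$, we denote the lagged effect of the covariate $x_{js}$ on the response $Y_t$ by the coefficients $\beta_j^{[1]}, \beta_j^{[2]}, \ldots, \beta_j^{[T-1]}$.

This method of estimating regression coefficients relies on valid moment conditions that relate the covariate values at one time to the outcome at another time. The moment conditions are valid for cross-sectional measurements, where covariates are measured at the same time as the outcome [2]. However, the validity of moment conditions between lagged covariates and the outcomes needs to be tested. We do so through a test of bivariate correlation developed by Lalonde, Wilson and Yin [3]. Once the valid moments are identified, the regression parameters are estimated using a GMM approach [14]. We do not repeat the derivations here; readers who want to see that development are referred to Lalonde, Wilson, and Yin [3] and Irimata, Broatch, and Wilson [14].
We fit these models in SAS 9.4 using the %partitionedGMM macro (https://github.com/kirimata/Partitioned-GMM) [15]. The macro includes the test for valid moment conditions [3].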
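The lower triangular reconfiguration described above can be sketched in a few lines of Python. This is a minimal illustration for one subject and one time-dependent covariate, not the implementation used in the %partitionedGMM macro. The function names, the example covariate values and the coefficient values are all hypothetical, and the sketch keeps every lag column, whereas in the actual analysis only lags with valid moment conditions are retained.

```python
import math


def partition_covariate(x):
    """Build the lower-triangular design for one subject and one covariate.

    Row t is [1, x_t, x_{t-1}, ..., x_1, 0, ..., 0]: an intercept, the
    same-period value (lag 0), then the lag-1, lag-2, ... values, padded
    with zeros where no earlier measurement exists.
    """
    T = len(x)
    rows = []
    for t in range(T):
        lagged = [x[t - lag] if lag <= t else 0.0 for lag in range(T)]
        rows.append([1.0] + lagged)
    return rows


def fitted_probabilities(design, beta):
    """Apply the inverse logit to the linear predictor of each row."""
    probs = []
    for row in design:
        eta = sum(b * v for b, v in zip(beta, row))
        probs.append(1.0 / (1.0 + math.exp(-eta)))
    return probs


# Hypothetical binary covariate for one subject across four waves.
x_example = [1, 0, 1, 1]
DESIGN = partition_covariate(x_example)

# Hypothetical coefficients: intercept, immediate, lag-1, lag-2, lag-3.
beta_example = [-0.5, 0.8, 0.3, 0.1, 0.0]
PROBS = fitted_probabilities(DESIGN, beta_example)

for t, (row, p) in enumerate(zip(DESIGN, PROBS), start=1):
    print(f"t={t}: row={row}, fitted probability={p:.3f}")
```

Each row uses only covariate values observed at or before the response time, which is what allows a separate coefficient to be attached to each lag.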
In our analysis of the CLHLS data, we fit two partitioned GMM logistic regression models, one for interviewer-rated health and one for interviewees' ability to complete a physical check.

For completion of a physical check, further impacts were seen at lag-2 for transfer without assistance (OR = 4.30; 95% CI: 1.78, 10.43). An additional delayed impact at lag-3 was seen for eating vegetables frequently (OR = 2.12; 95% CI: 1.04, 4.33) (Table 3, Fig. 3).
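To show how the reported odds ratios relate to the underlying logistic regression coefficients, the following minimal Python sketch back-calculates the coefficient and an approximate standard error from a reported odds ratio and its confidence interval. It assumes that the reported intervals are Wald intervals, symmetric on the log-odds scale; the paper does not state this, so the derived standard errors and z statistics are approximations for illustration only. The function names are ours.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959964


def wald_summary(odds_ratio, ci_lower, ci_upper, z=Z_95):
    """Recover beta, an approximate SE and a z statistic from an OR and CI.

    Assumes a Wald interval: exp(beta - z * se), exp(beta + z * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2.0 * z)
    return {"beta": beta, "se": se, "z": beta / se}


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a coefficient and its standard error."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))


# Values reported in the text for completion of a physical check.
LAG2_TRANSFER = wald_summary(4.30, 1.78, 10.43)
LAG3_VEGETABLES = wald_summary(2.12, 1.04, 4.33)

for label, res in [("lag-2 transfer", LAG2_TRANSFER),
                   ("lag-3 vegetables", LAG3_VEGETABLES)]:
    print(f"{label}: beta={res['beta']:.3f}, se={res['se']:.3f}, z={res['z']:.2f}")
```

Under this assumption, both z statistics exceed 1.96, which agrees with the reported intervals excluding an odds ratio of 1.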

Discussion
A distinguishing feature of the partitioned GMM logistic regression model is that it allows the immediate effect as well as the future effects of time-dependent covariates on outcomes to be measured. Previous studies of the CLHLS data were only able to estimate cross-sectional or lag-1 effects of time-dependent covariates. In contrast, we were able to determine cross-sectional and lag-1 associations as well as lag-2 and lag-3 relationships between the time-dependent covariates and our two binary outcomes (Table 4).
Figure 2 presents the relationships between the time-dependent covariates and interviewer-rated health over time. We found that gender and the ability to make one's own decisions did not impact the probability of good health. Frequent consumption of vegetables increased the likelihood of good health immediately, but did not have any significant lagged effects. Exercising significantly increased the likelihood of being in good health immediately and in the next time period. The ability to transfer without assistance had a positive impact on good health immediately and in the next time period. Having visual challenges had an immediate negative impact on good health. The ability to pick up a book from the floor while standing had an immediate positive impact on good health.
Gender did not significantly impact the likelihood of completing a physical check. The ability to make one's own decisions had an immediate positive impact on completing a physical check. Frequent consumption of vegetables in the first wave significantly increased the likelihood of completing a physical check in the last wave. Exercising did not impact the completion of a physical check at any point in time. The ability to transfer without assistance significantly increased the likelihood of completing a physical check. Being able to pick up a book from the floor while standing increased the probability of completing a physical check. Figure 3 presents the changing relationships between the time-dependent covariates and the ability to complete a physical check.

Conclusions
Although we fitted the Partitioned GMM model to two binary outcomes, the model readily accommodates continuous outcomes. Partitioning the data matrix and introducing additional coefficients provides an opportunity to measure the effect of a covariate on the responses at different time periods.