We fit a partitioned GMM logistic regression model [14] to the Chinese Longitudinal Healthy Longevity Survey data to determine the effects of time-dependent covariates on binary outcomes. The model measures the impact of time-independent and time-dependent covariates X on the outcome Y, measured at four different time points. Because the covariates and the outcome are observed repeatedly, relationships between X and Y other than the cross-sectional ones must be addressed (Fig. 1). The partitioned GMM logistic regression model [14] therefore provides separate coefficient estimates for the effect of X on Y when both are measured at the same time, and for when X is measured one, two, or three time periods before Y.
Partitioned GMM logistic regression models with time-dependent covariates
Let \( {y}_{it} \) denote the binary observation for individual i (i = 1, …, N) at time t (t = 1, …, T). Let \( {\boldsymbol{x}}_{it}=\left({x}_{i1t},\dots, {x}_{iJt}\right) \) be a vector of J time-dependent covariates, where \( {x}_{ijt} \) is the jth covariate observed at time t for individual i. Assume that observations \( {y}_{is} \) and \( {y}_{kt} \) are independent when i ≠ k, but not necessarily when i = k and s ≠ t. The partitioned GMM logistic regression model accounts for the relationships between the outcomes \( {\boldsymbol{y}}_i=\left({y}_{i1},\dots, {y}_{iT}\right) \), where \( {y}_{it} \) is observed at time t, and the jth covariate \( {x}_{ijs} \) observed at time s, for s ≤ t. For each subject i and each time-dependent covariate \( {x}_{ijt} \) measured at times t = 1, 2, …, T, the data are reconfigured as a lower triangular matrix,
$$ {\boldsymbol{X}}_{ij}=\left[\begin{array}{cccc}{x}_{ij1}& 0& \dots & 0\\ {x}_{ij2}& {x}_{ij1}& \dots & 0\\ \vdots & \vdots & \ddots & \vdots \\ {x}_{ijT}& {x}_{ij\left(T-1\right)}& \dots & {x}_{ij1}\end{array}\right]=\left[{\boldsymbol{x}}_{ij}^{\left[0\right]}\kern0.5em {\boldsymbol{x}}_{ij}^{\left[1\right]}\kern0.5em \dots \kern0.5em {\boldsymbol{x}}_{ij}^{\left[T-1\right]}\right] $$
where the superscript denotes the difference t − s, in time periods, between the response time t and the covariate time s. In this matrix, \( {\boldsymbol{x}}_{ij}^{\left[0\right]} \) contains the values of the time-dependent covariate observed at the same time as the outcome, \( {\boldsymbol{x}}_{ij}^{\left[1\right]} \) contains the values of the time-dependent covariate observed one time period before the outcome, and so on, such that \( {\boldsymbol{x}}_{ij}^{\left[T-1\right]} \) consists of the values of the covariate measured T − 1 time periods before the outcome. Thus, the model for the outcome at time t with one time-independent covariate and one time-dependent covariate is
$$ \mathrm{logit}\left({\mu}_{it}\right)={\beta}_0+{\beta}_F{x}_F+{\beta}_j^{tt}{x}_{ijt}+{\beta}_j^{\left[1\right]}{x}_{ij\left(t-1\right)}+{\beta}_j^{\left[2\right]}{x}_{ij\left(t-2\right)}+\dots +{\beta}_j^{\left[t-1\right]}{x}_{ij1} $$
(1)
while the model for all time periods in matrix form is
$$ \mathrm{logit}\left[\begin{array}{c}{\mu}_{i1}\\ {\mu}_{i2}\\ \vdots \\ {\mu}_{iT}\end{array}\right]={\beta}_0\left[\begin{array}{c}1\\ 1\\ \vdots \\ 1\end{array}\right]+{\beta}_F{x}_F\left[\begin{array}{c}1\\ 1\\ \vdots \\ 1\end{array}\right]+{\beta}_j^{tt}{\boldsymbol{x}}_{ij}^{\left[0\right]}+{\beta}_j^{\left[1\right]}{\boldsymbol{x}}_{ij}^{\left[1\right]}+{\beta}_j^{\left[2\right]}{\boldsymbol{x}}_{ij}^{\left[2\right]}+\dots +{\beta}_j^{\left[T-1\right]}{\boldsymbol{x}}_{ij}^{\left[T-1\right]} $$
The coefficient \( {\beta}_j^{tt} \) denotes the effect of the covariate \( {x}_{ijt} \) on the response \( {Y}_t \) when both are observed in the same time period, while the coefficient \( {\beta}_F \) denotes the effect of the time-independent covariate \( {x}_F \) on the response \( {Y}_t \). When s < t, the lagged effects of the covariate \( {x}_{js} \) on the response \( {Y}_t \) are denoted by the coefficients \( {\beta}_j^{\left[1\right]},{\beta}_j^{\left[2\right]},\dots, {\beta}_j^{\left[T-1\right]} \). In general, each of the J time-dependent covariates yields a maximum of T partitions of \( {\beta}_j \). Thus, for a model with an intercept and J time-dependent covariates, the data matrix X has a maximum dimension of NT by (J × T) + 1, and β is a vector of maximum length (J × T) + 1.
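The reconfiguration described above can be illustrated with a minimal Python sketch. It is not part of the %partitionedGMM macro; the function names `lagged_blocks` and `design_matrix` and the (N, T, J) array layout are assumptions made here for illustration only. The sketch builds the lower triangular block for one subject and one covariate, then stacks an intercept column and the J blocks for all N subjects, so that the result has NT rows and (J × T) + 1 columns. Time-independent covariates are omitted and would add further columns.

```python
import numpy as np


def lagged_blocks(x):
    """Lower triangular T x T matrix [x^[0] x^[1] ... x^[T-1]] for one
    subject and one time-dependent covariate observed at t = 1, ..., T."""
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    X = np.zeros((T, T))
    for lag in range(T):
        # Column `lag` holds the covariate observed `lag` periods before
        # the outcome; the first `lag` rows have no such value and stay 0.
        X[lag:, lag] = x[:T - lag]
    return X


def design_matrix(x_dep):
    """Stack an intercept and the lagged blocks of every covariate.

    x_dep has the hypothetical shape (N, T, J): subjects, time points,
    time-dependent covariates. Returns an NT x (J*T + 1) matrix.
    """
    x_dep = np.asarray(x_dep, dtype=float)
    N, T, J = x_dep.shape
    rows = []
    for i in range(N):
        blocks = [np.ones((T, 1))]  # intercept column
        blocks += [lagged_blocks(x_dep[i, :, j]) for j in range(J)]
        rows.append(np.hstack(blocks))
    return np.vstack(rows)
```

With T = 4 time points, as in the data analyzed here, `lagged_blocks([1, 2, 3, 4])` places the original series in the first column and the series shifted down by one, two, and three periods in the remaining columns. Columns whose moment conditions are found to be invalid would be dropped before estimation, which is why the dimensions stated above are maxima.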
This method of estimating regression coefficients relies on valid moment conditions relating covariate values at one time to the outcome at another time. The moment conditions are valid for cross-sectional measurements, where covariates are measured at the same time as the outcome [2]. However, the validity of moment conditions between lagged covariates and the outcomes needs to be tested. We do so with the test of bivariate correlation developed by Lalonde, Wilson, and Yin [3]. Once the valid moments are identified, the regression parameters are estimated using a GMM approach [14]. We do not repeat the derivations here; readers who want to see that development are referred to Lalonde, Wilson, and Yin [3] and Irimata, Broatch, and Wilson [14]. We fit these models in SAS 9.4 using the %partitionedGMM macro (https://github.com/kirimata/Partitioned-GMM) [15], which includes the test for valid moment conditions [3].
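The general idea of screening a lagged moment condition through a bivariate correlation can be sketched as follows. This is a simplified illustration under stated assumptions, not the test as derived in [3] or as implemented in the macro: the function names, the standardization by the mean of the squared products, and the 1.96 cutoff are all assumptions made here. The inputs are the residuals from a model for the outcome at time t and the values of one covariate at time s, one entry per subject, and neither may be constant.

```python
import numpy as np


def standardize(v):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


def moment_z(resid_t, x_s):
    """Approximate z statistic for the correlation between residuals at
    time t and a covariate at time s (illustrative standardization)."""
    e = standardize(resid_t)
    x = standardize(x_s)
    n = e.shape[0]
    r = np.mean(e * x)            # sample correlation of the two series
    mu22 = np.mean(e**2 * x**2)   # assumed variance scale for r
    return r / np.sqrt(mu22 / n)


def is_valid_moment(resid_t, x_s, crit=1.96):
    """Treat the moment as valid when the correlation is not
    significantly different from zero (hypothetical cutoff)."""
    return abs(moment_z(resid_t, x_s)) < crit
```

Under this sketch, a lagged covariate column would be retained for estimation only when `is_valid_moment` returns `True` for the corresponding pair of times (s, t).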
In our analysis of the CLHLS data, we fit two separate partitioned GMM logistic regression models: one for interviewer-rated health and one for interviewees’ ability to complete a physical check.