In recent years, one of the hot topics in glaucoma research has been the effect of IOP fluctuation on POAG. Although a growing number of studies have confirmed that lowering the mean IOP level reduces the risk of developing POAG, findings from major prospective clinical trials on the impact of IOP fluctuation on POAG remain controversial [25, 27–30]. In this paper, we analyzed the post-randomization IOPs from OHTS and EGPS using a latent class analysis (LCA) approach. LCA allowed us to identify distinct patterns of IOP change over time and then to associate those patterns with the risk of POAG. The results from both studies showed that different patterns of IOP change could markedly affect the risk of POAG, irrespective of baseline, pre-randomization IOP levels. In OHTS, the change in IOP was best described by 6 distinct patterns. The model identified a subset of participants in whom IOP variability also played an important role in predicting POAG: this subgroup showed the highest IOP variability and had a higher risk than participants with a comparable mean IOP. Compared with the reference class, these participants were less likely to be in the treatment group (OR = 0.11), more likely to self-identify as Black (OR = 2.12), and had a relatively higher baseline IOP (OR = 2.80). However, this subgroup accounted for only about 10% of the OHTS sample, which may partially explain our finding that IOP variability was an independent risk factor in OHTS but had little impact on the overall predictive accuracy for POAG (manuscript in preparation). In a sensitivity analysis restricted to non-Black participants, the LCA identified patterns of IOP change similar to those in the whole OHTS dataset. This result is consistent with a tree-based model in the OHTS-EGPS meta-analysis, which showed that race was no longer an important predictor of POAG development after other risk factors were considered [17].
In EGPS, LCA identified 5 distinct latent classes and confirmed that subjects with the highest mean IOP were most likely to develop POAG. However, it failed to disentangle the effect of fluctuation from that of the mean, because the participants with the highest mean level also had the largest IOP variability. Interestingly, despite the marked differences between EGPS and OHTS in treatment intervention and in the magnitude of IOP lowering achieved, both studies showed that adding IOP change to the baseline model improved the overall predictive accuracy for POAG development.

Conventionally, change in longitudinal data is described using linear mixed models with random coefficients [31]. Although the mixed model recognizes the heterogeneous nature of the data by allowing each individual his or her own intercept and slope, it assumes that all individuals come from a single population and uses an average trajectory for the entire population. An LCA analyzes data from a rather different perspective. The model approximates the unknown heterogeneity in the distribution of the longitudinal outcome using a finite number of polynomial functions, each describing a unique subpopulation [14, 32]. It classifies individuals into distinct groups based on the patterns of the longitudinal outcome, so that individuals within a group are more similar to one another than to those in different groups. LCA possesses several unique advantages over conventional methods. First, the model lends itself directly to a set of well-characterized subpopulations and provides a formal statistical procedure to determine the appropriate number of subpopulations. It thus enables the discovery of unexpected yet potentially meaningful subpopulations that might otherwise be missed by conventional methods. Second, the method permits one to relate the developmental patterns of longitudinal data to their antecedents (predictors or covariates) and consequences (clinical outcomes), and thus allows estimation of both the direct and the indirect (via the longitudinal data) effects of a covariate on the distal outcome [16, 23]. Finally, recent advances in dual trajectory modeling also allow investigators to assess the joint evolution of multiple longitudinal processes, which may evolve contemporaneously or over different time periods [32].
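The idea of approximating a heterogeneous population with a finite number of trajectory classes can be illustrated with a minimal sketch. The code below is not the model fitted in this paper (which was estimated in Mplus); it is a crude two-stage approximation, with entirely simulated and hypothetical IOP values: each subject's serial readings are first summarized by ordinary least-squares trajectory coefficients, and a finite Gaussian mixture over those coefficients then recovers the latent classes, with BIC guiding the choice of class count.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 8)  # hypothetical visit times in years

# Simulate two latent trajectory classes: flat vs rising IOP (illustrative values)
n_per_class = 100
flat = 16 + 0.1 * t + rng.normal(0, 1.0, (n_per_class, t.size))
rising = 18 + 1.2 * t + rng.normal(0, 1.0, (n_per_class, t.size))
Y = np.vstack([flat, rising])

# Stage 1: summarize each subject by the OLS intercept and slope of a linear trajectory
X = np.column_stack([np.ones_like(t), t])
coefs, *_ = np.linalg.lstsq(X, Y.T, rcond=None)   # shape (2, n_subjects)
coefs = coefs.T

# Stage 2: fit finite mixtures over the coefficients; BIC selects the class count
bics = {}
for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(coefs)
    bics[k] = gm.bic(coefs)
best_k = min(bics, key=bics.get)
labels = GaussianMixture(n_components=best_k, n_init=5,
                         random_state=0).fit_predict(coefs)
print(best_k)
```

A full LCA estimates the class-specific polynomial trajectories and the class memberships jointly by maximum likelihood, rather than in two stages, but the sketch captures the core notion: subpopulations defined by distinct trajectory shapes rather than by a single average trajectory.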

LCA also provides an attractive alternative for making predictions with time-dependent covariates [21, 22]. An LCA takes a joint modeling approach to assessing the association between longitudinal and survival data; it therefore uses information more efficiently, resulting in less biased estimates. Unlike conventional joint models that capture the association via shared random effects [19, 33, 34], an LCA relates the longitudinal process to the survival process through latent classes and assumes that the two stochastic processes are independent given class membership [22]. Therefore, neither time-dependent covariates nor the random effects of the longitudinal data are needed in the survival sub-model. Such a model specification avoids the intensive computation required to obtain the random effects for new subjects and hence facilitates real-time individualized prediction [21]. The key to building an accurate prediction in an LCA setting is a reliable classification given the observed data. Generally speaking, the more serial biomarker readings are available, the more reliable the classification. In this regard, the impact of follow-up IOP on POAG may be over-estimated in OHTS because IOP readings spanning an average of 6.5 years were used to calculate the 5-year POAG-free rate. To solve this dilemma, which is common to all predictions involving time-dependent covariates, one of the most frequently used approaches in the medical literature is a landmark analysis, which consists of fitting a series of survival models only to the subjects still at risk, that is, computing the predictive distribution at a certain time given the history of events and covariates up to that moment [35]. In an LCA setting, such a dynamic prediction can be conveniently implemented because the conditional survival probability at any time can be calculated analytically from a single LCA once the parameters are estimated [21].
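The mechanics of such a dynamic prediction can be sketched with a toy two-class example. All quantities below (class priors, trajectory parameters, and class-specific hazards) are hypothetical stand-ins for fitted LCA parameters, not estimates from OHTS or EGPS: given a new subject's IOP history, Bayes' rule yields the posterior class membership, and the conditional survival beyond a landmark time is a posterior-weighted mixture of class-specific survival functions, with no random effects to re-estimate.

```python
import numpy as np

# Hypothetical fitted quantities from a 2-class joint latent class model
# (all values illustrative): class priors, linear IOP trajectories,
# residual SD, and class-specific constant POAG hazards per year.
priors = np.array([0.7, 0.3])
intercepts = np.array([16.0, 18.0])
slopes = np.array([0.1, 1.2])
sigma = 1.5
hazards = np.array([0.02, 0.10])

def posterior_class(times, iops):
    """P(class | observed IOP history), via Bayes' rule with Gaussian errors."""
    loglik = np.array([
        -0.5 * np.sum(((iops - (intercepts[c] + slopes[c] * times)) / sigma) ** 2)
        for c in range(len(priors))
    ])
    w = priors * np.exp(loglik - loglik.max())
    return w / w.sum()

def conditional_survival(times, iops, s, horizon):
    """P(T > s + horizon | T > s, IOP history): mixture of class survivals."""
    w = posterior_class(times, iops)
    surv_s = np.exp(-hazards * s)          # condition on being event-free at landmark s
    w = w * surv_s / np.sum(w * surv_s)
    return np.sum(w * np.exp(-hazards * horizon))

t = np.array([0.0, 1.0, 2.0, 3.0])
iops_flat = np.array([16.2, 15.9, 16.4, 16.1])  # history resembling the low-risk class
p = conditional_survival(t, iops_flat, s=3.0, horizon=5.0)
print(round(p, 3))
```

Because the posterior class weights update as each new reading arrives, the same fitted model supports landmark-style predictions at any follow-up time without refitting, which is the computational advantage noted above.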

Despite its advantages, the LCA has several limitations. First, the computational load of LCA can be high, especially for models with complex structures. In the OHTS data (N = 1600), for example, an unconditional 6-class LCA ran in less than 10 minutes, but developing the full conditional model took more than 30 minutes. Because of the exploratory nature of data analysis with LCA, the cumulative computing time can be substantial. For this reason, in practice the best LCA model is often constructed using a two-step approach, as in this paper. Another issue in LCA is that the log-likelihood function may converge to a local rather than the global maximum. Fortunately, this issue is addressed by the statistical package Mplus, which automatically uses 10 sets of randomly generated starting values for estimation. The program also allows investigators to rerun the model and compare the estimates from user-specified starting values if necessary [23].
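The multiple-random-starts safeguard against local maxima is not specific to Mplus and is easy to reproduce in any mixture-fitting software. As a hedged illustration (simulated data, scikit-learn's EM-based Gaussian mixture standing in for an LCA estimator), the sketch below runs EM from several random starting values and keeps the solution with the highest log-likelihood, mirroring the strategy described above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two well-separated simulated clusters; EM from a poor start can still
# terminate at an inferior local maximum of the log-likelihood.
data = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(6, 1, (150, 2))])

# Run EM from 10 random starts and keep the best-scoring fit,
# analogous to Mplus's default of 10 random sets of starting values.
fits = [GaussianMixture(n_components=2, init_params="random",
                        random_state=seed).fit(data) for seed in range(10)]
best = max(fits, key=lambda m: m.score(data))   # score = mean log-likelihood
print(round(best.score(data), 2))
```

Comparing the log-likelihoods across starts, rather than trusting a single run, is the practical check that the reported solution is at least replicated from independent starting values.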