Integrated analysis of incidence, progression, regression and disappearance probabilities
- Guan-Hua Huang
https://doi.org/10.1186/1471-2288-8-40
© Huang; licensee BioMed Central Ltd. 2008
- Received: 05 November 2007
- Accepted: 25 June 2008
- Published: 25 June 2008
Abstract
Background
Age-related maculopathy (ARM) is a leading cause of vision loss in people aged 65 or older. ARM is distinctive in that it is a disease which can transition through incidence, progression, regression and disappearance. The purpose of this study is to develop methodologies for studying the relationship of risk factors with different transition probabilities.
Methods
Our framework for studying this relationship includes two different analytical approaches. In the first approach, one can define, model and estimate the relationship between each transition probability and risk factors separately. This approach is similar to constraining a population to a certain disease status at the baseline, and then analyzing the probability of the constrained population to develop a different status. While this approach is intuitive, one risks losing available information while at the same time running into the problem of insufficient sample size. The second approach specifies a transition model for analyzing such a disease. This model provides the conditional probability of a current disease status based upon a previous status, and can therefore jointly analyze all transition probabilities. Throughout the paper, an analysis to determine the birth cohort effect on ARM is used as an illustration.
Results and conclusion
This study has found parallel separate and joint analyses to be more enlightening than either analysis in isolation. By implementing both approaches, one can obtain more reliable and more efficient results.
Keywords
- Birth Cohort
- Transition Model
- Generalized Estimating Equation
- Association Model
- Proportional Odds Model
Background
The present paper was motivated by an earlier population-based longitudinal study of age-related ocular disorders. Here, we focus on age-related maculopathy (ARM), a leading cause of vision loss in the elderly. ARM is characterized by a distinctive "transition" property: once the disease has occurred, it can progress, regress, and disappear. This transition characteristic is also exhibited by several other diseases [1–3]. Traditional statistical methods provide information on the risk of "having a disease" (prevalence) but not on how the disease moves between stages, so the analysis of the transition course of ARM poses a challenge. The purpose of our study is to develop a methodology for studying the relationship between risk factors and an individual's disease transition, including incidence, progression, regression and disappearance.
If we classify a change in the severity of the disease by defining a three-level scale: disease-free, early and late stage, then different transition courses can be defined as the current disease level conditioning upon the level at the immediately preceding examination. Incidence of the disease implies the appearance of the disease at the current examination when it was absent at the preceding examination. Progression implies that an individual is initially diagnosed with an early stage of the disease with worsening at the current examination, while regression implies the presence of the disease at the preceding examination with an improvement at the current examination. Disappearance implies the presence of the disease at the preceding examination and its absence at the current examination. Because of the nature of the definition, an obvious way to analyze the data is to constrain the study population to individuals with a specific disease level at the initial examination. We can then analyze the probability of the constrained population developing a different level at follow-up. The choice of the disease level will then depend on the type of transition we are interested in, and each type of transition can be analyzed separately. For example, when studying progression, we will include only those individuals that are classified as being in the early stage in the initial exam in our analysis. We then study the probability of developing a late stage of the disease at follow-up.
While this approach is intuitive, we risk losing some of our available information. For example, consider a study in which each participant is measured at baseline and at 5-year and 10-year follow-up examinations. A disease must be present at the 5-year follow-up for progression to be possible at the 10-year follow-up; therefore, the incidence of a disease at the 5-year examination and its progression at the 10-year examination are correlated. By separating incidence and progression, we discard the information carried by the correlation between the two transitions. We may also encounter the difficulty of an insufficient sample size. For a "rare" disease where only a small number of cases are observed, the study population for progression, regression and disappearance probabilities will be small. A model with many covariates of interest may not converge due to an insufficient sample size.
An alternative approach is based on a transition model. The model assumes that there is a correlation among repeated measurements because the past values explicitly influence the present observation. It formulates the conditional distribution of each measurement as a function of past observations and relevant risk factors. The transition model provides the conditional probability of a current disease level based upon its previous level. This is one way we can define the incidence, progression, regression and disappearance probabilities. By joint analysis, this approach takes the correlations among various transition probabilities into account and allows some confounding variables to have an equal effect on various transition probabilities, which in turn can ease the problem of insufficient sample size described above. However, these benefits come at the price of stronger modelling assumptions.
The remainder of this paper is organized as follows. In the methods section, we first briefly describe the research project that motivated this study and define the distinct transition probabilities of ARM. Next, we summarize the approach for analyzing the transition probabilities separately, and then we introduce a transition model to analyze them jointly. In addition we discuss parameter interpretation and estimation. Finally, we show how separate and joint analyses can be used together to obtain more reliable and efficient results. The results section applies our methodology to analyze the birth cohort effect on different transition probabilities of ARM, and we discuss the possible generalization of the proposed model.
Methods
The Beaver Dam Eye Study
The Beaver Dam Eye Study, a longitudinal cohort study of residents of Beaver Dam, Wisconsin between the ages of 43 and 84 years in 1987–1988, has been described in detail elsewhere [4–6]. This study aims to determine the long-term course of common vision-threatening conditions in adult Americans. Of the 4,926 individuals who participated in the baseline examination in 1988–1990, 3,684 took part in the 5-year follow-up in 1993–1995, 2,764 in the 10-year follow-up in 1998–2000, and 2,119 in the 15-year follow-up in 2003–2005; losses were due to death, relocation or refusal. Drop-outs were older and less educated than those who participated in the follow-up examinations. There were no other statistically significant differences after controlling for age [5, 6].
ARM severity scale and transition probabilities
Definitions of distinct ARM transition probabilities
probabilities/time | baseline | 5-year | 10-year | 15-year |
---|---|---|---|---|
prevalence | Pr(ARM(0) = 1,2) | Pr(ARM(5) = 1,2) | Pr(ARM(10) = 1,2) | Pr(ARM(15) = 1,2) |
incidence | N/A | Pr(ARM(5) = 1,2|ARM(0) = 0) | Pr(ARM(10) = 1,2|ARM(5) = 0) | Pr(ARM(15) = 1,2|ARM(10) = 0) |
progression | N/A | Pr(ARM(5) = 2|ARM(0) = 1) | Pr(ARM(10) = 2|ARM(5) = 1) | Pr(ARM(15) = 2|ARM(10) = 1) |
regression | N/A | Pr(ARM(5) = 0,1|ARM(0) = 2) | Pr(ARM(10) = 0,1|ARM(5) = 2) | Pr(ARM(15) = 0,1|ARM(10) = 2) |
disappearance | N/A | Pr(ARM(5) = 0|ARM(0) = 1) | Pr(ARM(10) = 0|ARM(5) = 1) | Pr(ARM(15) = 0|ARM(10) = 1) |
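The definitions in the table can be sketched as code. The following is a minimal illustration only, assuming the levels are coded 0 = disease-free, 1 = early ARM and 2 = late ARM; the function names and the use of `None` for "not at risk" are hypothetical choices made for this example, not part of the original analysis.

```python
from typing import Dict, List, Optional


def transition_indicators(prev: int, curr: int) -> Dict[str, Optional[int]]:
    """Return the four transition indicators for one pair of adjacent exams.

    Levels (assumed coding): 0 = disease-free, 1 = early ARM, 2 = late ARM.
    An indicator is None when the conditioning level in the table is not met,
    i.e. the person is not at risk for that transition.
    """
    if prev not in (0, 1, 2) or curr not in (0, 1, 2):
        raise ValueError("severity levels must be 0, 1 or 2")
    return {
        "incidence": int(curr >= 1) if prev == 0 else None,
        "progression": int(curr == 2) if prev == 1 else None,
        "regression": int(curr <= 1) if prev == 2 else None,
        "disappearance": int(curr == 0) if prev == 1 else None,
    }


def indicators_over_time(levels: List[int]) -> List[Dict[str, Optional[int]]]:
    """Apply transition_indicators to each pair of adjacent examinations."""
    return [transition_indicators(p, c) for p, c in zip(levels, levels[1:])]


# Hypothetical participant: disease-free at baseline, early ARM at 5 years,
# late ARM at 10 years, still late ARM at 15 years.
history = [0, 1, 2, 2]
result = indicators_over_time(history)
```

In this hypothetical history the incidence indicator is defined only for the first interval; once the disease is present, the participant is no longer at risk of incidence, which is why the indicator becomes missing at later examinations.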
Analyzing transition probabilities separately
This paper presents two different ways of analyzing the transition courses of ARM. We specifically want to draw inferences about the relationship between risk factors and patients' incidence, progression, regression and disappearance probabilities. The first approach is to define different probabilities based on the definitions provided in the previous subsection and analyze each probability separately.
Formally, let O _{ ij }be the disease severity scale of the ith individual at the jth examination (i = 1, ⋯ , N; j = 1, ⋯ ,J). In our application, (O _{ i1}, O _{ i2}, O _{ i3}, O _{ i4}) represents the collection of the combined 3-level severity scales of ARM for the ith individual at baseline, 5-year follow-up, 10-year follow-up and 15-year follow-up.
It should be noted that for each transition course, there are J - 1 indicators from the same individual and, therefore, these indicators are correlated.
$$\operatorname{cov}(\text{Inc}_{ij}, \text{Inc}_{ik}) = f(\mu_{ij}, \mu_{ik}; \alpha), \quad j < k, \qquad (2)$$
where μ _{ ij }= Pr(Inc_{ ij }= 1) and f(·) is a known function. Each transition probability is analyzed separately.
Parameter and standard error estimates can be obtained by the generalized estimating equations (GEE) approach [10, 11]. It is worthwhile to point out that, by the definition of the indicator of each transition type, individuals whose indicators are equal to 1 at time j will have missing values at time j + 1. When estimating the correlation between two adjacent time points, only those individuals whose indicators are equal to 0 at time j are included in the analysis and, therefore, we assume that the correlation among individuals whose indicators are equal to 1 at time j is similar to that among those whose indicators are equal to 0. Here, we are most interested in inferences about the β's in the marginal mean. The GEE approach guarantees the consistency of the $\widehat{\beta}$'s even if the above equal-correlation assumption is incorrect [9].
Analyzing probabilities jointly: the transition model
Model
A transition model specifies a generalized linear model for the conditional distribution of the current disease status, given the past responses. To obtain the desired transition probabilities, the transition model used in this study specifies the conditional distribution given only the immediately preceding response.
where j = 2, ⋯ , J; c = 0, 1; o _{ i(j-1) }is the realization of O _{ i(j-1)}; and I(o _{ i(j-1) }= k) = 1 if o _{ i(j-1) }= k and 0 otherwise, for k = 1, 2.
Some key features of the proposed transition model are as follows. First, because the disease severity scale O _{ ij }is an ordinal scale, we model the cumulative probability Pr(O _{ ij }> c), similar to the proportional odds model [12], rather than the category probability Pr(O _{ ij }= c). Second, our model allows the regression coefficients γ's and β's to be different for different c. We also add the interactions between the preceding response (I(o _{ i(j-1) }= 1), I(o _{ i(j-1) }= 2)) and the risk factors of interest x _{ ij1}, ⋯ ,x _{ ijP }. These modelling choices allow the risk factor effects to vary with c and with the disease level at examination j - 1. Because different transition probabilities can be obtained by selecting a different c and a different disease level at examination j - 1, model (3) enables us to investigate the risk factor effects for different transition probabilities. Third, the proposed model has the potential to grow quickly given the possible cutpoints c and interactions. To apply the model efficiently, regression coefficients for covariates that are not of major interest and serve only as confounders may be assumed to be independent of c or to have no interactions with the previous disease status.
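For orientation, one form of model (3) that is consistent with the features just described and with the interpretations given in (4)–(7) can be written as follows. This is a sketch reconstructed from the surrounding definitions, not a quotation of the original display; in particular, the intercept labels $\gamma_{0c}$, $\gamma_{1c}$ and $\gamma_{2c}$ are assumed names.

$$
\operatorname{logit}\,\Pr(O_{ij} > c \mid o_{i(j-1)})
  = \gamma_{0c}
  + \gamma_{1c}\, I(o_{i(j-1)} = 1)
  + \gamma_{2c}\, I(o_{i(j-1)} = 2)
  + \sum_{p=1}^{P} \beta_{pc}\, x_{ijp}
  + \sum_{p=1}^{P} \tau_{1p}\, I(o_{i(j-1)} = 1)\, x_{ijp}
  + \sum_{p=1}^{P} \tau_{2p}\, I(o_{i(j-1)} = 2)\, x_{ijp}
$$

Under this form, setting c = 0 with a disease-free preceding response leaves only $\beta_{p0}$ as the coefficient of $x_{ijp}$, while setting c = 1 with an early-stage preceding response gives $\beta_{p1} + \tau_{1p}$.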
Parameter interpretation
Through the transition model (3), we can derive the relationship of the incorporated risk factors with different transition probabilities. When c = 0 and (I(o _{ i(j-1) }= 1), I(o _{ i(j-1) }= 2)) = (0, 0), the conditional probability Pr(O _{ ij }> c|o _{ i(j-1)}) = Pr(O _{ ij }= 1 or 2|o _{ i(j-1) }= 0), which represents the incidence probability.
Therefore,
β _{ p0 }= log odds ratio of the disease incidence for every one unit increase in x _{ ijp }. (4)
When c = 1 and (I(o _{ i(j-1) }= 1), I(o _{ i(j-1) }= 2)) = (1, 0), the conditional probability becomes the progression probability, thus,
(β _{ p1 }+ τ _{1p }) = log odds ratio of the disease progression for every one unit increase in x _{ ijp }. (5)
When c = 1 and (I(o _{ i(j-1) }= 1), I(o _{ i(j-1) }= 2)) = (0, 1), we then have the conditional probability equal to one minus the regression probability, thus,
-(β _{ p1 }+ τ _{2p }) = log odds ratio of the disease regression for every one unit increase in x _{ ijp }. (6)
When c = 0 and (I(o _{ i(j-1) }= 1), I(o _{ i(j-1) }= 2)) = (1, 0), the conditional probability is equal to one minus the disappearance probability, thus,
-(β _{ p0 }+ τ _{1p }) = log odds ratio of the disease disappearance for every one unit increase in x _{ ijp }. (7)
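The arithmetic in (4)–(7) can be collected in a few lines of code. The sketch below assumes that coefficient estimates for one covariate are already available; the function name, the `units` argument and the numerical values are hypothetical and are not estimates from this study.

```python
import math
from typing import Dict


def transition_odds_ratios(beta_p0: float, beta_p1: float,
                           tau_1p: float, tau_2p: float,
                           units: float = 1.0) -> Dict[str, float]:
    """Odds ratios for a `units` increase in covariate p, per transition type."""
    log_or = {
        "incidence": beta_p0,                  # equation (4)
        "progression": beta_p1 + tau_1p,       # equation (5)
        "regression": -(beta_p1 + tau_2p),     # equation (6)
        "disappearance": -(beta_p0 + tau_1p),  # equation (7)
    }
    return {name: math.exp(units * value) for name, value in log_or.items()}


# Made-up coefficients, for illustration only.
example = transition_odds_ratios(beta_p0=0.10, beta_p1=0.20,
                                 tau_1p=-0.05, tau_2p=0.00, units=5)
```

With `units=5`, the returned values correspond to odds ratios per 5-unit change in the covariate, which is the scale used later for the birth cohort and age effects.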
Statistical inference
If the first-order Markov assumption (i.e., O _{ ij }is assumed to depend on the past responses only through the immediately preceding response) is correct, the conditional distribution Pr(O _{ ij }|H _{ ij }) = Pr(O _{ ij }|O _{ i(j-1)}).
Since the transition events {O _{ ij }|O _{ i(j-1)}; j = 2, ⋯ ,J} are uncorrelated, standard algorithms for fitting the proportional odds models can be used by adding (I(o _{ i(j-1) }= 1), I(o _{ i(j-1) }= 2)) and their interactions with (x _{ ij1}, ⋯ , x _{ ijP }) as additional covariates.
where j <k = 2, ⋯ , J and c_{1}, c_{2} = 0, 1. The odds ratio between two repeated measurements is assumed to depend on the measurement at time 1. This assumption may be checked and modified, if necessary. The association model may be simplified to an intercept-only model or modified by adding further covariates. If none of α _{0}, α _{1} and α _{2} is significant, the first-order Markov assumption is appropriate, and we recommend using the standard proportional odds model for inference to avoid unnecessary complication.
Analysts may choose from three different GEE estimating methods to estimate the parameters in equations (3) and (10) when implementing Heagerty and Zeger's model. First-order GEE (GEE1 – [10]) treats the parameters in the association model (10) as nuisance and is focused primarily on obtaining the parameters in the marginal mean model (3). Second-order GEE (GEE2 – [14]) estimates the parameters in both (3) and (10) jointly. Extended alternating logistic regressions (ALR – [15]) replaces the estimating equation in GEE1 for the parameters in (10) by an unbiased nonlinear estimating equation and offers high efficiency in the estimation of both sets of parameters. The standard errors of all three methods are calculated using robust "sandwich" variance estimators. GEE2 estimates the association parameters in (10) most precisely; however, it has the disadvantages that the consistency of the parameters in (3) depends on having specified the correct model for the association model, and that its computational burden quickly grows to infeasibility as data clusters become large. Thus in situations where inference regarding the parameters in the marginal mean model (3) is primary or when estimation using GEE2 is intractable, GEE1 or ALR may be most appropriate.
It should be noted that the proportional odds model and Heagerty and Zeger's model both make the proportional odds assumption. That is to say, they assume the regression coefficients to be independent of cutpoints c. The transition model (3) is more complicated, since the model allows γ's and β's to be different for different c. To relax the proportional odds assumption, one can first expand the original input data set for the ordinal outcomes O _{ ij }into a new data set for cumulative probability variables (I(O _{ ij }> 0), I(O _{ ij }> 1)) plus cutpoint identifiers (I(c = 0), I(c = 1)), and then add interactions between the cutpoint identifiers and the covariates. Details for using SAS to implement the "partial" proportional odds model can be found in Chapter 15 of the book by Stokes et al. [16]. For fitting Heagerty and Zeger's model with cutpoint-varied regression coefficients, readers can refer to the article by Huang et al. [17].
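The data expansion described above can be sketched as follows. This is an illustrative Python version under simple assumptions (records held as dictionaries, numeric covariates); it is not the SAS procedure of Stokes et al. [16], and the function and column names are hypothetical.

```python
from typing import Dict, List, Sequence


def expand_ordinal(records: List[Dict], outcome: str,
                   covariates: Sequence[str],
                   cutpoints: Sequence[int] = (0, 1)) -> List[Dict]:
    """Expand each ordinal record into one binary record per cutpoint c.

    Each output row carries the binary response y = I(O > c), the cutpoint
    identifiers I(c = 0) and I(c = 1), and cutpoint-by-covariate products,
    so that a covariate's coefficient may differ across cutpoints.
    """
    expanded = []
    for rec in records:
        for c in cutpoints:
            row = {k: v for k, v in rec.items() if k != outcome}
            row["y"] = int(rec[outcome] > c)
            for c_id in cutpoints:
                is_c = int(c == c_id)
                row[f"cut{c_id}"] = is_c
                for name in covariates:
                    row[f"cut{c_id}_x_{name}"] = is_c * rec[name]
            expanded.append(row)
    return expanded


# One hypothetical observation: early ARM (O = 1) at age 70.
rows = expand_ordinal([{"id": 1, "O": 1, "age": 70}],
                      outcome="O", covariates=["age"])
```

A binary-response model fitted to the expanded rows, with the subject identifier as the cluster, then estimates a separate covariate coefficient for each cutpoint; dropping the cutpoint-by-covariate products for a confounder restores the proportional odds restriction for that confounder.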
Evaluating equal covariate effects across transition probabilities
The separate analysis allows different covariate effects on different transition probabilities; however, it also risks losing available information and encountering an insufficient sample size. The joint analysis "borrows strength" in part by assuming equality with respect to some confounding effects on transition probabilities, and in certain cases, this may be inappropriate. This section presents an approach for the empirical examination of the equal-confounding-effect assumption, utilizing separate analytical results. Then, the joint transition model can be modified accordingly in order to reduce the complexity of the model.
Their variance estimators cannot be derived easily because they involve estimations of the covariances between estimators from different models. We propose to estimate the distributions of $({\tilde{\beta}}_{p1}-{\tilde{\beta}}_{p0}),{\tilde{\tau}}_{1p}$ and ${\tilde{\tau}}_{2p}$ using the bootstrap method [18]. It must be noted that in order to perform bootstrapping for repeated measures on each individual, each subject is sampled with replacement rather than individual observations.
does not cover 0, where ${({\tilde{\beta}}_{p1}-{\tilde{\beta}}_{p0})}_{\alpha /2}^{\ast}$ is the lower 100(α/2)th percentile of the bootstrap replications of statistics $({\tilde{\beta}}_{p1}-{\tilde{\beta}}_{p0})$.
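The subject-level resampling and percentile interval can be sketched in code. The example below is a minimal illustration that assumes the data are held as a mapping from subject to that subject's repeated observations. The toy statistic (a pooled mean) stands in for the differences between separately fitted coefficient estimates, and all names and default values are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def cluster_bootstrap_ci(data: Dict[str, List[float]],
                         statistic: Callable[[List[float]], float],
                         n_boot: int = 500, level: float = 0.80,
                         seed: int = 0) -> Tuple[float, float]:
    """Percentile interval from a bootstrap that resamples whole subjects."""
    rng = random.Random(seed)
    ids = list(data)
    replicates = []
    for _ in range(n_boot):
        # Draw subjects with replacement, keeping each subject's
        # repeated measurements together.
        sampled_ids = [rng.choice(ids) for _ in ids]
        pooled = [obs for sid in sampled_ids for obs in data[sid]]
        replicates.append(statistic(pooled))
    replicates.sort()
    alpha = 1.0 - level
    lo_idx = math.floor((alpha / 2) * (n_boot - 1))
    hi_idx = math.ceil((1 - alpha / 2) * (n_boot - 1))
    return replicates[lo_idx], replicates[hi_idx]


def mean_of(observations: List[float]) -> float:
    """Toy statistic used only for this illustration."""
    return sum(observations) / len(observations)


# Hypothetical repeated binary indicators for four subjects.
toy = {"s1": [0, 1, 1], "s2": [0, 0], "s3": [1, 1, 1], "s4": [0, 1]}
low, high = cluster_bootstrap_ci(toy, mean_of)
covers_zero = low <= 0.0 <= high
```

In an actual application, `statistic` would refit the separate models on each resampled data set and return the relevant difference of estimates; the hypothesis is then rejected when the resulting interval does not cover 0.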
In the case where there are many confounders to be tested for the equal-effect assumption, we recommend that each potential confounder is considered separately. In other words, perform bootstrapping for the separate analysis with major risk factors plus one confounder at a time to determine the modelling of this confounder in the transition model.
Three null hypotheses H _{01}, H _{02} and H _{03} should be checked separately. If only some of the three null hypotheses are rejected, this means that the covariate effects on various transition probabilities are similar to some extent, and that only the corresponding interactions need to be added. For example, if only H _{02} : τ _{1p }= 0 is rejected, the interaction I(o _{ i(j-1) }= 1)x _{ ijp }is included.
The proposed procedure for checking the equal-confounding-effect assumption is "empirical", compared with the backward elimination starting at the "full" transition model (i.e., all risk factor effects varying with c and the disease level of the previous examination). However, the full transition model is usually too complicated to converge, making the backward elimination procedure not feasible.
Results
The analysis we report here aims to examine whether a birth cohort effect is observed for ARM. The birth cohort effect is defined as the variation in developing ARM that arises from the different exposures of each birth cohort. Thus, if a birth cohort effect exists, individuals from different birth cohorts would have different chances of developing ARM, even if they are of the same age. The birth cohort effect on the prevalence of ARM has been investigated elsewhere [19]. Here, we focus on the birth cohort effect on different transition probabilities.
Analytical methods
To graphically display the observed birth cohort patterns, we first aggregated the data into a two-way table by birth year and age group in 5-year intervals, and calculated different transition probabilities of ARM in each cell. Next, we plotted the transition probability against age for each birth cohort. For our application, 9 birth cohorts and 10 age groups were constructed (birth cohorts: ≤1907, 1908–1912, 1913–1917, 1918–1922, 1923–1927, 1928–1932, 1933–1937, 1938–1942, ≥1943; age groups: ≤49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, 80–84, 85–89, ≥90).
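The aggregation step can be sketched as follows. The cohort and age-group boundaries are taken from the paragraph above; the function names, the input layout (birth year, age, transition indicator) and the toy rows are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def birth_cohort(birth_year: int) -> str:
    """Map a birth year to one of the 9 birth cohorts."""
    if birth_year <= 1907:
        return "<=1907"
    if birth_year >= 1943:
        return ">=1943"
    start = 1908 + 5 * ((birth_year - 1908) // 5)
    return f"{start}-{start + 4}"


def age_group(age: int) -> str:
    """Map an age in years to one of the 10 age groups."""
    if age <= 49:
        return "<=49"
    if age >= 90:
        return ">=90"
    start = 50 + 5 * ((age - 50) // 5)
    return f"{start}-{start + 4}"


def cell_proportions(
    rows: Iterable[Tuple[int, int, Optional[int]]]
) -> Dict[Tuple[str, str], float]:
    """Observed transition proportion in each (birth cohort, age group) cell.

    Each row is (birth_year, age, indicator); an indicator of None means the
    person was not at risk for the transition and is skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [events, number at risk]
    for birth_year, age, indicator in rows:
        if indicator is None:
            continue
        cell = (birth_cohort(birth_year), age_group(age))
        counts[cell][0] += indicator
        counts[cell][1] += 1
    return {cell: events / n for cell, (events, n) in counts.items()}


# Hypothetical rows, not study data.
toy_rows = [(1920, 72, 1), (1921, 74, 0), (1940, 52, 0), (1940, 53, None)]
table = cell_proportions(toy_rows)
```

Plotting the cell proportions against age group, with one line per birth cohort, gives the kind of display described above.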
$$\operatorname{var}(\text{Inc}_{ij}) = \mu_{ij}(1 - \mu_{ij}) \quad \text{and} \quad \operatorname{corr}(\text{Inc}_{ij}, \text{Inc}_{ik}) = \alpha_{0}, \qquad (12)$$
where μ _{ ij }= Pr(Inc_{ ij }= 1), j <k = 2: 5-year follow-up; 3: 10-year follow-up; 4: 15-year follow-up, (age in 1987)_{ i }is the ith participant's age in 1987, age_{ ij }is the age of participant i at examination j, and (confounders) _{ ij }represents characteristics that could potentially influence the relationship among ARM, birth cohort and age at the examination, including gender, smoking status, history of heavy drinking, multi-vitamin use, cholesterol level and hypertension status [19] (the boldface type denotes multiple factors). Treatment of ARM is not included as a confounding variable because, at present, there are few medical interventions that have been shown to prevent the incidence or progression of ARM [20, 21]. Although surgical intervention in some cases prevents further loss of vision, it usually does not restore vision in the patient. In our Beaver Dam Eye study, no significant relationships were found between the most commonly used interventions and 5-year and 10-year incidences of early or late ARM [20, 21]. The concomitant low frequency of use of medication, surgery, and of incidence of early and late ARM limits our ability to detect any meaningful relationship.
The birth cohort effect exp(5β _{1}) is the odds ratio of ARM incidence for every 5-year decrease in birth year (5-year older birth cohort) among people with the same age. The age effect exp(5β _{2}) is the odds ratio for every 5-year increase in age, comparing people from the same birth cohort. These two effects are adjusted for the identified confounding effects. Here, we chose the "exchangeable" working correlation because the focus was on the birth cohort effect and a reasonable and simple association model (12) was all we needed. The indicator Inc_{ ij }was replaced by Pro_{ ij }, Reg_{ ij }or Dis_{ ij }when analyzing different transition courses.
Before conducting the joint analysis, we evaluated the equal-effect hypotheses H _{01}, H _{02} and H _{03} on each of the identified confounding variables in order to reduce the complexity of the model. If the 80% bootstrap percentile confidence interval (with 500 bootstrap replicates) covered 0, the corresponding hypothesis was accepted and the modelling of the confounding variable in the transition model (3) was modified accordingly.
where c = 0, 1, j = 2, 3, 4 and the function g(·) depends on the significance of hypotheses H _{01}, H _{02} and H _{03} for each of the identified confounding variables. We added (10) as the association model and fit a Heagerty and Zeger's model with cutpoint-varied regression coefficients. Because our focus was not on the degree of association among the transition events {O _{ ij }|O _{ i(j-1)}; j = 2, ⋯ , J _{ i }}, we used GEE1 as the estimating method, which is robust to the misspecification of the association model (10). The birth cohort effects of ARM incidence, progression, regression and disappearance are exp(5β _{10}), exp{5(β _{11} + τ _{ 11 })}, exp{-5(β _{11} + τ _{21})} and exp{-5(β _{10} + τ _{11})}, respectively. The age effects are exp(5β _{20}), exp{5(β _{21} + τ _{12})}, exp{-5(β _{21} + τ _{22})} and exp{-5(β _{20} + τ _{12})} for ARM incidence, progression, regression and disappearance, respectively.
Results
Bootstrap percentile confidence intervals
confounding variables/hypotheses | H _{01} : β _{ p1 }= β _{ p0} | H _{02} : τ _{1p }= 0 | H _{03} : τ _{2p }= 0 |
---|---|---|---|
male gender | (-0.61, 0.56) | (-0.40, 0.44) | (-0.61, 0.68) |
pack years smoked | (-0.013, 0.0087) | (-0.0065, 0.0096) | (-0.0096,0.014) |
past heavy drinker | (-0.87, 0.71) | (-0.50, 0.57) | (-0.70, 0.94) |
current heavy drinker | (-78.32, 1.54) | (-1.17, 40.37) | (-1.62, 78.67) |
past vitamin user | (-0.71, 0.71) | (-0.56, 0.52) | (-0.88, 0.75) |
current vitamin user | (-0.65, 0.71) | (-0.45, 0.49) | (-0.78, 0.68) |
total cholesterol | (-0.0059, 0.0053) | (-0.0048, 0.0046) | (-0.0063, 0.0065) |
hypertensive | (-0.51, 0.50) | (-0.39, 0.39) | (-0.54, 0.55) |
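All of the intervals in the table above cover 0, so none of the equal-effect hypotheses was rejected for any confounding variable. Each confounder was therefore given a single effect common to all transition probabilities:
g((confounders) _{ ij }) = β _{3} × (confounders) _{ ij }.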
It should be noted that the bootstrap confidence interval for "current heavy drinker" is very wide, compared to other variables. This is caused by the large standard error of its regression coefficient estimate in modelling the disappearance probabilities. Only 0.9% of current drinkers had experienced the disappearance events. We performed a separate analysis for disappearance with and without "current heavy drinker" and obtained results that were similar for other variables in the model. To be comparable with our previous results, we decided to keep "current heavy drinker" in the model.
The fitted lines of transition probabilities over age by birth cohort based on the separate analysis (11, 12) are shown in the panels of the second row of Figure 2. The fitted lines were obtained by smoothing the estimated probabilities of the transition event versus the age for each birth cohort. The third row of Figure 2 represents the fitted transition probabilities based on the transition model (13, 14). Model (10) was first used as the association model, but because neither α _{1} nor α _{2} was significant, we simplified the association model to
$$\log\{\text{OR}[I(O_{ij} > c_{1}), I(O_{ik} > c_{2}) \mid O_{i1}]\} = \alpha_{0},$$
and obtained ${\widehat{\alpha}}_{0}=-0.97$ (95% CI: -1.48, -0.46). For all four transition probabilities, the results from the two approaches were very similar and fit the data equally well.
To evaluate the impact of the first-order Markov assumption on the joint analysis, we also fit models (13, 14) as a standard proportional odds model. Results can be found in Additional files 1 and 2. In summary, approaches with and without the first-order Markov assumption provided consistent parameter estimates, but the Markov assumption resulted in much wider CIs for the birth cohort and age effects. These findings reflect the robustness of the regression coefficients in (3) to misspecification of the association model (10) and the power gained from an appropriate association model.
Discussion
We select Reg_{ ij }and Dis_{ ij }for two reasons. First, they are the direct result of the transition model (3). The proposed transition model models (I(O _{ ij }> 0), I(O _{ ij }> 1)) (cumulative probabilities of the current response) and (I(o _{ i(j-1) }= 1), I(o _{ i(j-1) }= 2)) (level indicators of the preceding response). This modelling yields incidence and progression probabilities that match our desired definitions, but regression and disappearance probabilities that do not. Since our motivating example was more concerned with incidence and progression than with the other two courses, we adopted the above modelling. Second, the selected regression and disappearance are very close to the desired ${\text{Reg}}_{ij}^{\ast}$ and ${\text{Dis}}_{ij}^{\ast}$ in our ARM application. Because late ARM was rare (Figure 1), Dis_{ ij }was close to ${\text{Dis}}_{ij}^{\ast}$. Also, none of the people with late ARM became disease free in the follow-up, and Dis_{ ij }was equal to ${\text{Dis}}_{ij}^{\ast}$.
To obtain the inference for ${\text{Dis}}_{ij}^{\ast}$, one can replace the level indicators of the preceding response with cumulative probabilities (I(o _{ i(j-1) }> 0), I(o _{ i(j-1) }> 1)) in model (3) and set c = 0 and (I(o _{ i(j-1) }> 0), I(o _{ i(j-1) }> 1)) = (1, 1). If the regression ${\text{Reg}}_{ij}^{\ast}$ is of interest, then we can use the indicators of the current response (I(O _{ ij }= 1), I(O _{ ij }= 2)) as dependent variables and fit a linear generalized logit model [22], setting c = 1 and (I(o _{ i(j-1) }= 1), I(o _{ i(j-1) }= 2)) = (0, 1). Analysts can select modelling strategies for current and past responses based on the transition probabilities of primary interest, and then modify the definitions of the secondary transition probabilities accordingly, as we did for the ARM birth cohort study. Alternatively, one could fit several transition models with different modelling choices and draw inferences for each transition probability of interest from the corresponding model.
This paper considered two different approaches for analyzing longitudinal disease staging data. In the separate analysis, the incidence, progression, regression and disappearance probabilities are marginally defined, modelled and estimated. One can easily modify the definition of a transition probability to accommodate various needs (e.g., using ${\text{Reg}}_{ij}^{\ast}$ and ${\text{Dis}}_{ij}^{\ast}$ for analysis). The separate analysis also allows different covariate effects on different transition probabilities, which is best for carefully describing specific precursor effects on transition probabilities and provides an excellent reference for checking the assumptions on which the transition model relies. In contrast, a joint transition model can borrow strength from all transition probabilities. For confounding variables that do not show different effects on different transition probabilities through the examination of separate analytical results, the transition model can adopt the equal-effect assumption to reduce the complexity of the model. One limitation is its inflexibility in simultaneously obtaining desirably defined transition probabilities as described in the above discussion. As a general strategic recommendation: It is natural to first analyze each transition probability separately for initial findings and empirical examination of the equal-confounding-effect assumption. Then, the transition model, taking separate analytical results into account, is useful to refine and clarify those outcomes that are indecisive in separate analysis.
The transition model (3) can potentially grow very large, with increasing number of levels, covariates and follow-ups. To ensure a large enough sample size for implementing the model, one can examine the cross tabulations of O _{ ij }versus O _{ i(j-1) }for j = 2, ⋯ , J, stratifying by possible values of major risk factors. It is recommended that no cell value should be less than 5.
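The cross-tabulation check can be sketched as follows. This is a minimal illustration that assumes each subject is supplied as a stratum label (for example, a level of a major risk factor) together with the ordered list of severity levels; the function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def sparse_transition_cells(
    subjects: Iterable[Tuple[Hashable, List[Optional[int]]]],
    min_count: int = 5,
    levels: Tuple[int, ...] = (0, 1, 2),
) -> Dict[Tuple[Hashable, int, int, int], int]:
    """Cells (stratum, j, previous level, current level) with count < min_count.

    Empty cells are reported with a count of 0. Pairs of examinations with a
    missing level (None) are skipped.
    """
    counts: Counter = Counter()
    strata, exams = set(), set()
    for stratum, history in subjects:
        strata.add(stratum)
        for j, (prev, curr) in enumerate(zip(history, history[1:]), start=2):
            exams.add(j)
            if prev is None or curr is None:
                continue
            counts[(stratum, j, prev, curr)] += 1
    return {
        cell: counts[cell]
        for cell in product(sorted(strata), sorted(exams), levels, levels)
        if counts[cell] < min_count
    }


# Hypothetical data: seven subjects in stratum "A", two examinations each.
toy_subjects = [("A", [0, 0])] * 5 + [("A", [0, 1])] * 2
sparse = sparse_transition_cells(toy_subjects)
```

Any cell returned by this function falls below the recommended count of 5 and suggests that the corresponding interaction in the transition model may not be estimable from the data at hand.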
There are many possible generalizations of the proposed framework. Generalization to allow a disease severity scale with more than three levels can be easily done. However, with more than three disease-severity levels the definitions of distinct transition probabilities are not trivial, thus researchers may need to first define the transition probabilities according to the study aims and then work on the modelling of current and past responses to meet those aims. Also, the proposed approaches may be generalized to allow subjects to be measured at different sets of times (i.e., unequally-spaced follow-up). The transition model (3) solely depends on the immediately preceding response and, by treating the correlation as nuisance, the association model (10) is taken to handle the inter-correlation among the transition events {O _{ ij }|O _{ i(j-1)}; j = 2, ⋯ , J _{ i }}. Thus, the model does not result in different interpretations of regression coefficients in (3) for subjects with different numbers of examinations, as discussed in [8]. In the case where additional subjects can be recruited at any time points during the study (i.e., an open population), these newly recruited samples will have missing disease severity observations at time points before their recruitment. If their missingness is completely at random [23], then the situation can be handled by only including collected examinations and their associated covariates.
Conclusion
This paper proposed and demonstrated a framework for studying the relationship of disease incidence, progression, regression and disappearance with risk factors of interest. The proposed framework includes two analytical approaches. The first defines, models and estimates the relationship between each transition probability and the risk factors separately. The second specifies a transition (conditional probability) model for the probability of the current disease level given the previous level. It studies the disease as a whole and uses the whole population to estimate these probabilities together. We recommend that one first analyze each transition probability separately for data exploration and assumption evaluation, and then use the transition model to refine and clarify the results. The results of the ARM data analysis show that the parallel application of separate and joint analyses is superior to either in isolation. In this regard, mutually consistent findings generally constitute stronger scientific evidence than findings supported by only one of the analytical approaches. The fitting methods for the transition model are readily implementable in available software.
Declarations
Acknowledgements
The Beaver Dam Eye Study was supported by National Institutes of Health grant EY06594. The author wishes to thank Drs. Ronald Klein and Barbara E. K. Klein for kindly making the Beaver Dam Eye Study data available. The author (GHH) was also partially supported by grants from the National Science Council of Taiwan and the Program for Promoting Academic Excellence of Universities in the Ministry of Education of Taiwan (MOE-ATU).
References
- Byer NE: Subclinical retinal detachment resulting from asymptomatic retinal breaks: prognosis for progression and regression. Ophthalmology. 2001, 108: 1499-1504. 10.1016/S0161-6420(01)00652-2.
- Petrakis, Sciacca V, Iascone C: Diagnosis and treatment of Barrett's oesophagus. A general survey. Acta Chir Belg. 2001, 101: 53-58.
- Lamm DL, Blumenstein BA, Crawford ED, Montie JE, Scardino P, Grossman HB, Stanisic TH, Smith JA, Sullivan J, Sarosdy MF, Crissman JD, Coltman CA: A randomized trial of intravesical doxorubicin and immunotherapy with bacille Calmette-Guérin for transitional-cell carcinoma of the bladder. N Engl J Med. 1991, 325: 1205-1209.
- Klein R, Klein BEK, Linton KLP, DeMets DL: The Beaver Dam Eye Study: visual acuity. Ophthalmology. 1991, 98: 1310-1315.
- Klein R, Klein BEK, Lee KE, Cruickshanks KJ, Gangnon RE: Changes in visual acuity in a population over a 15-year period: the Beaver Dam Eye Study. Am J Ophthalmol. 2006, 142: 539-549. 10.1016/j.ajo.2006.06.015.
- Klein R, Klein BEK, Knudtson MD, Meuer SM, Swift M, Gangnon RE: Fifteen-year cumulative incidence of age-related macular degeneration: the Beaver Dam Eye Study. Ophthalmology. 2007, 114: 253-262. 10.1016/j.ophtha.2006.10.040.
- Klein R, Klein BEK, Wong TY, Tomany SC, Cruickshanks KJ: The association of cataract and cataract surgery with the long-term incidence of age-related maculopathy. Arch Ophthalmol. 2002, 120: 1551-1558.
- Liang KY, Zeger SL: Regression analysis for correlated data. Annu Rev Public Health. 1993, 14: 43-68. 10.1146/annurev.pu.14.050193.000355.
- Zeger SL, Liang KY: An overview of methods for the analysis of longitudinal data. Stat Med. 1992, 11: 1825-1839. 10.1002/sim.4780111406.
- Liang KY, Zeger SL: Longitudinal data and analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.1093/biomet/73.1.13.
- Heagerty PJ, Zeger SL: Marginal regression models for clustered ordinal measurements. J Am Stat Assoc. 1996, 91: 1024-1036. 10.2307/2291722.
- McCullagh P: Regression models for ordinal data. J R Stat Soc Ser B. 1980, 42: 109-142.
- Diggle PJ, Heagerty P, Liang KY, Zeger SL: Analysis of Longitudinal Data. 2002, New York, NY: Oxford University Press, 2nd edition.
- Prentice RL, Zhao LP: Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics. 1991, 47: 825-839. 10.2307/2532642.
- Carey VJ, Zeger SL, Diggle P: Modelling multivariate binary data with logistic regressions. Biometrika. 1993, 80: 517-526. 10.1093/biomet/80.3.517.
- Stokes ME, Davis CS, Koch GG: Categorical Data Analysis Using the SAS System. 2000, Cary, NC: SAS Publishing, 2nd edition.
- Huang GH, Bandeen-Roche K, Rubin GS: Building marginal models for multiple ordinal measurements. J R Stat Soc Ser C Appl Stat. 2002, 51: 37-57. 10.1111/1467-9876.04739.
- Efron B, Tibshirani R: An Introduction to the Bootstrap. 1993, New York, NY: Chapman and Hall.
- Huang GH, Klein R, Klein BEK, Tomany SC: Birth cohort effect on prevalence of age-related maculopathy in the Beaver Dam Eye Study. Am J Epidemiol. 2003, 157: 721-729. 10.1093/aje/kwg011.
- Klein R, Klein BEK, Jensen SC, Cruickshanks KJ, Lee KE, Danforth LG, Tomany SC: Medication use and the 5-year incidence of early age-related maculopathy: the Beaver Dam Eye Study. Arch Ophthalmol. 2001, 119: 1354-1359.
- Klein R, Klein BEK, Tomany SC, Moss SE: Ten-year incidence of age-related maculopathy and smoking and drinking: the Beaver Dam Eye Study. Am J Epidemiol. 2002, 156: 589-598. 10.1093/aje/kwf092.
- Agresti A: Analysis of Categorical Data. 1984, New York, NY: Wiley and Sons.
- Little RJA, Rubin DB: Statistical Analysis with Missing Data. 1987, New York, NY: Wiley and Sons.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/8/40/prepub
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.