 Research
 Open access
 Published:
Modeling of atrophy size trajectories: variable transformation, prediction and ageofonset estimation
BMC Medical Research Methodology volume 21, Article number: 170 (2021)
Abstract
Background
To model the progression of geographic atrophy (GA) in patients with agerelated macular degeneration (AMD) by building a suitable statistical regression model for GA size measurements obtained from fundus autofluorescence imaging.
Methods
Based on theoretical considerations, we develop a linear mixedeffects model for GA size progression that incorporates covariabledependent enlargement rates as well as correlations between longitudinally collected GA size measurements. To capture nonlinear progression in a flexible way, we systematically assess BoxCox transformations with different transformation parameters λ. Model evaluation is performed on data collected for two longitudinal, prospective multicenter cohort studies on GA size progression.
Results
A transformation parameter of λ=0.45 yielded the best model fit regarding the Akaike information criterion (AIC). When hypertension and hypercholesterolemia were included as risk factors in the model, they showed an association with progression of GA size. The mean estimated ageofonset in this model was 67.21±6.49 years.
Conclusions
We provide a comprehensive framework for modeling the course of uni or bilateral GA size progression in longitudinal observational studies. Specifically, the model allows for ageofonset estimation, identification of risk factors and prediction of future GA size. A squareroot transformation of atrophy size is recommended before model fitting.
Background
Agerelated macular degeneration (AMD) is a leading cause of blindness, especially for people in developed countries older than 60 years [1, 2]. AMD has two late stages: choroidal neovascularization (CNV) and geographic atrophy (GA). Here we consider GA, which is thought to be the end stage of AMD when CNV does not develop [3] and which is responsible for vision loss in approximately 20% of all patients with AMD [4]. More than five million people are estimated to be affected by GA worldwide, a number which is supposed to increase with the aging of the population [2]. To date, there is no effective standard treatment available [5].
GA is defined by atrophic lesions of the outer retina resulting from loss of retinal pigment epithelium (RPE), photoreceptors and underlying choriocapillaris (reviewed by [6]). These areas enlarge with time and lead to irreversible loss of visual function [7]. A relevant clinical measure of disease progression is the eyespecific size of GA which can be quantified based on imaging techniques including color fundus photography, spectral domain optical coherence tomography imaging, or fundus autofluorescence (FAF) imaging [8, 9].
A better understanding of the risk factors that accelerate GA size progression is necessary for the development of treatment options, in particular for the design of (interventional) clinical trials. To date, empirical evidence on GA size progression is usually collected through longitudinal observational studies (e.g. [10–12]). In these studies, it is essential to analyze GA size trajectories over time using an adequate statistical model. Specifically, in the absence of a randomized study design, data analysis needs to account for confounding issues as well as correlation patterns, for instance when both eyes of a patient are included in the study. In the latter case, the correlations between the eyes within one patient need to be incorporated as well as the correlations due to repeated measurements over time.
The aim of this analysis is to systematically derive a statistical approach for modeling GA size in observational ophthalmologic studies. As will be demonstrated in the following sections, the proposed approach generalizes various statistical models for GA size progression that have been used in previous publications (see below). Special focus will be given to the following issues, which are considered to be of particular importance for the planning and design of future interventional trials:
(i) Transformation of GA size. Before model fitting, it is important to consider whether the response (here, GA size) should be transformed. Finding an appropriate transformation can provide information about the underlying natural processes that drive the progression of GA. In recent publications on GA size progression, there has been an ongoing discussion about the optimal choice of transformation [11, 13–15]. Three main modeling paradigms have emerged: The first set of models assumes a linear relationship between GA size and covariables (e.g. risk factors or confounding variables). This implies a constant enlargement of GA size over time. Examples of this modeling approach can be found in [13, 14]. The second approach assumes a quadratic enlargement of the lesion size. This is motivated by the thought of circular atrophic lesions that constantly enlarge with their radiuses [11, 15]. The third model type is an exponential model in which atrophic lesions enlarge exponentially. Compared to a linear growth model, Dreyhaupt et al. [13] found that the assumption of exponential growth led to improved model fits.
(ii) Ageofonset estimation. Another relevant topic for modeling GA size progression is the estimation of the age of disease onset. Research on this topic is motivated by the fact that in many clinical trials patients can only be included when the disease is already manifested in a later stage. The estimated ageofonset may, in contrast to lesion size, be considered as timeinvariant variable, and facilitate association analyses with other timeinvariant variables such as the genotype.
(iii) Identification of risk factors and confounding variables. For the development of AMD treatments, it is essential to specify meaningful inclusion and exclusion criteria for use in future clinical trials. It is therefore of high importance to identify relevant risk factors and confounding variables, and to analyze their relationships with GA size progression. Such an analysis can be achieved by building a multivariable regression model from observational data that includes relevant risk factors and confounders as covariables.
To address the issues described above, we derive a statistical regression model that includes (possibly transformed versions of) GA size as response variable, as well as potential risk factors and/or confounders (such as e.g. age, smoking) as covariables. To account for the above mentioned correlations between eyes of the same patient as well as temporal correlations, we investigate the use of a mixedeffects modeling approach with patient and eyespecific random effects terms. In this framework, we identify the “optimal” transformation of GA size by conducting a systematic search within the family of BoxCox transformations [16]. As will be shown, this systematic approach also allows for the derivation of formulas for ageofonset estimation. Furthermore, we demonstrate how predictions of future (untransformed) GA size values can be obtained from the fitted regression model.
For model derivation and illustration, we will apply the proposed methods to a data set collected by the multicenter Fundus Autofluorescence in AMD (FAM) study (NCT00393692) and by its singlecenter extension study, the Directional Spread in Geographic Atrophy (DSGA) study (NCT02051998). These noninterventional, prospective natural history studies adhered to the tenets of the Declaration of Helsinki and were approved by the institutional review boards of the participating centers. Written informed consent was obtained from each participant after explanation of the studies’ nature and possible consequences of participation.
Methods
Data
The data set used here was collected from patients with GA secondary to AMD that were recruited for the FAM study and followedup in the DSGA study.
The inclusion and exclusion criteria have been described elsewhere [14, 17]. In brief, the two studies included eyes without any history of retinal surgery, radiation therapy, laser photocoagulation or retinal diseases other than AMD. GA size measurements were obtained by grading FAF retinal images that were recorded at the baseline and followup visits. Data was only used for statistical analysis if the difference in total GA size between two graders was smaller than 0.15mm^{2} and if the patients had at least two visits.
Our analysis data set contained N=150 eyes from n=101 patients that where examined in up to nine followup visits. At baseline, the median age was 75.7 years (IQR: 70.7−80.6 years); 61.4% of the patients were female, and the mean followup time was 3.36 years (range 0.5−13.7 years) due to the extension by the second study. The GA size varied strongly between eyes: mean GA size at baseline was 5.64mm^{2}, ranging between 0.07mm^{2} and 31.41 mm^{2}. The status of hypertension and hypercholesterolemia was assessed by a patientreported questionnaire at the baseline visit. Information was obtained based on patients’ reports and current medication; medical reports were included in the assessment if available. For details see Table 1.
Regression modeling
Within a typical ophthalmologic study setting, patients participate in several followup visits at which one or both eyes are examined. This leads to correlated measurements, both within the patients and over time. Thus, a model is needed that captures complex correlation structures. A popular regression model, which has been used regularly in the literature on GA [11, 13] and which is also considered here, is a mixedeffects model with random effects terms for both eye and patient. Yet, there exists a variety of model specifications and the specific structure is still a matter of debate [18].
Before introducing the full mixedeffects model with possible risk factors and confounders, we start with a model that contains a time trend as only (continuous) covariable. This model serves as a basic model that captures the time dependency of GA enlargement.
Mixedeffects model with time as only covariable. As suggested by Shen et al. [18], we follow the hypothesis that the progression of GA has an underlying process of GA expansion that is mostly the same over time for all eyes. Differences in eyes may arise due to different exposition to environmental conditions, and, most importantly, GA size varies between patients as they enter the study at different time points in their disease history. We therefore propose to include the disease age Δ_{i}≥0 of an eye i at study entry directly in the model. We further assume that the atrophy size y_{it} of an eye i depends on the (unknown) age of the disease at study entry Δ_{i} and the (observable) followup time t≥0 that has passed since. Time is assumed to be measured on a continuous scale, e.g. in days or years since baseline. Under the assumptions by Shen et al. [18], and considering (for the moment) a linear enlargement of GA, this leads to the following regression model:
where β denotes the regression slope (i.e. the constant enlargement rate). The residuals ε_{it},i=1,…,N, are assumed to be normally distributed with zero mean and variance σ^{2}.
If it is further assumed that the disease age at study entry can be approximated by a normal distribution, the model in (1) can be parameterized such that it becomes a linear mixedeffects model. This is seen by defining \(\theta _{i} := \beta \cdot \Delta _{i} \sim \mathcal {N}\left (\mu _{\theta }, \sigma ^{2}_{\theta }\right)\) and \( \alpha _{i} := \theta _{i}  \mu _{\theta } \sim \mathcal {N}\left (0, \sigma ^{2}_{\theta }\right)\), so that Model (1) can be written as
In this form, the model reads as follows: The atrophy size y_{it} depends on a fixed intercept μ_{θ}, an eyespecific random intercept α_{i} that reflects the deviation of the disease age of eye i at study entry from the mean disease age at study entry, and an overall linear time trend βt that is the same for all eyes.
When there are patients in the study that contributed data from both eyes, one needs to consider the nested data structure and account for the correlations between measurements taken from the same patient. This can be done by extending the model equation as follows:
where \(\zeta _{j} \sim \mathcal {N}\left (0, \sigma _{\zeta }^{2}\right), j=1,\ldots,n\), is a normally distributed patient effect and α_{i} the effect of an ’eye within a patient’. Note: While it is assumed that the residual terms ε_{ijt} are independent of the random effects α_{i} and ζ_{j}, the latter two terms are generally allowed to be correlated. For simplicity, and without loss of generality, we will assume independence of all random effects terms in the following.
Mixedeffects model with covariables. When introducing covariables into the model, it is reasonable to assume that risk factors and/or confounders equally influence the enlargement of GA before and after inclusion of an eye in the study. This assumption can be incorporated in Model (1) by adding a covariabledependent slope to the model equation:
where \(\boldsymbol {x}_{i} = (x_{1},..., x_{k})^{\intercal }_{i}\) is a vector of k (possibly timedependent) risk factors for each eye and \(\boldsymbol {\beta }_{x} = \left (\beta _{x_{1}},..., \beta _{x_{k}}\right)^{\intercal }\) is a vector of parameters that accelerate or slow down GA size progression (\(\beta _{x_{s}} > 0\) and \(\beta _{x_{s}} < 0\), respectively, s∈{1,…k}). Note that in the following, we will not distinguish between risk factors and confounders any more, as we assume that both are collected in the vectors x_{i}.
Similar to the reparametrization used above, we write \( \Delta _{i} := (\mu _{\Delta } + \gamma _{i}) \sim \mathcal {N}\left (\mu _{\Delta }, \sigma _{\Delta }^{2}\right)\), where μ_{Δ} and \(\sigma _{\Delta }^{2}\) denote the mean and the variance of the ithe eye at study entry.
The mixedeffects model with covariables can then be written as
with eyespecific random effects \(\gamma _{i} \sim \mathcal {N}\left (0, \sigma _{\Delta }^{2}\right)\). The linear enlargement in Model (5) thus implies dependency of y_{it} on an interaction term between t and x_{i}, and also on random slopes of the covariable values x_{i}. Importantly, Eq. 5 implies numerous dependencies between the slope parameters associated with t, x_{i},x_{i}t,γ_{i}, and x_{i}γ_{i}, so that the model no longer possesses the structure of a “standard” mixedeffects model with unrestricted estimation of coefficients. Details on model fitting will be given below.
Finally, when considering patients that contribute data from both eyes, one specifies
with patientspecific random effects \(\zeta _{j} \sim \mathcal {N}\left (0, \sigma _{\zeta }^{2}\right), j=1,\ldots, n\), and an additional interaction term between x_{i} and ζ_{j}.
The model equations presented so far ascribe a linear relationship between time, risk factors, and GA size. In the following section, possible transformations are examined, so that the modeling approach is extended to model nonlinear progressions.
Transformation of the response
As an example, Fig. 1 A shows the GA size trajectories of four eyes contained in the analysis data set. Considering these progressions, it is conceivable to assume that the trajectories are not strictly linear. Since the model equations above (Models (1) to (6)) refer to linear enlargement processes, a transformation of the response is convenient for modeling nonlinear progression (see Fig. 1B).
Three different transformation approaches have been used in recent publications on GA size progression (e.g. [11, 13–15]): (i) Linear models with no response transformation implying a linear relationship between GA size and the covariables, (ii) linear models with square root transformation of the response, and (iii) linear models with logtransformed response – or equivalently exponentially transformed models with no transformed response – implying an exponential enlargement of the lesion size.
BoxCox transformation Instead of comparing only the most commonly used transformations, we consider a systematic and more comprehensive strategy for finding an appropriate transformation of the GA size. For this systematic approach, the BoxCox model class is applied because it covers a wide range of transformations, including the transformations (i) to (iii) above. More specifically, for an atrophy size y>0 we consider the class of BoxCox transformations
as introduced by [16]. Applying (7) to one of the Models (1)(6) reads as follows: λ=1 refers to a model with no response transformation, λ=0.5 corresponds to a squareroot transformation of the response and λ=0 can be interpreted as exponential enlargement of the GA size.
Model comparison The main criterion used for our model comparisons was Akaike’s Information Criterion (AIC) [19]. More specifically, our aim was to choose the transformation parameter λ that minimized AIC on the analysis data set while assuring that the assumptions of Models (1) to (6) were best possibly met, in particular the normality of the residuals. The AIC is defined by AIC=−2· log(L)+2·n_{params}, where L is the likelihood of the model under consideration (evaluated at the maximum likelihood estimate) and n_{params} denotes the number of parameters used in the model. As we compared models with a transformed response, we applied the density transformation theorem to compute the likelihood L.
Maximum likelihood estimation The estimation of the model parameters was performed by maximum likelihood (ML) estimation. ML estimation was carried out for a grid of fixed transformation parameters λ using the transformed GA size values. Subsequently, the likelihoods were compared and the transformation parameter referring to the model with minimum AIC was considered best.
We initially assumed that there was an “optimal” value λ for which the transformed atrophy size given the random effects followed a normal distribution. In addition, we briefly considered random effects with an unspecified mixing distribution as a nonparametric crosscheck. The two approaches will be described in the next paragraphs.
Normally distributed random effects As noted above, the linear model in (6) imposes numerous side conditions on the slope parameters associated with t, x_{i},x_{i}t,γ_{i}, and x_{i}γ_{i}. In order to fit Model (6) using readily available software for the estimation of the slope parameters (without side conditions, such as the R addon package lme4[20], version 1.125), we propose to iterate the following steps:

(i)
For given estimates \(\hat {\beta }\) and \(\hat {\boldsymbol {\beta }}_{x}\) compute the values of the working covariable \(\tilde {x}_{i} := \hat {\beta } + \hat {\boldsymbol {\beta }}_{x}^{\intercal } \boldsymbol {x}_{i}\).

(ii)
Fit the linear mixedeffects model
$$ y_{{ijt}} = \beta t + \boldsymbol{\beta}_{x}^{\intercal} \boldsymbol{x}_{i} t + \mu_{\Delta} \tilde{x}_{i} + \tilde{x}_{i} \gamma_{i} + \tilde{x}_{i} \zeta_{j} + \epsilon_{{ijt}} $$(8)to obtain updates of the coefficient estimates of \(\hat {\mu }_{\Delta }, \hat {\beta }\), and \(\hat {\boldsymbol {\beta }}_{x}\). Note, that Model (8) is just a reformulation of Model (6) that can be fitted without side conditions on its slope parameters. For the fitting procedure a fixed intercept term is added to increase computational stability and to relax the condition that the empirical mean of estimated random effects terms is forced to be zero.
The starting values for \(\hat {\beta }\) and \(\hat {\boldsymbol {\beta }}_{x}\) in Step (i) may be obtained from (8) with an initial value of \(\tilde {x}_{i} = 1\). As demonstrated in the supplementary materials (see Additional file 1), repeated execution of (i) and (ii) will typically converge to the final estimates after less than 20 iterations.
Random effects with unspecified mixing distribution As an alternative to mixedeffects modeling with normally distributed terms, Almohaimeed et al. [21] proposed to consider a nonparametric maximum likelihood (NPML) approach. This approach approximates the distribution of each random effect by a discrete distribution with finite number of mass points K. It then uses an expectationmaximization algorithm to find the nonparametric maximum likelihood estimate. Here, the NPML approach is used to verify the optimal transformation parameter obtained from modeling with normally distributed random effects.
Ageofonset estimation
Model without covariables As defined by [22], a diagnosis for GA can be given at a minimum lesion diameter of 250 µm and thus a lesion area of 0.05mm^{2}. Based on this specification and denoting λ_{opt} as the value of λ that is optimal w.r.t. AIC, the time \(\hat {t}_{0_{{ij}}}\) at which the atrophy size was \(\hat {y}_{ijt_{0}} = 0.05 [\text {mm}^{2}]\) (i.e. \( \hat {y}^{(\lambda)}_{ijt_{0}} = \lambda _{{opt}}^{1} \cdot (0.05^{\lambda _{{opt}}}1) \)) can be obtained by solving the model equation of the transformed mixedeffects Model (3) for t:
where \(\hat {\beta }\) and \(\hat {\mu }_{\theta }\) denote the ML estimates of β and μ_{θ}, respectively, and \(\hat {\zeta }_{j}\) and \( \hat {\alpha _{i}}\) denote the realizations of the random effect terms. As a consequence, subtracting the estimated time \(\hat {t}_{0_{{ij}}}\) from the patient’s age at study entry results in the estimated ageofonset of GA in the ith eye of patient j. Remark: While from a modeling perspective a theoretical atrophy size of \(y_{ijt_{0}}={0} \text {mm}^{2}\) could be defined at the time of disease onset, we will focus on the clinically relevant definition \(\left (y_{ijt_{0}}={0.05} \text {mm}^{2}\right)\) here. For y=0 it holds that \(t_{0_{{ij}}} = \Delta _{{ij}} =\frac {1}{\beta } \cdot (\mu _{\theta } + \zeta _{j} + \alpha _{i} \)).
Model with covariables Analogous to (9) one can estimate the ages of GA onset of the study eyes in a model with additional covariables. From Eq. 8 one obtains
where \(\tilde {x}_{i} := \hat {\beta } + \hat {\boldsymbol {\beta }}_{x}^{\intercal } \boldsymbol {x}_{i}\) contains the parameters obtained from ML estimation.
Prediction
Evaluating a model and its coefficients only on a transformed scale is challenging as the linearity of the predictorresponse relationships in Models (5) and (6) only holds on the transformed scale but not on the original scale of the response (provided that λ≠1). As a consequence, the calculation of the expected GA size \(\mathbb {E}(y\boldsymbol {x})\) – and hence any prediction of expected disease progression – cannot be done in an unbiased way by a simple backtransformation.
To see this, consider a nonlinear BoxCox transformation f(y) with an arbitrary parameter λ≠1 and, where existent, the corresponding inverse BoxCox transformation f^{−1}(y). Further, let f(y_{ijt}x_{i})=z_{ijt}+ε_{ijt}, where \({z}_{{ijt}} := \mathbb {E}(f(y_{{ijt}}\boldsymbol {x}_{i}))\) and ε_{ijt} denote the linear predictor and the residual, respectively in one of the above models. A naive backtransformation would directly take the inverse of the linear predictor, i.e. f^{−1}(z_{ijt}), which differs from the desired expected GA size value \(\mathbb {E}(y_{{ijt}}\boldsymbol {x}_{i}) = \mathbb {E}\left (f^{1}\left ({z}_{{ijt}}+ {\epsilon }_{{ijt}}\right)\right)\) by Jensens’s inequality [23]. In other words, \(f^{1}\left (\mathbb {E}\left (f\left (y_{{ijt}}\boldsymbol {x}_{i}\right)\right)\right) \neq \mathbb {E}(y_{{ijt}}\boldsymbol {x}_{i})\). To address this issue and to obtain unbiased predictions of the GA size, we propose to sample r=10,000 residuals from the empirical distribution \(\hat {\epsilon }_{1},..., \hat {\epsilon }_{r}\) in the respective fitted model. The expected atrophy size on the original scale can then be estimated by \(\widehat {\mathbb {E}(y_{{ijt}}\boldsymbol {x}_{i})} := \frac {1}{r}\sum _{u = 1}^{r} f^{1}\left (\hat {z}_{{ijt}} + \hat {{\epsilon }}_{u} \right)\), where \(\hat {z}_{{ijt}}\) denotes the fitted value of f(y_{ijt}x_{i}).
Results
In this section, we present the results obtained from fitting Models (2), (3) and (6) to the analysis data set (150 eyes of 101 patients). Missing values in the covariables were imputed using the R package mice [24] with one imputation run. Fitting was done using lme4 [20] with the algorithm described above.
Modeling of GA size trajectories
Determination of the transformation parameter In order to determine the optimal value of the transformation parameter λ, we evaluated linear mixedeffects models of the forms (3) and (6) on the analysis data set. BoxCoxtransformed responses with varying values of λ were considered in each of the models. As seen in Fig. 2A, the minimum AIC value was reached at λ_{opt}=0.45 in the model without covariables. The model with covariables also yielded an optimal AIC value at λ_{opt}=0.45 (Fig. 2B).
The NPML approach led to similar results for the optimal value of λ in the setting without covariables. As seen in Fig. 3, the obtained values for the optimal λ ranged between 0.35 and 0.5. For a larger number of mass points (K>7) the same optimal λ (= 0.45) as in the parametric approach was found.
Normality of the residuals Figure 4 shows the residual diagnostics obtained from fitting Model 6 to the analysis data, including hypercholesterolemia and hypertesnsion as risk factors. It is seen that even after transformation the fitted residuals were not normally distributed. However, homoscedasticity was better met after transformation with λ_{opt}=0.45. Furthermore, the distribution of the residuals was less skewed after transformation.
Effects of risk factors As shown in Fig. 4, the residuals obtained from fitting Model 6 to the analysis data set did not perfectly follow a normal distribution, even after transformation of the response. Therefore, inference procedures that rely on asymptotic normality may not be the best choice to investigate the effects of risk factors on (transformed) GA size. To address this issue, we used a bootstrap approach to obtain the 95% confidence intervals of the parameters within Model (6). The results are presented in Table 3 and in Fig. 5. It is seen, that time was associated with the transformed GA size, growing by 0.42 (95% CI [0.36,0.50]) per year. Also the absence of hypercholesterolemia was associated with more rapid enlargement of the lesion size (estimate: 0.11, 95% CI [0.06,0.17]), while a slower progression in patients without hypertension (estimate: −0.09, 95% CI [ −0.17,−0.03]) was found. Note that the estimated coefficients refer to transformed GA size and thus cannot be directly interpreted in terms of an enlargement of the GA size measured in mm^{2}.
Remark: Model fitting was performed on an imputed data set, using the R package mice [24] with one imputation. Results obtained from complete case analysis were almost identical.
Ageofonset estimation
Figure 6 presents the estimated ages of disease onset of the study eyes, as obtained from Models (3) (without covariables) and (6) (with covariables). For the simple model without further covariables, the estimated mean ageofonset was 66.93 (±7.56) years and for the model with covariables the estimated median ageofonset was 67.21 (±6.49) years. This is in line with previously reported results, e.g. Li et al. [26] estimated the prevalence of GA in people under 64 years to range between 0.1% and 0.2%, depending on the country.
Estimation of GA size on the original scale
To obtain the distribution of GA size on the original scale, we sampled 10,000 times from the empirical distribution of the estimated residuals (obtained from Model (6)) and added these values to the fitted transformed GA size values f_{λ}(y) before applying a reverse BoxCox transformation. The backtransformed expected GA size values are shown in Fig. 7.
The root mean squared difference between the observed GA size and the modeled GA size was 1.10mm^{2}, implying that estimated expected GA size values deviated by ca. 1mm^{2} on average from the true GA size values. The respective mean squared differences for alternative values of the transformation parameter λ are shown in Fig. 8. As can be seen here, the λ, that lead to a minimal difference on the original scale, was slightly larger than the optimal λ=0.45 obtained by AICbased methods. However, the variation in the average distances between observed and predicted values was rather small (minimal distance 1.05mm^{2} at λ=0.55,1.06mm^{2} at λ=0.50, and 1.10mm^{2} at λ=0.45).
Prediction of next observation In clinical context, a prediction of the next observation of a patient already included in a clinical trial might be of interest. For each observed eye, for which values of more than three visits were present, we predicted the last observation. To this purpose we fitted a model to a training data set excluding the last observation while performance was measured on the last observation. The root mean squared difference between observed atrophy sizes and the mean predicted atrophy sizes was \(\sqrt {\text {avg}((\bar {\hat {y}}  y)^{2})} = {1.67} \text {mm}^{2}\).
Discussion
Despite a high prevalence and extensive research efforts, there are currently no effective standard treatment options for GA. It is therefore essential to develop accurate models for disease progression that enable researchers to efficiently plan and design clinical trials.
In this article, we presented a comprehensive framework for modeling the course of GA size progression in longitudinal observational studies. Our modeling approach was derived from a linear enlargement model using transformed GA size as response variable. As shown in the Results section, the resulting model can be embedded in the class of linear mixedeffects models [27], allowing for the incorporation of risk factors, confounding variables, and measurements taken repeatedly from the same patients and eyes. Since the assumption of linear enlargement imposes numerous restrictions on the model parameters, it is necessary to adapt standard (unrestricted) mixedeffects modeling approaches to the specific structure of the proposed model. To this purpose, we developed an algorithm for GA size modeling that can be implemented using readily available software for fitting linear mixedeffects models.
To obtain the best transformation of GA size, we conducted a systematic search within the class of BoxCox transformation models that included both parametric and nonparametric approaches. Our experiments yielded an optimal transformation that was close to the squareroot function, thereby justifying earlier modeling strategies that assumed linear trajectories of squareroot transformed GA size over time [18]. Of note, the squareroot transformation has a straightforward interpretation in terms of a linear enlargement of the atrophy radius [15].
A convenient feature of the proposed modeling approach is that it yields estimates of the disease age of the eyes at study entry. This is important because patients can only be included in trials when the disease has already manifested. When applied to the analysis data set consisting of patients included in the FAMstudy, disease age at study entry was estimated to range between 3.5 and 13.4 years (Model (6)). These estimates are in line with estimated prevalence values reported in the literature [4], but the resulting ages of disease onset were smaller than previously modeled ages using data partly from the same study [28].
Since the proposed modeling approach employs a transformed response variable, care has to be taken when making predictions of future values of atrophy size. As argued in the Results section, predictions with a naive backtransformation may show a bias due to the nonlinearity of the squareroot function. To address this issue, we proposed a sampling approach that allows for drawing valid conclusions and making undistorted predictions of GA size on its original scale. In the analysis data set, estimated expected GA size values derived from the proposed model deviated 1.10mm^{2} on average from the respective observed values.
Generally, the model proposed here allows for performing statistical hypothesis tests on a set of risk factors suspected to accelerate or slow down GA size enlargement. This strategy was illustrated in the Results section, where an analysis of a GA patient sample of the FAM study identified significant interaction effects between hypercholesterolemia, hypertension and time. Although a number of studies have shown a link between cardiovascular risk factors and AMD, the role of hypertension, atherosclerosis, high BMI, diabetes mellitus, higher plasma fibrinogen and hyperlipidaemia remain equivocal owing to inconsistent findings (reviewed in [29]). High blood pressure is shown to be associated with lower choroidal blood flow and disturbed vascular homeostasis [30]. Since perfusion deficits in the choriocapillaris, the innermost layer of the choroid, are associated with future GA progression [31], an associate between hypertension and increased GA progression appears biologically plausible. Regarding the association of hypercholesterinemia and decreased GA progression, the biological plausibility remains elusive. The majority of previous studies did not find any relationship between systemic cholesterol levels and progression to early AMD, GA or nAMD (reviewed in [29]), although two studies found an association between serum cholesterol on the development of late stage AMD [32, 33]. Interestingly, one of these studies reported that serum cholesterol levels have a protective effect on the development of nAMD, while they are a risk factor for the development of GA [32]. These observations apparently are in contrast to our results; however, there is evidence that different mechanisms may be involved in driving GA enlargement than those increasing the risk of de novo GA development [6]. Further validation of the risk factors, especially on an external data set, is necessary
While it has been established that socalled nascent GA progresses to manifest GA [34], the trajectory of early GA – prior to the minimum lesion size requirement for clinical trials (e.g., 2.5mm^{2}) – is poorly understood. The information derived by this modeling strategy can be used to design future intervention studies, for example regarding the stratification of patient groups and the definition of inclusion criteria. Of note, the proposed modeling approach is not restricted to established epidemiological covariables like hypertension but may also incorporate novel markers of disease progression such as patientreported outcome measures [35], digital biomarkers, and machinelearningbased scores derived from structural imaging data [36]. The proposed model constitutes a flexible framework to systematically investigate the transition from intermediate to late AMD in large observational studies such as the MACUSTAR study (ClinicalTrials.gov: NCT03349801) [37].
Conclusions
We have provided a comprehensive framework for modelling the trajectories of uni or bilateral Ga size progression in longitudinal observational studies. Our analysis shows that a squareroot transformation of atropy size is recommended before model fitting. The proposed modelling approach allows for the estimation of ageofonset, identification of risk factors and prediction of future GA size. The risk factors analyzed here require further validation in an external study population.
Availability of data and materials
The datasets generated during and/or analysed during the current study are not publicly available in order to protect the privacy of study participants. However, they are available from the principal investigators of the FAM and DSGA studies on reasonable scientific request.
Abbreviations
 AIC:

Akaike’s information criterion
 AMD:

Age related macular degeneration
 avg:

Average
 BMI:

Body max index
 CI:

Confidence interval
 DSGA:

Directional spread in geographic atrophy study
 FAF:

Fundus autoflourescence
 FAM:

Fundus autoflourescence in AMD study
 GA:

Geographic atrophy
 IQR:

Interquartile range
 ML:

Maximum likelihood
 nAMD:

Neovascular AMD
 NPML:

Nonparametic maximum likelihood
 RPE:

Retinal pigment epithelium
References
Lim LS, Mitchell P, Seddon JM, Holz FG, Wong TY. Agerelated macular degeneration. Lancet. 2012:1728–38.
Wong WL, Su X, Li X, Cheung CMG, Klein R, Cheng CY, Wong TY. Global prevalence of agerelated macular degeneration and disease burden projection for 2020 and 2040: a systematic review and metaanalysis. Lancet Glob Health. 2014; 2:106–16.
Sunness JS. The natural history of geographic atrophy, the advanced atrophic form of agerelated macular degeneration. Mol Vis. 1999; 5:25.
Friedman DS, O’Colmain BJ, Munoz B, Tomany SC, McCarty C, De Jong P, Nemesure B, Mitchell P, Kempen J, et al.Prevalence of agerelated macular degeneration in the united states. Arch Ophthalmol. 2004; 122:564–72.
Holz FG, SchmitzValckenberg S, Fleckenstein M. Recent developments in the treatment of agerelated macular degeneration. J Clin Investig. 2014; 124:1430–8.
Fleckenstein M, Mitchell P, Freund KB, Sadda S, Holz FG, Brittain C, Henry EC, Ferrara D. The progression of geographic atrophy secondary to agerelated macular degeneration. Ophthalmology. 2018; 125:369–90.
Sunness JS, GonzalezBaron J, Applegate CA, Bressler NM, Tian Y, Hawkins B, Barron Y, Bergman A. Enlargement of atrophy and visual acuity loss in the geographic atrophy form of agerelated macular degeneration. Ophthalmology. 1999; 106:1768–79.
Holz FG, Sadda SR, Staurenghi G, Lindner M, Bird AC, Blodi BA, Bottoni F, Chakravarthy U, Chew EY, Csaky K, et al.Imaging protocols in clinical studies in advanced agerelated macular degeneration: recommendations from classification of atrophy consensus meetings. Ophthalmology. 2017; 124:464–78.
Arslan J, Samarasinghe G, Benke KK, Sowmya A, Wu Z, Guymer RH, Baird PN. Artificial intelligence algorithms for analysis of geographic atrophy: A review and evaluation. Transl Vis Sci Technol. 2020; 9(2):57.
Holekamp N, Wykoff CC, SchmitzValckenberg S, Monés J, Souied EH, Lin H, Rabena MD, Cantrell RA, Henry EC, Tang F, et al.Natural history of geographic atrophy secondary to agerelated macular degeneration: results from the prospective proxima a and b clinical trials. Ophthalmology. 2020; 127(6):769–83.
Keenan TD, Agrón E, Domalpally A, Clemons TE, van Asten F, Wong WT, Danis RG, Sadda S, Rosenfeld PJ, Klein ML, et al.Progression of geographic atrophy in agerelated macular degeneration: Areds2 report number 16. Ophthalmology. 2018; 125:1913–28.
SchmitzValckenberg S, Sahel JA, Danis R, Fleckenstein M, Jaffe GJ, Wolf S, Pruente C, Holz FG. Natural history of geographic atrophy progression secondary to agerelated macular degeneration (geographic atrophy progression study). Ophthalmology. 2016; 123:361–8.
Dreyhaupt J, Mansmann U, Pritsch M, DolarSzczasny J, Bindewald A, Holz F. Modelling the natural history of geographic atrophy in patients with agerelated macular degeneration. Ophthalmic Epidemiol. 2005; 12:353–62.
Holz FG, BindewaldWittich A, Fleckenstein M, Dreyhaupt J, Scholl HP, SchmitzValckenberg S, Group FS, et al.Progression of geographic atrophy and impact of fundus autofluorescence patterns in agerelated macular degeneration. Am J Ophthalmology. 2007; 143:463–72.
Feuer WJ, Yehoshua Z, Gregori G, Penha FM, Chew EY, Ferris FL, Clemons TE, Lindblad AS, Rosenfeld PJ. Square root transformation of geographic atrophy area measurements to eliminate dependence of growth rates on baseline lesion measurements: a reanalysis of agerelated eye disease study report no. 26. JAMA Ophthalmology. 2013; 131:110–1.
Box GE, Cox DR. An analysis of transformations. J R Stat Soc Ser B (Methodol). 1964; 26:211–43.
Lindner M, Bezatis A, Czauderna J, Becker E, Brinkmann CK, SchmitzValckenberg S, Fimmers R, Holz FG, Fleckenstein M. Choroidal thickness in geographic atrophy secondary to agerelated macular degeneration. Invest Ophthalmol Vis Sci. 2015; 56:875–82.
Shen L, Liu F, Nardini HG, Del Priore LV. Natural history of geographic atrophy in untreated eyes with nonexudative agerelated macular degeneration: a systematic review and metaanalysis. Ophthalmol Retin. 2018; 2:914–21.
Akaike H. A new look at the statistical model identification. IEEE Trans Autom Control. 1974; 19(6):716–23.
Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixedeffects models using lme4. J Stat Softw. 2015; 67:1–48. https://doi.org/10.18637/jss.v067.i01.
Almohaimeed A, Einbeck J, Almohaimeed MA. Package Boxcoxmix. 2018. R package version 0.21. https://CRAN.Rproject.org/package=boxcoxmix. Last accessed: 08 Feb 2021.
Sadda SR, Guymer R, Holz FG, SchmitzValckenberg S, Curcio CA, Bird AC, Blodi BA, Bottoni F, Chakravarthy U, Chew EY, et al.Consensus definition for atrophy associated with agerelated macular degeneration on oct: classification of atrophy report 3. Ophthalmology. 2018; 125(4):537–48.
Godunova EK. Jensen inequality. Encyclopedia of Mathematics: Springer Verlag GmbH, European Mathematical Society; 2011. http://encyclopediaofmath.org/index.php?title=Jensen_inequality&oldid=47465. Accessed 08 Feb 2020.
van Buuren S, GroothuisOudshoorn K. mice: Multivariate imputation by chained equations in r. J Stat Softw. 2011; 45:1–67.
Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest package: Tests in linear mixed effects models. J Stat Softw. 2017; 82(13):1–26. https://doi.org/10.18637/jss.v082.i13.
Li JQ, Welchowski T, Schmid M, Mauschitz MM, Holz FG, Finger RP. Prevalence and incidence of agerelated macular degeneration in europe: a systematic review and metaanalysis. Br J Ophthalmol. 2020; 104(8):1077–84.
Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Berlin Heidelberg: Springer Science & Business Media; 2009.
Adrion C, Fleckenstein M, SchmitzValckenberg S, Holz F, Mansmann U. Estimation of disease onset for patients with geographic atrophy due to agerelated macular degeneration. Gesundheitswesen. 2010; 72:207.
Heesterbeek TJ, LorésMotta L, Hoyng CB, Lechanteur YT, den Hollander AI. Risk factors for progression of agerelated macular degeneration. Ophthalmic Physiol Opt. 2020; 40(2):140–70.
Metelitsina T, Grunwald J, DuPont J, Ying G. Effect of systemic hypertension on foveolar choroidal blood flow in age related macular degeneration. Br J Ophthalmol. 2006; 90(3):342–6.
Müller PL, Pfau M, SchmitzValckenberg S, Fleckenstein M, Holz FG. Optical Coherence TomographyAngiography in Geographic Atrophy. Ophthalmologica. 2020; 244(1):42–50. https://doi.org/10.1159/000510727.
Tomany S, Wang JJ, Van Leeuwen R, Klein R, Mitchell P, Vingerling JR, Klein BE, Smith W, De Jong PT. Risk factors for incident agerelated macular degeneration: pooled findings from 3 continents. Ophthalmology. 2004; 111(7):1280–7. https://doi.org/10.1016/j.ophtha.2003.11.010.
Buch H, Vinding T, La Cour M, Jensen GB, Prause JU, Nielsen NV. Risk factors for agerelated maculopathy in a 14year followup study: the copenhagen city eye study. Acta Ophthalmol Scand. 2005; 83(4):409–18.
Wu Z, Luu CD, Hodgson LA, Caruso E, Tindill N, Aung KZ, McGuinness MB, Makeyeva G, Chen FK, Chakravarthy U, et al.Prospective longitudinal evaluation of nascent geographic atrophy in agerelated macular degeneration. Ophthalmol Retin. 2020; 4(6):568–75.
Ying GS, Maguire MG. Complications of Agerelated Macular Degeneration Prevention Trial Research Group. Development of a risk score for geographic atrophy in complications of the agerelated macular degeneration prevention trial. Ophthalmology. 2011; 118(2):332–8. https://doi.org/10.1016/j.ophtha.2010.06.030.
Pfau M, von der Emde L, de Sisternes L, Hallak JA, Leng T, SchmitzValckenberg S, Holz FG, Fleckenstein M, Rubin DL. Progression of photoreceptor degeneration in geographic atrophy secondary to agerelated macular degeneration. JAMA Ophthalmol. 2020; 138(10):1026–34.
Finger RP, SchmitzValckenberg S, Schmid M, Rubin GS, Dunbar H, Tufail A, Crabb DP, Binns A, Sánchez CI, Margaron P, et al.Macustar: development and clinical validation of functional, structural, and patientreported endpoints in intermediate agerelated macular degeneration. Ophthalmologica. 2019; 241(2):61–72.
Acknowledgements
This work was supported by the DFG (German Research Foundation) Grant No FL658/41 and FL658/42 (MF); PF950/11 (MP) Research Priority Program AgeRelated Macular Degeneration Grant SPP 1088, Ho 1926/13 (FGH); Grants from National Institutes of Health Core Grant (EY014800) (MF); and in part by an Unrestricted Grant from Research to Prevent Blindness, New York, NY, to the Department of Ophthalmology & Visual Sciences, University of Utah.
The authors would like to thank Joanna Czauderna for her support in conducting the DSGA study.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Contributions
Conception and design: MS, MF, CB. Implementation and analysis: CB. Data collection and interpretation: MF, MP, LG, ML, SSV. Review and approval of final manuscript: CA, MP, MF. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
This study does not include human subjects. Data presented here are from the Fundus Autofluorescence in AMD (FAM) study (ClinicalTrials.gov Identifier: NCT00393692) and the Directional Spread in Geographic Atrophy (DSGA) study (ClinicalTrials.gov Identifier: NCT02051998). The FAM study was approved by the human ethics committees at the University of Bonn and the local Institutional Review Boards and the local ethics committees at the study centers (University of Bonn, University of Heidelberg, University of Leipzig, LudwigMaximiliansUniversity Munich, St. Franziskus Hospital Münster, University of Würzburg). The human ethics committees at the University of Bonn approved the DSGA study. All research adhered to the tenets of the Declaration of Helsinki. All participants provided informed consent.
Consent for publication
All authors have read and approved the submission of the manuscript.
Competing interests
Financial disclosures:
S. SchmitzValckenberg reports grants from Acucela/Kubota Vision, personal fees from Apellis, grants and personal fees from Novartis, grants and personal fees from Allergan, grants and personal fees from Bayer, grants and personal fees from Bioeq/Formycon, grants, personal fees and nonfinancial support from Carl Zeiss MediTec AG, grants and nonfinancial support from Centervue, personal fees from Galimedix, grants, personal fees and nonfinancial support from Heidelberg Engineering, grants from Katairo, nonfinancial support from Optos, personal fees from Oxurion, outside the submitted work.
M. Fleckenstein reports grants, personal fees and nonfinancial support from Heidelberg Engineering, nonfinancial support from Zeiss Meditech, grants and nonfinancial support from Optos, personal fees and grant from Novartis, personal fees from Bayer, grants and personal fees from Genentech, from Roche, outside the submitted work; In addition, Dr. Fleckenstein has a patent US20140303013 A1 pending.
F.G. Holz reports grants and personal fees from Acucela, Allergan, Apellis, Bayer, Bioeq/Formycon, Roche/Genentech, Geuder, Heidelberg Engineering, ivericbio, Kanghong, Novartis, Zeiss; personal fees from BoehringerIngelheim, Grayburg Vision, LinBioscience, Pixium Vision, Stealth BioTherapeutics, Aerie, Oxurion outside the submitted work.
M. Schmid reports personal fees from Pixium Vision.
The remaining authors have no competing interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1
Supplementary information.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Behning, C., Fleckenstein, M., Pfau, M. et al. Modeling of atrophy size trajectories: variable transformation, prediction and ageofonset estimation. BMC Med Res Methodol 21, 170 (2021). https://doi.org/10.1186/s12874021013560
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12874021013560