Improved selection of participants in genetic longevity studies: family scores revisited

Background Although human longevity tends to cluster within families, genetic studies on longevity have had limited success in identifying longevity loci. One of the main causes of this limited success is the selection of participants. Studies generally include sporadically long-lived individuals, i.e. individuals with the longevity phenotype but without a genetic predisposition for longevity. The inclusion of these individuals causes phenotype heterogeneity which results in power reduction and bias. A way to avoid sporadically long-lived individuals and reduce sample heterogeneity is to include family history of longevity as selection criterion using a longevity family score. A main challenge when developing family scores are the large differences in family size, because of real differences in sibship sizes or because of missing data. Methods We discussed the statistical properties of two existing longevity family scores: the Family Longevity Selection Score (FLoSS) and the Longevity Relatives Count (LRC) score and we evaluated their performance dealing with differential family size. We proposed a new longevity family score, the mLRC score, an extension of the LRC based on random effects modeling, which is robust for family size and missing values. The performance of the new mLRC as selection tool was evaluated in an intensive simulation study and illustrated in a large real dataset, the Historical Sample of the Netherlands (HSN). Results Empirical scores such as the FLOSS and LRC cannot properly deal with differential family size and missing data. Our simulation study showed that mLRC is not affected by family size and provides more accurate selections of long-lived families. The analysis of 1105 sibships of the Historical Sample of the Netherlands showed that the selection of long-lived individuals based on the mLRC score predicts excess survival in the validation set better than the selection based on the LRC score . Conclusions Model-based score systems such as the mLRC score help to reduce heterogeneity in the selection of long-lived families. The power of future studies into the genetics of longevity can likely be improved and their bias reduced, by selecting long-lived cases using the mLRC.


Background
There is strong evidence that longevity, defined as survival to extreme ages, clusters within families and is transmitted across generations [1][2][3][4][5][6][7]. Recent research [5] on two large population-based multi-generational family studies indicates that longevity is transmitted as a quantitative genetic trait. Moreover, associations between environmental factors and familial clustering have been rarely found using historical pedigree data [5,[8][9][10]. Although these findings suggest that human longevity has a genetic component, genetic studies on longevity have had limited success in identifying longevity loci [11][12][13][14][15][16][17]. One of the main causes for this limited success could be the large heterogeneity in criteria for participant selection in longevity studies [5,18,19]. Since the study participants must be alive to extract blood or other biomaterials their longevity phenotype is, by definition, unknown. An additional complication of longevity studies is the ongoing increase in life expectancy due to nongenetic factors [20], such as improvements in nutrition, life style and health care. If only individual age is considered as selection criterion, these non-genetic factors increase the risk of including sporadically long-lived individuals i.e. individuals with the longevity phenotype but who do not have an underlying genetic predisposition for longevity.
To obtain a sample with less phenotype heterogeneity, the family history of longevity can be used as a participant selection criterion [5,18]. Although this approach does not avoid that sample selection is influenced by family-shared non-genetic factors potentially involved in longevity, it is likely that it increases the power in case-control studies to detect novel genetic loci [21,22]. A natural way to incorporate family history in the study design is to develop a longevity family score to identify families with the heritable longevity trait and to subsequently select alive members of these families for (genetic) longevity studies. A number of longevity family scores have been previously proposed [4,18,[23][24][25], using different definitions of individual longevity and different ways of summarizing longevity within families. The implications of these choices are not well understood, namely how the interplay among individual longevity definition, family-specific summary measures and family size affects the sample selection process based on longevity family scores. The first challenge when developing longevity family scores is defining individual longevity. It is unclear how extreme the age at death must be to label an individual as long-lived and which scale is most beneficial so that scores reflect differences in extreme survival and not just in overall lifespan. The second challenge when developing longevity family scores are the large differences in family size. These differences imply that the available information per family differs. For a family with 12 members, for instance, more information is available than for a family with 2 members only. Importantly, we typically do not know whether these differences are real differences in sibship sizes or the result of missing data caused by limitations of the data collection. If not properly addressed, differences in family size can lead to biased rankings of long-lived families. This can lead to an increased heterogeneity among selected participants in longevity studies and hence reduce power of analyses. Instead of studying the genetics of longevity, biased selections can potentially lead to the combined analysis of the genetics of longevity, fertility and other factors affecting family size, such as, for example, socio economic status. Up till now, this important challenge has not received enough attention and how to address this problem still remains open.
In this paper, we investigate to what extent existing longevity family scores such as the Family Longevity Selection Score (FLoSS) [23] and the Longevity Relatives Count (LRC) score [18], are affected by differential family size. Subsequently, we propose an alternative method based on mixed effects regression modelling to deal with differences in family size when building a longevity family score.
The main novelty of our new approach is to consider the family size as a source of uncertainty when estimating the level of longevity of a family. Hence, we propose to select families accounting for such estimated uncertainty. This new approach will contribute to more robust scores and selection rules in longevity studies.

Existing longevity family scores and family size
Several longevity or excess survival family scores have been previously proposed [4,18,[23][24][25]. Often, to measure individual survival exceptionality, age at death is transformed to the corresponding survival percentile [18] or related measure such as the cumulative hazard [4,23,25] using life table data of a reference population, typically matching for sex and birth cohort. An alternative approach based on defining individual survival exceptionality as the difference between individual's age at death and the sample-based expected age at death correcting for a number of confounders has been also proposed [24].
We focus on two of the previous proposals, representative of two different ways of summarizing individual survival exceptionality within families: the Family Longevity Selection Score (FLoSS) [23] and the Longevity Relatives Count (LRC) score [18]. The FLoSS relies on a sum to summarize survival exceptional within families, while the LRC score is representative of the rest of previously proposed longevity scores which all rely on an empirical expectation as summary, i.e., the mean [4,24,25] or a proportion [18] depending on the nature of the individual measure of survival exceptionality. These two type of summary measures (sum versus empirical expectation) have different implications with regard to the influence of family size in the resulting scoring system.

The FLoSS favors large families
The Family Longevity Selection Score (FLoSS) [23] was constructed using siblings included in the Long Life Family Study. The FLoSS is a modification of the SE f score which adds a bonus for the presence of living family members. Since the main properties of SE f transfer to FLoSS, for the sake of simplicity we focus on the properties of the SE f , defined, for each family i, as follows: where t ij is the age at death of family member j of family i, with j = 1,…,N i members, S(t ij | bc ij , sex ij ) is the survival probability at age t ij given sex and birth cohort in the reference population and Λ(t ij | bc ij , sex ij ) is the corresponding cumulative hazard. SE ij varies between − 1 (if S(t ij | bc ij , sex ij ) = 1) and ∞ (if S(t ij | bc ij , sex ij ) = 0). The maximum value of SE ij is determined by the maximum age recorded in the used life table. If for example, this maximum age at death is 99, like in the Dutch life tables [26], and the minimum survival in the population is S(99| bc ij , sex ij ) = 0.01, this provides a maximum SE ij = 4.6. The reference value, corresponding to a value SE ij = 0 corresponds to S(t ij | bc ij , sex ij ) = 0.37. This means that family members with age at death beyond the top 37% survivors count positively in the score and those with younger ages at death count negatively. For example, using the Dutch life tables, this cut-off would correspond, for those born around 1900 with an age of death of around 73 years for men and of around 80 for women. This thresholds are not in line with recent evidence indicating that higher ages at death need to be considered to capture the heritable longevity trait [5,18]. This problem can be solved by conditioning survival to being alive at certain age. For example, a conditioning age of 40 years has previously been proposed [23], which increases the age cut-off associated to SE ij = 0. For example, using Dutch lifetables this would correspond to a cut-off of around 84 years for women and 78 year for men for individuals born around 1900. These ages correspond with percentiles survivals at birth of around 0.28 (oldest 28% survivors of their birth cohort) which are likely not extreme enough to capture the heritable longevity trait. This drawback is somehow compensated by the strongly skewed distribution of SE ij , meaning that the impact of increasing, for example, from 95 to 96 years is greater than the increase from 70 to 71. An additional problem of the SE f score is that it uses the sum over the available family members to summarize the level of survival exceptionality within the family. This implies that large families are systematically overweighted when using SE f . This phenomenon is illustrated in Fig. 1. Three example populations with twenty sibships each and different level of enrichment for longevity are considered. In the three examples, we consider sibships of increasing size, N i = i + 1, i = 1,2,...,20. In the first example population, all sibships have two siblings belonging to the top 5% survivors of their sex-specific birth cohort and the rest of siblings belonging to the top 30% survivors, so these family members are clearly not longlived. In the second, all sibships have two siblings belonging to the top 10% survivors of their sex-specific birth cohort and the rest of siblings belonging to the top 30% survivors. In the third example population all siblings belong to the top 30% survivors, representing a population with no long-lived individuals. The left panel of Fig. 1 illustrates the performance of the score SE f in these three examples. Overall, increasing the sibship size leads to larger values of SE f . Moreover, larger families with lower proportions of longlived members can present a larger value of SE f than small families with a larger proportion of long-lived members. For example, a family with two members belonging to the top 10 survivors and 8 extra not long-lived siblings has a larger SE f than a family with two members in the top 10 survivors and 5 extra not long-lived siblings (black line). It can also happen that a large family where two siblings are top 10% survivors and the rest not long-lived present a larger SE f than a smaller family where two siblings are top 5% and the rest are not long-lived. The increasing pink line corresponding to the third scenario illustrates that large families with no long-lived family members can present large values of SE f , with SE f arbitrarily increasing in parallel to family size.
In summary, using SE f and FLoSS in the selection of long-lived families may lead to an overrepresentation of large families and hence undesirable heterogeneity in the selected sample of families. Importantly, the size of the families governs the range of variation of the family score implying that SE f and FLoSS are not comparable when calculated in populations with different underlying family size patterns. Since this is an highly undesired feature, we will not further focus on the SE f score (and FLoSS) in the rest of the paper.

The LRC score favors small families
To mitigate the previously explained bias towards large families, a solution is to use a different summary measure at the family level, like the average [4,25].
In this line, and based on the results of a recent study which shows that longevity is heritable beyond the 10% survivors of their birth cohort [5], the Longevity Relatives Count (LRC) score has been proposed [18]. The original definition of the LRC score allows for the inclusion of family members with different degree of relatedness. Here, we focus on its simplest form considering only siblings in its construction: where P ij is the sex and birth cohort specific percentile survival of individual j of family i, i.e., P ij = 1 − S(t ij | bc ij , sex ij ). I(P ij ≥ 0.9) is a variable indicator taking value 1 if individual j belongs to the top 10 survivor of his/her sex-specific birth cohort and 0 otherwise. As a result, LRC i is the proportion of members of family i belonging to the group of top 10 survivors, defined as long-lived. The LRC is bounded between 0 and 1, providing a clear interpretation and comparability across populations. A drawback is that it is based on a binary definition of longevity, ignoring differences in longevity beyond the top 10% of survivors.
The LRC score is based on calculating a proportion, and as a consequence, the resulting ranking based on this score indirectly favors small families. For small families, it is more easy to have 100% of its family members in the top 10% survivors for than large families. Hence, in small families it can be questioned whether a large LRC truly captures the heritable longevity trait.
The problem of this approach is of different nature than the case of the SE f score. While adding not longlived family members implies an increase in SE f , this is not the case for LRC (Fig. 1, right panel). Instead of a systematic bias, we now face a problem of different uncertainty levels depending of the size of the family which cannot be properly captured by an empirical proportion. Consider the following example for illustration. Two families, both with half of the siblings long-lived, but in the first case the sibship size was 2 and on the second case the sibship size was 10. It is clear that there is more information in the second case and hence the ranking should also take this into account. However, using empirical proportions small families are benefitted.

Accounting for uncertainty in longevity family scores
To deal with the heterogeneity in information between families caused by their size, we propose to use mixed effects regression modelling in the estimation of family scores. In particular, we focus on the LRC, and extend its concept by introducing family specific random effects.
Let Y ij = I(P ij ≥ c) be a binary random variable that indicates if P ij is equal of larger than c, where P ij is the percentile survival of individual j of family i, and c is a pre-specified threshold of longevity. For example, c = 0.90. Let u i be a random effect shared by the members of the same family that reflects the unobserved factors contributing to longevity.
Assuming that Y ij follows a Bernoulli distribution, the family specific probability to reach c is given by the following logistic regression model with random intercept: We assume that u i follows a normal distribution with mean zero and variance σ 2 . Then, the parameters β 0 and σ 2 can be estimated maximizing the resulting likelihood where N is the total number of families, N i is the number of family members of family i and f is the density function of u i . Maximization of the likelihood cannot be analytically solved and requires numerical approximation techniques (e.g. quadrature methods). Finally, we can obtainp i , the expected value of p i given the observed data of family i and the estimated β 0 and σ, denoted byβ o andσ, aŝ where f ðujy i1 ; …; y iN i ;β 0 ;σÞ is the density of the posterior distribution of the family specific random effect. Using Bayes' rule, this density can be obtained as We propose to considerp i as a new longevity family score of family i, and we denote it by mLRC i . In this way, mLRC can be regarded as a model-based version of LRC which includes shrinkage based on N i . mLRC i can still be interpreted as the proportion of long-lived members of family i but it captures the uncertainty due to family size by the different 'weight' each family receives through its estimated random effectû i .

Software implementation
The new mLRC family score, together with the LRC and FLoSS have been implemented in R. The code is provided as supplementary material.

Simulation study
Simulated data is generated under the assumption that a latent factor, shared by the members of the same family, controls the degree of longevity of the family. Based on the simulated data, we can measure the level of agreement between the underlying longevity factor and different longevity family scores.
Characteristics of the simulated datasets such as the number and size of the families are chosen to mimic our real data set. In each run of the simulation, we simulated For each family, we computed the LRC score and the new model-based LRC (mLRC). Both scores were compared in terms of their relation with family size and performance as selection tools. The simulation was repeated 1000 times. Table 1 shows the distribution of family size according to the values of LRC and mLRC. The LRC score is strongly affected by family size; families with low sibship In each of the 1000 simulation runs, LRC and mLRC were categorized in 6 groups (using 0.1,0.2,0.3,0.4 and 0.5 as cut-offs) and median family size in each group was calculated. As a summary over the 1000 simulation runs, we provide median and range (in brackets) of these values. The left column reports results based on LRC and the right column reports results based on mLRC sizes tend to have large values of LRC (left column of Table 1). No clear relation between family size and mLRC is observed (right column of Table 1), which is in agreement with the data generation mechanism. Figure 2 shows the comparison between the LRC and mLRC for all the families in one simulation run. For small families, the mLRC score is typically lower than the LRC score when the LRC score is large. This is caused by the penalization of our new method due to lack of information in small families. Analogously, small families are weighted upwards when the LRC score is low following the same principle of major uncertainty when the family size is small. Still, if the level of exceptionality of the observed family members is large, small families can still outperform large families. This is illustrated by small families (for example, with N i = 2, red dots) appearing at the right part of the graphic in Fig. 2. The ability of mLRC to correctly deal with differences in family size, explains that the association between family size and the mLRC score is very low (right column Table 1).
To evaluate the performance of selection rules based on the LRC and mLRC scores, we considered two definitions of longevity. First, the 10% of families with the lowest value of the random effect u were defined as truly long-lived. Second, we considered the 5% of families with the lowest value of the random effect u as truly long-lived. For both definitions, we checked the agreement between the truly long-lived families and the selected families based on the LRC and mLRC scores. To perform this selection, the families with the 10% (respectively 5%) largest LRC or mLRC score were labeled as long-lived. Since our main interest was to avoid families not enriched for longevity in our selection, we used the positive predictive value (PPV) as summary measure of our simulations. The PPV is defined as the proportion of truly long-lived families among those classified as long-lived using the score-based selection rule under investigation. Figure 3 shows the distribution of the positive predictive values from the 1000 simulation runs. When defining the 10% of families with the lowest value of the random effect u as truly long-lived (left panel of Fig. 3), the mean PPV for the selection based on LRC was 54% (sd = 4%), meaning that on average, among the 1000 top 10% families classified as long-lived according to LRC, 54% were truly long-lived. The mean PPV increased to 62% (sd = 4%) when using mLRC for selection of the top 10% families. If we focus on the top 5% families (right panel of Fig. 3), the average accuracy of the selection based on LRC decreased (mean PPV = 0.52,sd = 0.13). In addition, we found large variability of the PPV among simulation runs, which indicates instable performance of the LRC

Real data: the historical sample of the Netherlands
The Historical Sample of the Netherlands (HSN) Long Lives study [27,28] is an extensive database which contains lifetime data for the members of 1326 fivegenerational families, evolving around a single proband (Index Person, IP) per family [29]. We focus on the siblings present in the second (F2) generation which are the children of the IPs. The selection for a part of these IPs was enriched for longevity. Specifically, the selected IPs were part of a case-control study to compare differences in longevity among descendants of 884 IPs who died at 80 years or beyond (case group) and 442 IPs who died between 40 and 59 years (control group) [18,30]. After removing individuals with missing age at death, single child sibships, and individuals belonging to nonextinct birth cohorts by the date of data collection (death dates were updated at 2017 and 110 years was set as maximum age); the final sample of our analysis consisted of 1105 sibships, children of the aforementioned HSN IPs, which corresponded to 5361 individuals.
To evaluate the performance of the new longevity family score mLRC and compare it to the original LRC, we first randomly selected a sample of independent individuals by choosing one individual at random from each of the 1105 available sibships. This set of independent individuals was set aside from the score calculations and subsequently used as a validation set to evaluate score performance. This validation set resembles the potential candidates to be included in, for example, a GWA study of longevity. Then, LRC and mLRC were calculated based on a sample of 4256 individuals. Afterwards, based on both scores we conducted a selection of long-lived families and we checked if those corresponded with a survival benefit in the validation set using Cox proportional hazard regression.
The sibship size was highly varying in the sample (Fig. 4). As expected, LRC is largely affected by family size, and families with large values of LRC present lower sibship sizes (Table 2). We do not observe a pattern in family size according to the estimated level of familiar longevity using mLRC. Figure 5 shows the distribution of the LRC and mLRC scores in the analyzed sibships of the HSN dataset.
Previous literature [18], has suggested LRC ≥ 0.3 as a selection criterion to capture the heritable longevity trait. In our sample, LRC ≥ 0.3 corresponds to the selection of the 15% families with the largest values of the LRC score. We evaluated the performance of this selection criterion by comparing the survival of the individuals of the validation set belonging the selected families to the rest of individuals in the validation set. Analogously, we selected the top 15% families according to ranking resulting from using the mLRC as longevity  Table 3 shows that the selection of long-lived individuals based on the mLRC score predicts excess survival in the validation set better than the selection based on the LRC score (β LRC ≥ 0.3 = − 0.287, β mLRC ≥ 0.15 = − 0.321).

Discussion
We proposed a method based on mixed effects regression modelling to estimate longevity family scores and properly account for differences in family size when ranking families according to their longevity and use this ranking for the selection of participants in longevity studies. Our simulation study and real data analysis show that the new proposed approach (mLRC) yields better results than its empirical counterpart (LRC) in terms of selection of long-lived individuals. We showed that the SE f score and FLoSS increase with the addition of non-long-lived family members and their interpretation is ruled by the underlying family size distribution. We also showed that the LRC score puts too much weight on small, less-informative families. The mLRC score was not affected by sibship size and therefore its resulting ranking better predicted the survival of 1105 independent study participants. The new mLRC score seems to reduce heterogeneity in the selection of families and its application could potentially help to improve power and bias reduction in longevity studies.
Our current approach has some limitations. First, the binary nature of the current mLRC discards important information which could contribute to improve its performance. An interesting property of the SE f score and   [4,24,25]. The Longevity Family Score (LFS) [4] and the Family Mortality History Score (FMHS) [25] are closely related to the SE f and FLoSS since all use the same measure of individual survival exceptionality based on transforming the observed ages at death to survival percentiles in a reference population using life tables. The FMHS is restricted to parental data and hence not subject to differential family size. The LFS, the SE f and the FLoSS are extensions of the FMHS which can deal with sibships of arbitrary size. The Familial Excess Longevity (FEL) score [24] is also continuous but it does rely on population life tables. Instead, individual survival exceptionality is defined as the difference between observed and expected age, derived from an accelerated failure time regression model. Both the LFS and the FEL scores are based on the mean as family-specific summary measure and hence share with the LRC score the discussed limitations of empirical expectations. A potential drawback of all these continuous longevity scores is that relatively young family members can contribute positively to these scores. Even after conditioning on being older than 40 as proposed for the FLoSS, the resulting score is probably influenced by ages at death which are not extreme enough to capture the heritable longevity trait. Evidence of this is supported by studies that have pointed towards increasing family aggregation of survival when focusing on more extreme ages at death for longevity definition [13,31] and recent publications indicating that the longevity trait seems to be heritable considering lifespan thresholds beyond the top 10% survivors of a given birth cohort [5]. A model-based modified version of SE f or the LFS which minimizes the contribution of young family members seems a promising line of future research. However, the extremely skewed distribution of the individual measure of longevity of these scores makes the extension of our method not straightforward.
Another important topic is dealing with alive or lost on follow-up (right censored) individuals when constructing longevity family scores. We have assumed full observation of lifespan of siblings included in the calculation of the score, so scores can be regarded as family history scores of alive relatives who could potentially be selected to participate in a (genetic) longevity study.
The FLoSS score is the extension of the discussed score SE f to allow for the inclusion of right censored observations. The FLoSS follows a single imputation approach based on imputing alive individuals with the sex and birth cohort specific conditional expected age at death. This is an example of single imputation which underestimates the uncertainty of estimates and can potentially lead to bias. More advanced methods are possible in the mixed effect setting and its inclusion is left as subject of future research. Finally, recent evidence [9] indicates that the inclusion of family members of different  Long-lived families were defined as those belonging to the top 15% of each score which corresponded to a cut-off of 0.3 in LRC and a cut-off of 0.15 in mLRC. For each binary variable defined in these cut-offs, a multivariable Cox proportional hazard regression model corrected by birth cohort and gender is fitted in the validation set. Estimates of the resulting regression coefficient(β) and standard error (s.e.) are reported degree of relatedness is of great importance to capture the heritable longevity phenotype and hence the proposed method should also be extended to this more complex setting. Finally, it is important to mention that our approach may result in selections that are influenced by familyshared non-genetic factors. Despite previous research based on historical pedigree data have led to little evidence for associations between non-genetic factors such as socio-economic status, fertility factors or religious denomination and familial longevity [5,[8][9][10], other sociobehavioral and environmental factors such as personality and lifestyle may influence familial clustering of longevity. Since many of these also have a strong genetic component itself it is most likely that gene environmental interactions can explain a part of the familial clustering of longevity. Still in this complex setting, the use of welldesigned family scores is expected to reduce genetic heterogeneity and contribute to a power increase in casecontrol longevity studies to detect novel genetic loci. Moreover, our mLRC score can be applied in more general longevity studies devoted to investigate the interplay among genetic and non-genetic factors involved in longevity.

Conclusions
To properly account for differences in family size is of paramount importance when deriving family scores of longevity and using them for ranking families and selecting participants in longevity studies. The methodology described in this paper is therefore of great relevance and can help to improve selection of participants in future longevity studies.