Statistical methodology for age-adjustment of the GH-2000 score detecting growth hormone misuse

Background: The GH-2000 score has been developed as a powerful and unique technique for the detection of growth hormone misuse by sportsmen and women. The score depends upon the measurement of two growth hormone (GH) sensitive markers, insulin-like growth factor-I (IGF-I) and the amino-terminal pro-peptide of type III collagen (P-III-NP). With the collection and establishment of an increasingly large database it has become apparent that the score shows a positive age effect in the male athlete population, which could potentially place older male athletes at a disadvantage.
Methods: We have used results from residual analysis of the general linear model to show that the residual of the GH-2000 score when regressed on the mean-age centred age is an appropriate way to proceed to correct this bias. As six GH-2000 scores are possible depending on the assays used for determining IGF-I and P-III-NP, methodology had to be explored for including six different age effects into a unique residual. Meta-analytic techniques have been utilized to find a summary age effect.
Results: The age-adjusted GH-2000 score, a form of residual, has a mean and variance similar to those of the original GH-2000 score and, hence, the developed decision limits show negligible change when compared to the decision limits based on the original score. We also show that any further scale-transformation will not change the adjusted score. Hence the suggested adjustment is optimal for the given data. The summary age effect is homogeneous across the six scores, and so the generic adjustment of the GH-2000 score formula is justified.
Conclusions: A final revised GH-2000 score formula is provided which is independent of the age of the athlete under consideration.


Background
Growth hormone is a powerful anabolic agent of considerable therapeutic value, but it is also misused in sport for its anabolic and lipolytic properties [1]. In order to preserve the fairness of competition, its use is prohibited by the World Anti-Doping Agency (WADA) [2] and there is a need for methods to detect its misuse. Two methods are presently available and approved by WADA: the isoform test developed by Bidlingmaier et al. [3] (see also [4]) and the GH-2000 biomarker test developed by the GH-2000 and GH-2004 projects [5]. The latter method depends upon the measurement of two growth hormone (GH) sensitive markers, insulin-like growth factor-I (IGF-I) and the amino-terminal pro-peptide of type III collagen (P-III-NP), both of which rise in response to exogenous GH administration [6,7]. The measured concentrations of the biomarkers are combined in sex-specific and age-adjusted discriminant functions, which allow for the calculation of a score (the GH-2000 score) on which basis the compliance of the sample's analytical result is determined. The age correction is required because GH secretion and markers of its action rise during childhood and reach a peak in early adulthood before declining at a rate of approximately 14% per decade [8]. Without an adjustment for age, younger athletes are placed at a disadvantage. For IGF-I and P-III-NP, a model in which the log of the marker level decreased linearly with the reciprocal of age fitted the data from 693 elite athlete marker levels well over the range of ages studied [9], and a term with the reciprocal of age was included in the GH-2000 score [10]. The inverse term for age is designed to adjust for age so that the score becomes independent of age. This is important in order to make the test applicable to athletes of all ages.
The initial development of the GH-2000 score was based on immunoassays that are no longer commercially available. Although the original discriminant function has remained unchanged, the decision limits have been updated as further experience was accumulated and new assays became available [5,11]. Currently, there are three IGF-I assays and two P-III-NP assays approved by WADA. For more details and background on these assays see Holt et al. [5]. As these assays do not give identical results, different GH-2000 scores are obtained with each of the combinations and this means that the decision limits are different, depending on the assay pair used.
Recent analysis of a combined database of 998 male and 931 female elite athletes [5] provides evidence that the score is independent of age for the female population whereas it shows a linear dependence for male athletes. This indicates that the original inverse term for age overcorrects for the natural decline in GH markers thereby potentially placing older athletes at a disadvantage.
The combined database contains blood samples of athletes collected at various sporting events including the 2011 International Association of Athletics Federations (IAAF) World Athletics Championships in Daegu, South Korea, hereafter referred to as the Daegu sample. Figure 1 shows the scores and their relationship to age in 597 male athletes competing in Daegu. There are 6 scores as there are 3 assays for IGF-I (LC-MS/MS, Immunotech, IDS) and 2 for P-III-NP (Siemens-Centaur, Orion). It is clear from Fig. 1 that all GH-2000 scores show a positive age dependency, as all linear regression lines show a significant age effect. This positive age dependency is also seen in nonparametric regression of the GH-2000 score on age and, hence, is of a structural nature and not caused by artefacts such as outlying observations. There is no age effect on the GH-2000 scores for the female population of the Daegu sample, indicating that the original age correction term performs well in a new independent database (data not shown).
The purpose of this paper is to suggest and discuss statistical methodology for adjusting the existing male GH-2000 score for the undesirable age-effect.

The basics of adjustment
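Consider a response $Y$ (in our case the GH-2000 score) and an effect $x$ (in our case the age of an athlete). Suppose that the response $Y$ is related to $x$ by a linear regression model
$$E(Y) = \alpha + \beta x. \qquad (4)$$
Then the least-squares estimate of $\beta$ in (4) is given by
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$
where the pairs $(Y_i, x_i)$ represent the $n$ sample values of $Y$ and $x$, and $\bar{Y}$ and $\bar{x}$ are their sample means. On this basis we are able to construct an adjusted response
$$Y^* = Y - \hat{\beta} x.$$
The adjusted response $Y^*$ is independent of $x$, as the following analysis shows. This can be found in most books on regression but is mentioned here for completeness. Consider the least-squares estimate of $\beta^*$ in
$$E(Y^*) = \alpha^* + \beta^* x. \qquad (6)$$
This least-squares estimate of $\beta^*$ is zero, as equation (7) shows:
$$\hat{\beta}^* = \frac{\sum_{i=1}^{n} (Y_i^* - \bar{Y}^*)(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x}) - \hat{\beta} \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \hat{\beta} - \hat{\beta} = 0. \qquad (7)$$
Hence $Y^*$ is free of any linear dependence on $x$. A more general result is provided in Appendix 1.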
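The vanishing slope in (7) can be checked numerically. The following Python sketch uses a handful of invented age and score values (they are not taken from the GH-2000 database) and a hypothetical helper `ls_slope`; it is meant only to illustrate the algebra, not to reproduce the analysis of the paper.

```python
def ls_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical ages (years) and scores, invented for illustration only.
ages = [19, 22, 24, 27, 30, 33, 36]
scores = [0.1, 0.4, 0.2, 0.6, 0.5, 0.9, 0.8]

# Estimated age effect in the original scores.
beta_hat = ls_slope(ages, scores)

# Adjusted response Y* = Y - beta_hat * x.
adjusted = [y - beta_hat * x for x, y in zip(ages, scores)]

# Regressing the adjusted response on age again gives a slope of zero
# (up to floating-point rounding).
slope_after = ls_slope(ages, adjusted)
print(beta_hat, slope_after)
```

The second printed value differs from zero only by rounding error, which is the numerical counterpart of equation (7).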
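Next, we suggest considering an adjustment of the form
$$Y^* = Y - \hat{\beta}(x - \bar{x}). \qquad (8)$$
The benefit of adjustment (8) lies in the fact that the adjusted score $Y^*$ remains on the same level as the original score $Y$, as
$$\bar{Y}^* = \bar{Y} - \hat{\beta}(\bar{x} - \bar{x}) = \bar{Y}.$$
The process of considering $x - \bar{x}$ is called centering. Sometimes norming is considered in addition to centering, for example by using $(x - \bar{x})/s_x$, where $s_x$ is the standard deviation of $x$. We are not considering norming here as it will not lead to any further adjustment. To see this, consider any scale transformation $ax$ of $x$ with $a \neq 0$. The original model $E(Y) = \alpha + \beta x$ now becomes $E(Y) = \alpha^* + \beta^* x^*$, where $x^* = ax$. Then the least-squares estimates can be found as
$$\hat{\beta}^* = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(a x_i - a\bar{x})}{\sum_{i=1}^{n} (a x_i - a\bar{x})^2} = \frac{\hat{\beta}}{a}, \qquad \hat{\alpha}^* = \bar{Y} - \hat{\beta}^* a\bar{x} = \bar{Y} - \hat{\beta}\bar{x} = \hat{\alpha}.$$
Hence the adjusted response
$$Y - \hat{\beta}^* x^* = Y - \frac{\hat{\beta}}{a}\, a x = Y - \hat{\beta} x \qquad (11)$$
is indeed identical to the original adjustment $Y - \hat{\beta} x$ and does not lead to anything new. A more general result is provided in Appendix 2. Hence we stay with the adjustment (8) as the final form of adjustment.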
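Both properties of the centred adjustment, the preserved mean and the invariance under rescaling of age, can be illustrated with a short Python sketch. The data are again invented, the helper `ls_slope` is hypothetical, and the scale factor `a = 12` (age in months instead of years) is chosen purely as an example.

```python
def ls_slope(x, y):
    """Least-squares slope of y regressed on x (simple linear regression)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


# Hypothetical ages (years) and scores, invented for illustration only.
ages = [19, 22, 24, 27, 30, 33, 36]
scores = [0.1, 0.4, 0.2, 0.6, 0.5, 0.9, 0.8]
n = len(ages)

# Centred adjustment: Y* = Y - beta_hat * (x - x_bar).
age_bar = sum(ages) / n
beta_hat = ls_slope(ages, scores)
centred = [y - beta_hat * (x - age_bar) for x, y in zip(ages, scores)]

# The mean of the adjusted scores equals the mean of the original scores.
mean_before = sum(scores) / n
mean_after = sum(centred) / n

# Rescale age by a = 12 and repeat the adjustment on the new scale.
a = 12
months = [a * x for x in ages]
month_bar = sum(months) / n
beta_star = ls_slope(months, scores)  # equals beta_hat / a
centred_rescaled = [y - beta_star * (m - month_bar)
                    for m, y in zip(months, scores)]

# Largest difference between the two sets of adjusted scores.
max_diff = max(abs(u - v) for u, v in zip(centred, centred_rescaled))
print(mean_before, mean_after, max_diff)
```

The adjusted scores agree on both age scales up to rounding error, so norming the age variable would not change the result.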

Adjusting the GH-2000 score
To adjust the GH-2000 score, we consider the regression of the GH-2000 score on age. Table 1 shows 6 age-effects for the 6 GH-2000 scores (as there are 2 assays for measuring P-III-NP and 3 assays for measuring IGF-I).
For simplicity and ease of use by the anti-doping laboratories, it is important that we do not create a separate age adjustment for each assay pairing. Thus we need to include the age adjustment within the generic GH-2000 score (independent of the specific assay pairing used). To accomplish this task we have applied ideas from meta-analysis. We consider each GH-2000 score using a specific assay combination as a realisation from multiple possible assay combinations. This is similar to a meta-analysis approach in which studies aiming to estimate a certain effect are considered as realisations from a universe of possible studies.
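Hence we use
$$\bar{\beta} = \frac{\sum_{i=1}^{k} w_i \hat{\beta}_i}{\sum_{i=1}^{k} w_i},$$
where $k = 6$ is the number of different assay combinations used, $\hat{\beta}_i$ is the estimated age effect for the $i$-th assay combination, and $w_i$ is the inverse of its estimated variance (the squared values in column 3 of Table 1). Hence $\bar{\beta}$ is a weighted average of the estimated effects.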
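As a minimal sketch of this inverse-variance weighting, the following Python fragment computes the weighted average for six age effects. The estimates and standard errors are hypothetical placeholders, not the values of Table 1, and the function name `inverse_variance_mean` is introduced here for illustration only.

```python
def inverse_variance_mean(estimates, std_errors):
    """Weighted average of estimates with weights 1 / (standard error)^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    weighted_sum = sum(w * b for w, b in zip(weights, estimates))
    return weighted_sum / sum(weights)


# Hypothetical age effects and standard errors for six assay pairings
# (placeholders for illustration, NOT the entries of Table 1).
effects = [0.030, 0.035, 0.028, 0.033, 0.031, 0.034]
std_errors = [0.006, 0.007, 0.006, 0.007, 0.006, 0.007]

beta_bar = inverse_variance_mean(effects, std_errors)
print(round(beta_bar, 4))
```

When all standard errors are equal the weights cancel and the summary reduces to the ordinary arithmetic mean, which is why similar standard errors lead to similar contributions from each assay pairing.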

Results
In our case, we find a summary age effect of $\bar{\beta} = 0.032$. Figure 2 shows this analysis graphically. As the assay-specific age effects have similar standard errors, all weights are similar. More details on the meta-analysis approach are given in Appendix 3.
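To investigate the appropriateness of the meta-analytic weighted average approach (are the age effects for the six scores similar enough to be validly combined in a weighted average?), a heterogeneity analysis was performed using the $\chi^2$-test of homogeneity,
$$\chi^2 = \sum_{i=1}^{k} w_i (\hat{\beta}_i - \bar{\beta})^2, \qquad (13)$$
where $\hat{\beta}_i$ are the $k = 6$ estimated age effects, $w_i$ their inverse-variance weights and $\bar{\beta}$ the summary age effect. The test gave no evidence of heterogeneity between the six assay pairings.
Figure 3 shows a scatterplot of the six age-adjusted GH-2000 scores. It clearly shows that the age effect is removed, as expected from the above theory.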
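A homogeneity check of this kind can be sketched in Python as follows. The sketch assumes the standard inverse-variance form of the homogeneity statistic with $k - 1$ degrees of freedom; the input values are hypothetical placeholders rather than the entries of Table 1, and the function name is introduced only for this example. The statistic is compared with 11.070, the upper 5% point of the $\chi^2$ distribution with 5 degrees of freedom.

```python
def homogeneity_statistic(estimates, std_errors):
    """Return the chi-square homogeneity statistic and its degrees of freedom."""
    weights = [1.0 / se ** 2 for se in std_errors]
    summary = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - summary) ** 2 for w, b in zip(weights, estimates))
    return q, len(estimates) - 1


# Hypothetical age effects and standard errors for six assay pairings
# (placeholders for illustration, NOT the entries of Table 1).
effects = [0.030, 0.035, 0.028, 0.033, 0.031, 0.034]
std_errors = [0.006, 0.007, 0.006, 0.007, 0.006, 0.007]

q, df = homogeneity_statistic(effects, std_errors)

# Upper 5% point of the chi-square distribution with 5 degrees of freedom.
CHI2_CRIT_5DF = 11.070
homogeneous = q < CHI2_CRIT_5DF
print(round(q, 3), df, homogeneous)
```

A statistic well below the critical value, as in this invented example, is what one would expect when the assay-specific age effects differ by no more than their sampling error.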

Effect on the current WADA decision limits
Fig. 2 Meta-analytic results for the six age effects of the GH-2000 scores on age (I-V stands for overall inverse-variance weighted and provides the summary estimate of the age effect); more details are given in the appendix; the arrow to the right indicates that the right confidence limit falls outside the plotting area
Although this adjustment will lead to changes in the individual GH-2000 score of an athlete, it has a negligible effect on the decision limits. The decision limits are most important in practice as they provide the cut-off value above which the athlete's GH-2000 score is considered to be positive. Following Holt et al. [5], these are constructed for a 1 in 10,000 false-positive rate from the mean $\bar{y}$ and standard deviation $s$ of the respective GH-2000 score, together with a sample uncertainty term $u$ that depends on the sample size $n$. Table 2 shows the details, in particular a comparison between GH-2000 scores with and without adjustment.
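To illustrate why a small change in the standard deviation barely moves such a limit, the Python sketch below assumes, purely for illustration, an additive limit of the form mean + z × SD + u, where z is the normal quantile corresponding to a 1 in 10,000 false-positive rate. The exact formula of Holt et al. [5], including the definition of u, is not reproduced here; u is simply passed in as a given number, and all input values are invented.

```python
from statistics import NormalDist


def decision_limit(mean_score, sd_score, uncertainty, false_positive_rate=1e-4):
    """Illustrative upper limit: mean + z * SD + u (assumed additive form)."""
    z = NormalDist().inv_cdf(1.0 - false_positive_rate)
    return mean_score + z * sd_score + uncertainty


# Standard normal quantile for a 1 in 10,000 false-positive rate.
z = NormalDist().inv_cdf(1.0 - 1e-4)

# Hypothetical means, SDs and uncertainty term (invented values).
limit_original = decision_limit(0.00, 1.00, 0.20)
limit_adjusted = decision_limit(0.00, 0.99, 0.20)
print(round(z, 3), round(limit_original - limit_adjusted, 3))
```

Under this assumed form, a change of 0.01 in the standard deviation shifts the limit by less than 0.04, which is in line with the negligible changes described in the text.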

Distribution of adjusted GH-2000 scores
The construction of the decision limits for the GH-2000 biomarker methodology is dependent on a normal distribution of GH-2000 scores among clean athletes. This was assessed using probability plotting and the Anderson-Darling test for normality, which provided clear evidence that all six scores were normally distributed (Fig. 4).
Fig. 4 Probability plots for the six GH-2000 scores (GHS) adjusted for age; AD stands for the Anderson-Darling test of normality and the P-value refers to the null hypothesis of normality, so that values larger than 0.05 do not lead to rejection of normality
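The Anderson-Darling check can be sketched in plain Python. The sketch below computes the statistic for a normal distribution with estimated mean and standard deviation, applies the usual small-sample modification, and compares the result with the commonly tabulated 5% critical value of 0.752 instead of computing a P-value. The two data sets are artificial (normal and exponential quantiles) and the function name is hypothetical; the software actually used for Fig. 4 is not stated in the text.

```python
from math import log, sqrt
from statistics import NormalDist


def anderson_darling_normal(values):
    """Modified Anderson-Darling statistic for normality (mean and SD estimated)."""
    n = len(values)
    mu = sum(values) / n
    sd = sqrt(sum((v - mu) ** 2 for v in values) / (n - 1))
    fitted = NormalDist(mu, sd)
    cdf = [fitted.cdf(v) for v in sorted(values)]
    total = sum((2 * i + 1) * (log(cdf[i]) + log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    # Small-sample modification for the case of estimated parameters.
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


# 5% critical value for the modified statistic (estimated mean and SD).
CRITICAL_5_PERCENT = 0.752

# Artificial data: normal quantiles and strongly skewed exponential quantiles.
n = 50
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-log(1.0 - (i + 0.5) / n) for i in range(n)]

ad_normal = anderson_darling_normal(normal_like)
ad_skewed = anderson_darling_normal(skewed)
print(ad_normal < CRITICAL_5_PERCENT, ad_skewed < CRITICAL_5_PERCENT)
```

A statistic below the critical value corresponds to a P-value above 0.05, that is, no rejection of normality at the 5% level.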

Discussion
We are suggesting this adjustment for the male elite athlete population only, as the female population does not show age dependency. We have demonstrated that the proposed adjustment of the GH-2000 score removes the positive age dependency. Furthermore, the age-adjusted scores remain compatible with normality, as the probability plots in Fig. 4 show that all six adjusted scores appear to be normally distributed.
The GH-2000 and GH-2004 teams have previously published the rationale and background to the development of decision limits for the GH-2000 biomarker detection method [5,10]. It was always envisaged that a dynamic approach would be taken towards refining the decision limits as further data became available. Our recent investigations have shown that the age adjustment in the male discriminant function, which was derived from the original GH-2000 cross-sectional elite athlete study [9,10], over-corrects for age in male athletes in our more recent cohorts. The effect of this over-correction is to place older male athletes at a slight disadvantage compared with their younger peers, for whom the sensitivity of the test is reduced. The original age correction for women remained valid in the later cohorts. We have used the most recent dataset, on which the current decision limits are based, to add a small further adjustment to the discriminant function to address this issue.
When undertaking this analysis, we used several principles to guide our work: 1) the updated male discriminant function should be unaffected by age in order to make the test equally fair and effective for athletes of all ages; 2) the change in age correction should have a minimal effect on the current decision limits; and 3) a single age adjustment should be applicable to all assay pairings. In order to minimise the effect on the current decision limits, we used a method that centred the data. By doing so the mean GH-2000 scores were virtually unaffected. There was only a trivial change to the SDs and consequently the decision limits, which are based on the mean and SD, were essentially unchanged. The age adjustment varies slightly by assay pairing and, in order to overcome this, we adapted meta-analytical methodology to derive a common age adjustment for all the combinations. There was no evidence of heterogeneity between the assay pairings and each contributed approximately equally to the final adjustment, providing support for this approach.

Conclusion
In conclusion, we have created a small further age adjustment for male athletes to correct the age bias introduced with the original discriminant formula. This has a negligible effect on the decision limits and should be easily introduced into anti-doping testing.
… matrix. See also [12]. As a consequence, norming of the covariates (for example, by their standard deviations) will not change the residuals.