Statistical adjustment of genotyping error in a case–control study of childhood leukaemia

Background Genotyping has become more cost-effective and less invasive with the use of buccal cell sampling. However, low or fragmented DNA yields from buccal cells collected using FTA cards often requires additional whole genome amplification to produce sufficient DNA for genotyping. In our case–control study of childhood leukaemia, discordance was found between genotypes derived from blood and whole genome amplified FTA buccal DNA samples. We aimed to develop a user-friendly method to correct for this genotype misclassification, as existing methods were not suitable for use in our study. Methods Discordance between the results of blood and buccal-derived DNA was assessed in childhood leukaemia cases who had both blood and FTA buccal samples. A method based on applying misclassification probabilities to measured data and combining results using multiple imputations, was devised to correct for error in the genotypes of control subjects, for whom only buccal samples were available, to minimize bias in the odds ratios in the case–control analysis. Results Application of the correction method to synthetic datasets showed it was effective in producing correct odds ratios from data with known misclassification. Moreover, when applied to each of six bi-allelic loci, correction altered the odds ratios in the logically anticipated manner given the degree and direction of the misclassification revealed by the investigations in cases. The precision of the effect estimates decreased with decreasing size of the misclassification data set. Conclusions Bias arising from differential genotype misclassification can be reduced by correcting results using this method whenever data on concordance of genotyping results with those from a different and probably better DNA source are available.


Background
The Australian Study of causes of Acute Lymphoblastic Leukaemia in Children (Aus-ALL) was a populationbased case-control study, conducted between 2003 and 2007, designed to investigate environmental and genetic risk factors for childhood ALL as well as their interactions. The study has been described in detail elsewhere [1]. Briefly, 415 case children provided blood samples for genetic analysis during a routine visit to the treating hospital after initial remission was achieved. Buccal cell samples were collected from 536 control children at home using Whatman FTA Indicating Micro Cards (Cat. No.WB120211) (hereafter referred to as 'FTA cards') to maximize participation and minimize costs [2,3]. Whole genome amplification (WGA) from 1.2 mm diameter trephine punched discs were used to increase and preserve the finite amount of DNA available from the FTA card samples. Case children were also asked to provide a buccal sample using an FTA card so that concordance between genotypes measured using blood and FTA card buccal samples could be examined.
In this paper, we briefly present the concordance results for a set of six polymorphisms and describe the performance of a method we developed to correct genetic association analyses for the unexpected misclassification we observed. Because of the differential nature of the misclassification (all controls provided buccal DNA while most cases provided a blood sample) and the large number of covariates required and analyses planned, particularly all those requiring more than one gene in a model or examination of gene-environment or gene-gene interactions, we needed a comparatively simple and easy-to-use method for addressing this issue. While there is a considerable literature on methods for correction of measurement error or misclassification (for example, Guolo 2008 [4], and Thurigen et al., 2000 [5]), there appeared to be none that met our requirements with easily useable software. For example, the seismic program from Fox et al. [6] only handled one binary misclassified variable, and the Mime approach of Cole et al. [7] handled only one binary misclassified variable and required some of the validation sample to also include controls. The more flexible MC-SIMEX method of Kuchenhoff et al. [8], available in the R-library, was also not considered to be suitable, as the corrected results took no account of the size of the validation sample and was highly dependent on the choice of the extrapolation function. Clayton et al. have described an ingenious method for adjusting for a similar problem of differential bias [9], which uses statistical properties of the observed data and requires no validation data; however, it requires large samples so could not be used in our study. Thus it was necessary to develop a novel correction method, as described below.

Laboratory methods
The blood samples from case children were couriered to the processing laboratory overnight following collection. Whole blood was refrigerated at 4°C for a maximum of 7 days prior to DNA extraction using the Wizard Genomic DNA Purification Kit (Promega, Madison WI cat #1620) in accordance with the manufacturer's instructions. DNA concentrations were quantified using a ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington DE). FTA cards were stored at room temperature with desiccant until processing. Four discs of 1.2 mm diameter were trephine punched from the sampling area and placed collectively into a single tube and amplified using the GenomiPhi DNA Amplification Kit (GE Healthcare, Buckinghamshire UK). Briefly, 9μL of sample buffer, 9μL of reaction buffer and 1μL of enzyme mix were added to the punches with thermo cycling conditions according to the manufacturer's instructions. DNA concentrations were quantified using a ND-3300 fluorospectrometer. The quantified DNA aliquots were then frozen at minus 30 degrees and thawed when required for genotyping.
Genotyping for single nucleotide polymorphisms (SNPs) involved in folate-metabolism, xenobiotic biotransformation and DNA repair pathways was performed using either restriction fragment length polymorphism analysis or TaqMan W SNP Genotyping Assays (Applied Bios stems, Foster City USA) with allele calling was performed by two independent researchers. For quality assurance purposes, 10% of samples were selected at random for repeat analysis, performed by laboratory staff blinded to sample identity. The specific SNPs genotyped in Aus-ALL are not identified in this methodological paper, as they are used for illustration purposes only. Full analyses of their associations with childhood ALL will be published elsewhere.

Genotyping performance All Aus-ALL SNPs
Across all 26 genotyped SNPs, the overall concordance within runs (repeat analyses of same sample) was 99.7% for blood samples (genomic DNA, hereafter referred to as gDNA) and 99.6% for WGA'd buccal DNA from the FTA cards (hereafter referred to as wgaDNA). The overall genotype failure rates for these SNPs were 0.16% and 1.68% for the gDNA and wgaDNA genotyping respectively. There were no departures from Hardy-Weinberg predictions observed in either the gDNA or wgaDNA genotype results. Among pairs of gDNA and wgaDNA samples collected from 249 case children, the average genotype discordance across all 26 diallelic SNPs (excluding discordance due to genotyping failure of one of the samples) was 3.48%, with a range of 0.40% to 12.96%. This discordance was the reason we developed the adjustment method described in this paper.
It appeared that the most likely source of error was preferential amplification from one chromosome of a pair, leading to loss of heterozygosis or "allelic drop-out" during WGA of frozen-thawed buccal DNA samples extracted from the FTA cards. Only WGA'd buccal DNA was available for control subjects; thus allelic drop-out in control subjects' genotypes could bias associations between genotypes (and genotype-exposure interactions) and risk of disease. A modified unconditional logistic regression modeling technique was therefore developed to correct for this bias.
We assumed the genotype from the gDNA sample was the 'true' genotype (gold standard). 92.2% of the genotyping error in the wgaDNA samples involved a change from heterozygous in the gDNA to homozygous wild type (Aa!AA) or homozygous mutant in the wgaDNA (Aa!aa) ( Table 1). Only 7.8% of discordant pairs involved a gDNA homozygous result (AA or aa) and a wgaDNA heterozygous result (Aa).

Algorithm testing subset
A subset of 6 SNPs from the Aus-ALL study was selected for use in the testing of the devised algorithm. These SNPs were selected as they covered the full spectrum of gDNA-wgaDNA genotype discordance seen. The mean level of discordance (excluding discordance due to genotyping failure of one of the samples) for these SNPs was 5.3% (range 0.40% to 12.96%).

Statistical correction method
In brief, a discordance table is created by comparing the genotyping method that has error with a gold standard method. If discordance is present, then a proportion (calculated from the discordance table) of the records with genotypes containing error are randomly selected, their genotypes are corrected and a logistic regression model is computed. This process is repeated multiple times, ensuring a different set of records is randomly selected each time, and controlling for the fact that the proportion is an estimate. Results from the multiple iterations of the model are then compiled and imputed effect estimates are calculated. SNP A from Table 1 is referred to throughout the description of this algorithm as an example Stage 1: Establish discordance tables The proportions of discordant results for each SNP in the paired gDNA-wgaDNA samples are shown in Table 1. We calculated the proportions of homozygous wild type (AA), heterozygous (Aa) and homozygous mutant (aa) results from the wgaDNA samples that differed from the paired gDNA sample results, and used these to estimate the true misclassification proportion in Stage 2 (see below). Table 2 shows the number of wgaDNA samples within each genotype category that were misclassified (assuming the paired gDNA sample gave the true result). For example, for SNP A there was 1 (of 96) incorrect AA result (should have been Aa), 1 (of 91) incorrect Aa result (should have been AA) and 3 (of 26) incorrect aa results (should have been Aa).
Stage 2: Correct the genotypes for analysis The discordance tables prepared in Stage 1 are only an estimate, from one sample, of the true misclassification proportion p. Therefore, the proportion of records to be corrected can be sampled from a binomial distribution with probability of success parameter p, which is automatically calculated from the discordance table. To make this correction within control subjects, the genotyping results were treated as follows: 1. The SNP genotype variable was re-assigned to a temporary new genotype variable. 2. A single value (p i ) was sampled from a binomial distribution with probability of success parameter p (e.g. 1/96 for AA to Aa). 3. Using this sampled proportion, the number of required changes (c T ) was determined by multiplying  this proportion (p i ) by the total number of controls with the genotype (AA). 4. From the control subjects with genotype (AA), the calculated number (c T ) of subjects which required genotype reclassification were randomly sampled (equal likelihood of selection, without replacement). 5. In the temporary new genotype variable, those subjects' genotypes were then reclassified (from AA to Aa). 6. Steps 2 through 5 were repeated for each instance of misclassification (e.g. each non diagonal cell of the discordance table with a value other than 0) Where two changes were required within one genotype for a SNP (for example SNP B, where some Aa values require reclassification to AA while others require reclassification to aa), the following 6 steps replaced steps 2 through 6 in the above process.

2.
A single value (p i ) was sampled from a binomial distribution with probability of success parameter p (e.g. (2 + 1)/90 [2/90 for Aa to AA + 1/90 for Aa to aa]). 3. Using this sampled proportion, the total number of required changes (c T ) was determined by multiplying this proportion (p i ) by the total number of controls with genotype (Aa) 4. From the control subjects with genotype (Aa), the number (c T ) of subjects requiring genotype reclassification were randomly sampled (equal likelihood of selection, without replacement). 5. Then p 1 and p 2 were temporarily assigned as the proportions of c T that were required for each change.
Here p 1 = 2/(2 + 1) for the reclassification of Aa to AA and p 2 = 1/(2 + 1) for the reclassification of Aa to aa. 6. Temporary scalars c 1 and c 2 were then calculated (c 1 = p 1 *c T and c 2 = p 2 *c T ) 7. In the new genotype variable, c 1 and c 2 of the randomly selected subjects' genotypes from step 4 were reclassified (c 1 from Aa to AA and c 2 Aa to aa). 8. Steps 2 through 7 were repeated for each instance of this type of misclassification (e.g. where a single row or column of the misclassification table had more than one non diagonal cell with a value other than 0) Stage 3: Run model to generate 'corrected' odds ratios for the association between each SNP and ALL risk Correction of the misclassified genotypes, as outlined in Stage 2, was then done in each run of an unconditional logistic regression model that was repeated 50 times. Relevant covariates were included in the model. For each variable in the model, the mean ß coefficient from the 50 iterations was used as the best estimate of the true coefficient. These estimates were used to calculate Wald Test p-values [10] and to generate 95% confidence intervals (CI) using Rubin's estimate of variance for multiple imputation [7]. Odds ratios were obtained by exponentiation of the estimates.

Implementation
The algorithm outlined in this paper was implemented in the statistical package R. For the R code to implement the outlined method, see our institutes website [11] or Additional file 1. The functions that make up the implementation of this algorithm are written so that they can be applied to any type of problem with a dichotomous outcome (e.g case/control analysis), a two or three level categorical predictor variable (e.g. a genotype) and discordance data (e.g. from a subset of the records with data from a gold standard measure, as here, or from other reference sources). The functions also allow for the inclusion of covariates in the model.

Testing the efficacy of the algorithm
To test the efficacy of the correction method, we created a series of hypothetical datasets that reflected our complete study dataset. These datasets were based on the genotype frequencies of two artificial SNPs: 'SNP 1' with an approximate dominant risk model (ORs fixed at: AA = 1.0, Aa = 1.01, aa = 2.04) and 'SNP 2' with an approximate log additive risk model (ORs fixed at: AA = 1.0, Aa = 1.30, aa = 1.68). The proportional amounts of misclassification shown in Aus-ALL SNPs A and B (the latter representing the SNP with the most discordance) among gDNA-wgaDNA pairs ( Table 2) were then applied to artificial SNPs 1 and 2, resulting in six separate datasets (SNPs 1 and 2 with no error, with SNP A-type error, and with SNP B-type error). Odds ratios estimated from these data sets are shown in Table 3. We also simulated the effect of increasing the number of sample pairs available for assessing gDNA-wgaDNA misclassification. In this simulation, we used the SNP B-type misclassification figures seen in Table 2 and multiplied the observed numbers, in turn, by 0.5 and 5. Using our case-control data, we then estimated odds ratios, with and without correction, for the associations between each SNP in Table 1 and risk of ALL adjusting for study matching variables (child's age, sex, and state of residence) and ethnicity.

Results
As expected, the results of the uncorrected ('true') and corrected analyses were identical (Table 3) when no error was present in the hypothetical data for artificial SNPs 1 and 2. When error was present, the estimated ORs deviated from the 'true' result, and did so to a greater extent when SNP B-type error was applied. For both artificial SNPs 1 and 2, the correction procedure adjusted the estimates in the direction ofand close to the 'true' result. For example, the 'true' OR for the SNP 1 aa genotype was 2.04 (95% CI: 1.29, 3.21). With SNP A-type error introduced into the data, the uncorrected OR was 1.82 (95% CI: 1.17, 2.84) while the corrected OR was 2.11 (95% CI: 1.29, 3.45), close to the 'true' result. When SNP B-type error was introduced, the uncorrected odds ratio was even further from the 'true' estimate -1.49 (95% CI: 1.06, 2.09); however, the corrected odds ratio was 1.99 (95% CI: 1.21, 3.26), close to the 'true' result.
As seen in Table 4, the odds ratios remained very similar, but the confidence intervals were somewhat wider or narrower when we simulated the effect of decreasing or increasing (respectively) the number of sample pairs available for assessing gDNA-wgaDNA misclassification.

Iterations
To verify the optimal number of iterations provided, we used the highest level of discordance (SNP-B type error) on the artificial SNP 1, and completed 1000 repeats of 25, 50 and 100 iterations to examine the distribution of the resulting imputed OR for the aa genotype. 95% confidence intervals for the spread of this imputed OR were (1.96 -2.10), (2.01 -2.11) and (2.02 -2.09) respectively. We felt the precision gained in 50 iterations compared to 25 was of value, and that the extra precision gained in 100 iterations compared to 50 was insufficient to warrant the observed modest increase in computation time. It should be noted that increasing the number of iterations would further increase the precision of the estimate, with the only expense being in added computation time.

Correction of case-control study data
The uncorrected and corrected odds ratios and 95% confidence intervals for risk of ALL associated with each genotype are shown in Table 5.
Almost 40% of the discordance in the gDNA-wgaDNA sample pairs was in the Aa!AA direction ( Table 1). The effect of our correction was to reduce the odds ratio  for the Aa genotype in all six SNPs (Table 5). Over 50% of the discordance seen in the gDNA-wgaDNA sample pairs was in the Aa!aa direction (Table 1). Correction increased the ORs for aa genotypes in four out of six SNPs, while the other two remained the same (Table 5).

Discussion
We aimed to correct for bias introduced by differential error in the genotyping results for control subjects compared with case subjects in our case-control study of childhood ALL. Over 90% of discordance observed in the case gDNA-wgaDNA sample pairs involved a loss of the Aa genotype in the wgaDNA samples (Table 1), consistent with allelic drop out during the WGA process. Croft and colleagues reported loss of the heterozygote genotype in 60 to 91% of discordant gDNA and wgaDNA sample pairs [12]. If the wgaDNA samples of control subjects were affected in a similar way to, and to a similar extent as, those of the casesas is probable since DNA from case children was taken at remissionthe most likely effect of the presumed allelic dropout would be to produce a spuriously lower proportion of Aa calls among controls than among cases for particular SNPs. The effect of this would be to artificially inflate the OR associated with the Aa genotype relative to the AA.
Our method for correcting logistic regression results for genotyping error can be applied to any study where misclassification is known to be present in genotypes (or other types two or three level categorical data), and where validation data are available, or where informed estimates of probable misclassification can be made. The method can also be adapted to allow associations between gene-environment interactions and risk of disease to be investigated.
Our correction technique involved using the empirically observed patterns of discordance in matched gDNA and wgaDNA samples from cases to adjust the estimated ORs for the associations between each of these SNPs and the risk of ALL. As expected, the primary effect of the correction on real data was to reduce the ORs associated with the Aa genotype. It also tended, less predictably, to increase the ORs for aa genotypes. There are two likely explanations for this less predictable effect. First, the aa genotype is generally the rarest so that even small adjustments in frequency can produce relatively large proportional changes in the OR. Second, for most SNPs, the correction method involved simultaneous adjustment to the frequency of the AA genotype relative to the Aa genotype, so the reference category for the aa OR was also adjusted.
Applying the discordance seen in our quality control sample pairs to two artificial SNPs showed that the