Study population characteristics have been reported previously [10]. We used a subset of 87 patients with a complete baseline genotype and plasma HIV-1 RNA measurements available at baseline and at week 12. Virological failure was observed in 46 (53%) patients at week 12. Mutations at codon 63 had the highest prevalence in this population (80%), followed by mutations at codons 10 (58%), 71 (51%), 46 (47%), 54 (47%), 37 (47%), 35 (41%), 82 (40%) and 90 (40%). Mutations at codons 11, 12, 13, 14, 15, 19, 20, 32, 33, 34, 36, 41, 43, 47, 55, 57, 60, 61, 62, 64, 69, 72, 73, 77, 84, 89 and 93 had prevalences between 10% and 40%. Mutations at codons 10, 46, 54, 82 and 90 showed the strongest association with virological failure in univariable analysis (p < 10⁻⁵). All patients with a mutation at codon 84 presented virological failure.
Genotypic score
Among mutations occurring in more than 10% and less than 90% of the patients, 27, 18 and 11 mutations were selected according to p-value thresholds of < 0.25, < 0.05 and < 0.01, respectively. To avoid computational issues, the backward selection procedure using the Cochran-Armitage trend test was started with the 11 mutations (10, 33, 36, 46, 54, 62, 71, 73, 82, 84, 90) selected with the most restrictive criterion (p < 0.01). The stability of this selection step was checked on 200 bootstrap samples. Seven of the 11 mutations were selected in over 90% of the samples (10: 100%, 46: 100%, 54: 100%, 71: 95.5%, 82: 97%, 84: 100%, 90: 96%). The other four mutations were selected in 50% to 90% of the samples (33: 88%, 36: 68%, 62: 50%, 73: 68.5%). Mutations not included in the IAS list [14] were in general not selected in the bootstrap samples (exceptions: 19: 36.5%, 37: 19% and 41: 19%). This additional bootstrap analysis confirmed that mutations known to be associated with virological failure were chosen for further steps. Polymorphisms, i.e. mutations that also occur occasionally in untreated patients and are thus generally unrelated to antiretroviral treatment, were chosen in less than 3% of the bootstrap samples.
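The bootstrap stability check described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names are hypothetical, the data are made up, and `toy_select` is a simple prevalence-gap rule that stands in for the p-value based selection step.

```python
import random


def bootstrap_selection_frequency(patients, select, n_boot=200, seed=0):
    """Fraction of bootstrap samples in which each codon is selected.

    patients: list of (set of mutated codons, virological failure flag)
    select:   function mapping a list of patients to a set of selected codons
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        # Resample patients with replacement, keeping the original sample size.
        sample = [rng.choice(patients) for _ in patients]
        for codon in select(sample):
            counts[codon] = counts.get(codon, 0) + 1
    return {codon: n / n_boot for codon, n in counts.items()}


def toy_select(sample, min_gap=0.3):
    """Hypothetical stand-in for the selection step: keep codons whose
    prevalence is higher among failures than among successes by min_gap."""
    failures = [codons for codons, failed in sample if failed]
    successes = [codons for codons, failed in sample if not failed]
    if not failures or not successes:
        return set()
    selected = set()
    for codon in set().union(*failures, *successes):
        p_fail = sum(codon in c for c in failures) / len(failures)
        p_ok = sum(codon in c for c in successes) / len(successes)
        if p_fail - p_ok > min_gap:
            selected.add(codon)
    return selected


# Made-up toy data: codon 10 tracks failure, codon 63 is present in everyone.
toy_patients = [({10, 63}, True)] * 10 + [({63}, False)] * 10
frequencies = bootstrap_selection_frequency(toy_patients, toy_select)
```

In this toy setting the codon that tracks failure is selected in nearly every resample, whereas the ubiquitous codon is never selected, which mirrors the kind of contrast reported above between resistance mutations and polymorphisms.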
During the backward selection procedure, mutations at the following six codons were retained for the calculation of a genotypic score: 10, 36, 46, 62, 84 and 90. The genotypic score calculated with these six mutations was significantly associated with virological failure (OR = 4.1 for a difference of one mutation, CI95% [2.4; 7.0]; p < 10⁻⁴; cross-validated OR = 4.9).
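As an illustration, the genotypic score can be read as the number of the six retained codons at which a patient carries a mutation. The sketch below assumes this unweighted-count reading; the function name and the example genotype are hypothetical.

```python
# Codons retained by the backward selection procedure (from the text above).
SCORE_CODONS = (10, 36, 46, 62, 84, 90)


def genotypic_score(mutated_codons):
    """Number of score codons at which the patient carries a mutation."""
    return sum(1 for codon in SCORE_CODONS if codon in mutated_codons)


# Illustrative genotype: codons 33, 54 and 82 are mutated but are not part
# of the score, so only codons 10, 36 and 90 are counted.
example_genotype = {10, 33, 36, 54, 82, 90}
score = genotypic_score(example_genotype)
```

Every counted mutation carries the same weight in this score, and mutations outside the six codons do not contribute at all.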
Principal component analysis
The first and second principal components explained 11% and 6% of the variability of the mutations, respectively. Overall, the principal components accounted for only a small part of the variability, which made their interpretation difficult. The correlations among the mutations and between the mutations and the principal components nevertheless allowed us to identify some clusters, for example mutations 10, 46 and 90, or mutations 32 and 47, which are already known to be associated (figure 1). Figure 2 represents the relative weight of each mutation in the dataset in the calculation of the first principal component. The relative weight of each mutation in the PCA 'score' ranged between 0% (e.g. mutation at codon 22) and 4.3% (e.g. mutations at codons 10 and 54). The sum of the relative weights of the mutations in the IAS list was 70%, meaning that these mutations contributed the most to the first principal component. The mutations at the six positions 10, 33, 46, 54, 82 and 90 contributed the most to the first component (figure 2). Among others, mutations at positions 77, 88 and 30 had a negative scoring coefficient for the first component, meaning that the presence of such a mutation decreases the value of the score. The medians of the first and second principal components were -0.10 (IQR: -0.5–0.84) and 0 (IQR: -0.53–0.40), respectively. The first principal component was significantly associated with virological failure, with an OR of 11.9 (CI95% [4.8; 29.7], p < 10⁻⁴) for a difference of one unit, whereas the second was not (OR = 1.1, CI95% [0.7; 1.7], p = 0.62).
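The notion of relative weight can be illustrated with a short sketch. It assumes that the relative weight of a mutation is the absolute value of its scoring coefficient divided by the sum of the absolute values over all mutations, which the text does not state explicitly. The coefficients below are made up and are not the values behind figure 2.

```python
def relative_weights(coefficients):
    """Percentage share of each codon in the total absolute scoring weight."""
    total = sum(abs(c) for c in coefficients.values())
    return {codon: 100 * abs(c) / total for codon, c in coefficients.items()}


def component_score(coefficients, mutated_codons):
    """Score as the sum of the coefficients of the mutated codons
    (centring and scaling of the indicators are omitted for brevity)."""
    return sum(c for codon, c in coefficients.items() if codon in mutated_codons)


# Made-up scoring coefficients, NOT the values behind figure 2.
toy_coefficients = {10: 0.30, 54: 0.30, 90: 0.25, 77: -0.10, 22: 0.0}

weights = relative_weights(toy_coefficients)
# Summed relative weight of an arbitrary list of codons.
listed_share = sum(weights[codon] for codon in (10, 54, 90))

# A mutation with a negative coefficient lowers the score.
with_77 = component_score(toy_coefficients, {10, 54, 77})
without_77 = component_score(toy_coefficients, {10, 54})
```

Under this reading, a codon with a negative coefficient still has a positive relative weight, because the weight measures the size of the contribution and not its direction.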
Partial least squares
One PLS component was retained according to the PRESS criterion. This component explained 11% of the variability of the mutations and 60% of the variability of the response variable. The median of the first PLS component was -0.17 (IQR: -2.69–2.64). This PLS component was significantly associated with virological failure (OR = 2.6, CI95% [1.8; 3.9], p < 10⁻⁴). Figure 3 represents the relative weight of each mutation in the dataset in the calculation of the first PLS component. Mutations at positions 10, 46, 54, 82, 84 and 90 contributed the most to the first component (figure 3). Among others, mutations 77, 30 and 48 received a negative weight in the calculation of the first PLS component. The mutation at codon 69 had the smallest relative weight (0.03%) and the mutation at codon 10 the largest (4.7%). The mutations included in the IAS list contributed 69% (i.e. the sum of their relative weights). Thus, mutations already known to be associated with virological failure were given more weight than polymorphisms (mutations that also occur occasionally, generally without any association with antiretroviral treatment).
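Why PLS gives more weight to mutations associated with the outcome can be seen from the first weight vector of a single-response PLS, which is proportional to the covariance between each centred predictor and the centred response. The sketch below is a plain-Python illustration under that standard definition, with made-up data; it does not reproduce the software or the preprocessing (e.g. scaling) used in the study.

```python
import math


def first_pls_weights(X, y):
    """Weight vector of the first PLS component for a single response:
    the normalised cross-product of centred predictors and centred response."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cross = [
        sum((row[j] - col_means[j]) * (yi - y_mean) for row, yi in zip(X, y))
        for j in range(p)
    ]
    norm = math.sqrt(sum(c * c for c in cross))
    if norm == 0:
        raise ValueError("no predictor covaries with the response")
    return [c / norm for c in cross]


# Made-up indicators for three codons in four patients.
# Column 0 follows failure, column 1 is unrelated, column 2 follows success.
X = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]
y = [1, 1, 0, 0]  # 1 = virological failure

weights = first_pls_weights(X, y)
```

Here the unrelated column receives a weight of zero and the column that follows treatment success receives a negative weight, which is the same qualitative pattern as the negative weights described above.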
Comparison
We compared the results of the PCA and PLS with the results obtained using the classical strategy to build a genotypic score. Mutations 10, 46 and 90 were among the six mutations with the highest weight in the calculation of both the first PC and the first PLS component, and were also selected for the genotypic score. Major mutations 54 and 82, which were among the mutations most strongly associated with virological failure in univariable analysis, were also among the six mutations with the highest weight in the calculation of the first PC and the first PLS component. In contrast, these two mutations were eliminated from the score during the backward selection procedure (figure 4). Therefore, a first advantage of methods based on PCA and PLS is that they reduce the number of predictors without neglecting mutations that could play a significant role.
We compared the performance of these three methods using the area under the ROC curve (AUC). The cross-validated AUCs for the PCA, PLS and genotypic score were 0.880, 0.868 and 0.863, respectively. The model with the first principal component slightly outperformed the model with one PLS component. The AUC of the genotypic score was slightly lower than those obtained for PCA and PLS, but the score still performed very well.
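For reference, the AUC can be computed as the proportion of (failure, success) patient pairs in which the patient with virological failure has the higher score, with ties counted as one half. The sketch below uses made-up scores and leaves out the cross-validation step; the function name is illustrative.

```python
def auc(scores, failed):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, f in zip(scores, failed) if f]
    neg = [s for s, f in zip(scores, failed) if not f]
    if not pos or not neg:
        raise ValueError("both outcome groups must be present")
    # Count a concordant pair as 1 and a tied pair as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for four patients; the first two failed virologically.
toy_auc = auc([3, 2, 1, 2], [True, True, False, False])
```

An AUC of 0.5 corresponds to a score that does not discriminate between the two groups, and an AUC of 1 to perfect separation.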
To compare the methods in an illustrative way, we used a patient presenting the following 21 protease gene mutations at baseline: mutations at positions 33, 54, 82 and 90, defined as major; mutations at positions 10, 13, 20, 35, 36, 43, 53, 60, 63, 64 and 74, defined as minor; and mutations at positions 14, 15, 19, 37, 67 and 98, defined as polymorphisms. Virological failure was observed for this patient. The genotypic score was S = I10 + I36 + I90 = 3, and the probability of virological failure was 77% using this score. The main difference between the genotypic score and the principal component or PLS component value is that the latter methods take into consideration all 21 protease gene mutations of the patient and give them different weights. For instance, the relative weights for mutations 10, 36 and 90 were 4.4%, 2.2% and 4.1% for the PCA 'score' and 4.7%, 2.4% and 4.4% for the PLS 'score' (figures 2 and 3). The predicted probability of virological failure was 94% and 96% using the PC 'score' and the PLS 'score', respectively.
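The link between an odds ratio and a predicted probability can be made explicit with a small sketch. It assumes a logistic model in which the log-odds of failure are linear in the score, so that a difference of d units multiplies the odds by OR to the power d. The function name is hypothetical, and the calculation illustrates the arithmetic only; it is not a result of the study.

```python
def shift_probability(probability, odds_ratio, score_difference):
    """Probability after changing the score by score_difference units,
    assuming the odds are multiplied by odds_ratio per unit."""
    odds = probability / (1 - probability)
    new_odds = odds * odds_ratio ** score_difference
    return new_odds / (1 + new_odds)


# Starting from the reported 77% at S = 3 and the reported OR of 4.1 per
# mutation, what the same model would imply for one additional score mutation.
# Assumption: both reported figures come from the same fitted model.
p_one_more = shift_probability(0.77, 4.1, 1)
```

The same function run with a negative difference gives the probability implied for a lower score, again only under the stated model assumption.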