Rhee Soo-Yon, Taylor Jonathan, Wadhera Gauhar, Ben-Hur Asa, Brutlag Douglas L, Shafer Robert W
Division of Infectious Diseases, Department of Medicine, Stanford University, Stanford, CA 94305, USA.
Proc Natl Acad Sci U S A. 2006 Nov 14;103(46):17355-60. doi: 10.1073/pnas.0607274103. Epub 2006 Oct 25.
Understanding the genetic basis of HIV-1 drug resistance is essential to developing new antiretroviral drugs and optimizing the use of existing drugs. This understanding, however, is hampered by the large number of mutation patterns associated with cross-resistance within each antiretroviral drug class. We used five statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression, and least angle regression) to relate HIV-1 protease and reverse transcriptase mutations to in vitro susceptibility to 16 antiretroviral drugs. The learning methods were trained and tested on a public data set of genotype-phenotype correlations by 5-fold cross-validation. For each learning method, four mutation sets were used as input features: a complete set of all mutations present in ≥2 sequences in the data set, the 30 most common data set mutations, an expert panel mutation set, and a set of nonpolymorphic treatment-selected mutations from a public database linking protease and reverse transcriptase sequences to antiretroviral drug exposure. The nonpolymorphic treatment-selected mutations led to the best predictions: 80.1% accuracy at classifying sequences as susceptible, low/intermediate resistant, or highly resistant. Least angle regression predicted susceptibility significantly better than the other methods when using the complete set of mutations. The three regression methods provided consistent estimates of the quantitative effect of mutations on drug susceptibility, identifying nearly all previously reported genotype-phenotype associations and providing strong statistical support for many new associations. Mutation regression coefficients showed that, within a drug class, cross-resistance patterns differ for different mutation subsets and that cross-resistance has been underestimated.
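The core idea of the regression approach (one indicator per mutation, a coefficient per mutation estimating its effect on susceptibility, and accuracy measured by 5-fold cross-validation on a three-level classification) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only plain least-squares regression, assumes the phenotype is a log10 fold-change in susceptibility, and uses hypothetical mutation names (`mutA` to `mutD`), made-up additive effect sizes, noise-free toy data, and hypothetical classification cutoffs (`low=0.5`, `high=1.0`). None of these values come from the study's data set.

```python
import itertools
import random

def encode(mutations, feature_set):
    """Intercept term plus one 0/1 indicator per mutation in the chosen feature set."""
    return [1.0] + [1.0 if m in mutations else 0.0 for m in feature_set]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_least_squares(x, y, ridge=1e-9):
    """Ordinary least squares via the normal equations; the tiny ridge only keeps X^T X invertible."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, mutations, feature_set):
    """Predicted log10 fold-change: intercept plus the coefficients of the mutations present."""
    return sum(b * v for b, v in zip(beta, encode(mutations, feature_set)))

def classify(log_fc, low=0.5, high=1.0):
    """Three resistance levels from log10 fold-change; the cutoffs are hypothetical."""
    if log_fc < low:
        return "susceptible"
    return "low/intermediate" if log_fc < high else "high"

def cross_validate(samples, feature_set, k=5, seed=0):
    """Mean k-fold accuracy of the 3-level classification; samples are (mutation set, log10 fold-change)."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test in folds:
        held_out = set(test)
        train = [samples[i] for i in idx if i not in held_out]
        beta = fit_least_squares([encode(s, feature_set) for s, _ in train],
                                 [y for _, y in train])
        hits = sum(classify(predict(beta, samples[i][0], feature_set)) == classify(samples[i][1])
                   for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)

def make_toy_data(effects, copies=3):
    """Every combination of the given mutations, with an additive, noise-free log10 fold-change."""
    names = sorted(effects)
    samples = []
    for r in range(len(names) + 1):
        for combo in itertools.combinations(names, r):
            samples += [(set(combo), sum(effects[m] for m in combo))] * copies
    return samples

if __name__ == "__main__":
    effects = {"mutA": 1.3, "mutB": 0.6, "mutC": 0.2, "mutD": 0.08}  # made-up values
    features = sorted(effects)
    data = make_toy_data(effects)
    beta = fit_least_squares([encode(s, features) for s, _ in data], [y for _, y in data])
    print(dict(zip(["intercept"] + features, (round(b, 3) for b in beta))))
    print(cross_validate(data, features))
```

Because the toy phenotype is exactly additive and noise-free, the fitted coefficients recover the made-up per-mutation effects and the cross-validated accuracy is perfect; real genotype-phenotype data contain measurement noise and mutation interactions, which is why the study compared several learning methods and mutation sets.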