Institute of Biomedical Chemistry, Moscow 119121, Russia.
Molecules. 2018 Oct 24;23(11):2751. doi: 10.3390/molecules23112751.
The high variability of the human immunodeficiency virus (HIV) is an important cause of HIV resistance to reverse transcriptase and protease inhibitors. Many variants of HIV type 1 (HIV-1) are available and can be used to model sequence-resistance relationships. Machine learning methods are widely and successfully used in drug discovery. A growing body of data on the interactions of small drug-like molecules with their protein targets makes it possible to build models of "structure-property" relationships and to analyze the performance of various machine learning techniques. In this study, we analyzed several types of descriptors for predicting the resistance of HIV reverse transcriptase and protease to marketed antiretroviral drugs using the Random Forest approach. First, we represented amino acid sequences as sets of short peptide fragments, each comprising several amino acid residues. Second, we represented nucleotide sequences as sets of fragments, each comprising several nucleotides. We compared these two approaches using open data from the Stanford HIV Drug Resistance Database. We determined the factors that modulate prediction performance: in particular, prediction performance depended more on the particular drug than on the type of descriptor used.
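The two fragment-based representations described in the abstract can be illustrated with a minimal sketch. The abstract does not give the fragment length or say whether fragments overlap, so the overlapping fragments of length `k = 3` used here are assumptions. The function names `fragment_counts` and `build_feature_matrix` are hypothetical, and the toy sequences are made up, not taken from the Stanford HIV Drug Resistance Database.

```python
from collections import Counter


def fragment_counts(sequence, k):
    """Count every overlapping fragment of length k in a sequence.

    Works for both amino acid and nucleotide strings; overlapping
    fragments are an assumption, not the paper's stated definition.
    """
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def build_feature_matrix(sequences, k):
    """Return (vocabulary, rows) with one fragment-count vector per sequence."""
    counts = [fragment_counts(s, k) for s in sequences]
    # The vocabulary is the sorted union of all fragments seen in any sequence.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for fragments absent from a given sequence.
    rows = [[c[fragment] for fragment in vocabulary] for c in counts]
    return vocabulary, rows


# Toy, made-up sequences for illustration only.
peptides = ["PQITLW", "PQVTLW"]          # amino acid sequences
genes = ["CCTCAGATC", "CCTCAAATC"]       # nucleotide sequences

aa_vocab, aa_rows = build_feature_matrix(peptides, k=3)
nt_vocab, nt_rows = build_feature_matrix(genes, k=3)

print(aa_vocab)
print(aa_rows)
```

Each row is a fixed-length descriptor vector, so rows of this kind, paired with per-drug resistance labels, could be passed to any Random Forest implementation (for example scikit-learn's `RandomForestClassifier`). Building one matrix from peptide fragments and another from nucleotide fragments for the same isolates mirrors the comparison the abstract describes.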