Rögnvaldsson Thorsteinn, You Liwen, Garwicz Daniel
CAISR, School of Information Science, Computer and Electrical Engineering, Halmstad University, Halmstad, Sweden, and Division of Clinical Chemistry and Pharmacology, Department of Medical Sciences, Uppsala University, Uppsala, Sweden.
Bioinformatics. 2015 Apr 15;31(8):1204-10. doi: 10.1093/bioinformatics/btu810. Epub 2014 Dec 9.
Understanding the substrate specificity of human immunodeficiency virus (HIV)-1 protease is important when designing effective HIV-1 protease inhibitors. Furthermore, characterizing and predicting the cleavage profile of HIV-1 protease is essential to generate and test hypotheses of how HIV-1 affects proteins of the human host. Currently available tools for predicting cleavage by HIV-1 protease can be improved.
The linear support vector machine with orthogonal encoding is shown to be the best predictor for HIV-1 protease cleavage. It is considerably better than current publicly available predictor services. It is also found that schemes using physicochemical properties do not improve over the standard orthogonal encoding scheme. Some issues with the currently available data are discussed.
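As a rough illustration of what orthogonal encoding means here, the sketch below maps each residue of a peptide to a 20-slot indicator block and evaluates a linear decision function on the result. It assumes fixed-length peptides over the 20 standard amino acids (an octamer then gives 8 x 20 = 160 binary features). The names `orthogonal_encode` and `linear_score`, the example peptide, and the placeholder weights are hypothetical and not taken from the paper. In practice the weights would come from training a linear support vector machine on the labelled data.

```python
# Minimal sketch of orthogonal (one-hot) encoding for peptides.
# Assumption: peptides use only the 20 standard one-letter residue codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def orthogonal_encode(peptide):
    """Return a flat 0/1 list with one 20-slot block per residue."""
    n = len(AMINO_ACIDS)
    vec = [0] * (len(peptide) * n)
    for pos, aa in enumerate(peptide.upper()):
        # Set the single slot for this residue within its position block.
        vec[pos * n + INDEX[aa]] = 1
    return vec


def linear_score(weights, bias, peptide):
    """Decision value of a linear classifier, w . x + b.

    A positive value would be read as 'cleaved' and a negative one as
    'not cleaved'; the weights here are placeholders, not trained values.
    """
    x = orthogonal_encode(peptide)
    return sum(w * xi for w, xi in zip(weights, x)) + bias


if __name__ == "__main__":
    example = "SQNYPIVQ"  # illustrative octamer, used only to show the shapes
    features = orthogonal_encode(example)
    print(len(features), sum(features))
    print(linear_score([0.0] * len(features), 0.5, example))
```

Because exactly one slot is set per position, a linear model over this encoding learns an independent weight for every residue at every position, which is the sense in which no physicochemical similarity between residues is built in.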
The datasets used, which are the most important part, are available at the UCI Machine Learning Repository. The tools used are all standard and easily available.