Department of Pharmaceutical and Medicinal Chemistry, University of Nigeria, Nsukka, Nigeria.
Department of Pharmaceutical Microbiology and Biotechnology, University of Nigeria, Nsukka, Nigeria.
BMC Bioinformatics. 2022 Nov 8;23(1):466. doi: 10.1186/s12859-022-05017-x.
BACKGROUND: In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which causes AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, accurate and timely prediction of HIV-1 protease cleavage sites can significantly speed up and optimize the discovery of novel HIV-1 protease inhibitors. In this work, we built selected machine learning models for predicting HIV-1 protease cleavage sites and compared their performance. The input variables were numerical descriptors derived from a hybrid of octapeptide sequence information: bond composition, amino acid binary profile (AABP), and physicochemical properties. Our work differs from earlier studies on the same subject in the combination of octapeptide descriptors and the method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation (CV) technique alongside the testing set to evaluate the models.
RESULTS: Among the 8 models evaluated in the "stratified" 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier, with AUC, F-score, and B. Acc. scores in the ranges of 0.91-0.96, 0.81-0.88, and 80.1-86.4%, respectively, had the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80, and B. Acc. ~80.0%). By contrast, the perceptron classifier and K-nearest neighbors had statistically significantly lower performance (AUC 0.77-0.82, F-score 0.53-0.69, and B. Acc. 60.0-68.5%) at p < 0.05. On further evaluation on the testing set, logistic regression and the multi-layer perceptron classifier performed best (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%), though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier also performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%).
CONCLUSIONS: Logistic regression and multi-layer perceptron classifiers have predictive performance comparable to that of the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition, and standard physicochemical properties are used as input variables. In future work, we hope to develop standalone software for HIV-1 protease cleavage site prediction utilizing the linear regression algorithm and the aforementioned octapeptide sequence descriptors.
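The amino acid binary profile and the "stratified" 10-fold split mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that AABP means a one-hot encoding over the 20 standard amino acids (8 x 20 = 160 binary features per octapeptide), that "B. Acc." denotes balanced accuracy, and that labels are 1 for cleaved and 0 for not cleaved. The function names `aabp_encode`, `stratified_folds`, and `balanced_accuracy` are hypothetical, and the bond composition and physicochemical descriptors are not covered.

```python
from collections import defaultdict
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def aabp_encode(octapeptide):
    """One-hot encode an 8-residue peptide into an 8 x 20 = 160-element binary vector."""
    if len(octapeptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    vector = []
    for residue in octapeptide:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1  # ValueError on a non-standard residue
        vector.extend(row)
    return vector


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so that class ratios stay similar in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal each class round-robin
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)
```

In a cross-validation loop, each fold in turn would serve as the validation set while the remaining nine are used for training. Dealing each class round-robin keeps the proportion of cleaved and non-cleaved octapeptides roughly constant across folds, which matters when the two classes are imbalanced.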