Prediction of HIV-1 protease cleavage site from octapeptide sequence information using selected classifiers and hybrid descriptors.

Affiliations

Department of Pharmaceutical and Medicinal Chemistry, University of Nigeria, Nsukka, Nigeria.

Department of Pharmaceutical Microbiology and Biotechnology, University of Nigeria, Nsukka, Nigeria.

Publication information

BMC Bioinformatics. 2022 Nov 8;23(1):466. doi: 10.1186/s12859-022-05017-x.


DOI:10.1186/s12859-022-05017-x
PMID:36344934
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9641908/
Abstract

BACKGROUND: In most parts of the world, especially in underdeveloped countries, acquired immunodeficiency syndrome (AIDS) remains a major cause of death, disability, and unfavorable economic outcomes. This has necessitated intensive research to develop effective therapeutic agents for the treatment of human immunodeficiency virus (HIV) infection, which is responsible for AIDS. Peptide cleavage by HIV-1 protease is an essential step in the replication of HIV-1. Thus, correct and timely prediction of the cleavage site of HIV-1 protease can significantly speed up and optimize the drug discovery process of novel HIV-1 protease inhibitors. In this work, we built and compared selected machine learning models for the prediction of HIV-1 protease cleavage sites. The models take as input numerical descriptors derived from octapeptide sequence information: a hybrid of bond composition, amino acid binary profile (AABP), and physicochemical properties. Our work differs from antecedent studies on the same subject in the combination of octapeptide descriptors and the method used. Instead of using various subsets of the dataset for training and testing the models, we combined the dataset, applied a 3-way data split, and then used a "stratified" 10-fold cross-validation technique alongside the testing set to evaluate the models.

RESULTS: Among the 8 models evaluated in the "stratified" 10-fold CV experiment, logistic regression, multi-layer perceptron classifier, linear discriminant analysis, gradient boosting classifier, Naive Bayes classifier, and decision tree classifier, with AUC, F-score, and balanced accuracy (B. Acc.) scores in the ranges of 0.91-0.96, 0.81-0.88, and 80.1-86.4%, respectively, had the closest predictive performance to the state-of-the-art model (AUC 0.96, F-score 0.80, and B. Acc. ~80.0%). In contrast, the perceptron classifier and K-nearest neighbors had statistically lower performance (AUC 0.77-0.82, F-score 0.53-0.69, and B. Acc. 60.0-68.5%) at p < 0.05. On further evaluation on the testing set, logistic regression and the multi-layer perceptron classifier performed best (AUC of 0.97, F-score > 0.89, and B. Acc. > 90.0%), though linear discriminant analysis, gradient boosting classifier, and Naive Bayes classifier also performed well (AUC > 0.94, F-score > 0.87, and B. Acc. > 86.0%).

CONCLUSIONS: Logistic regression and multi-layer perceptron classifiers have predictive performance comparable to the state-of-the-art model when octapeptide sequence descriptors consisting of AABP, bond composition, and standard physicochemical properties are used as input variables. In our future work, we hope to develop standalone software for HIV-1 protease cleavage site prediction utilizing the linear regression algorithm and the aforementioned octapeptide sequence descriptors.
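
The abstract names three building blocks that are easy to illustrate: the amino acid binary profile, the stratified 10-fold split, and balanced accuracy. The sketch below is not the authors' code. It assumes that AABP means a one-hot vector of length 20 per residue (160 features per octapeptide), and the function names `aabp_encode`, `stratified_folds`, and `balanced_accuracy` are hypothetical. The bond composition and physicochemical descriptors are not covered.

```python
from collections import defaultdict
import random

# The 20 standard amino acids, ordered by one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aabp_encode(octapeptide):
    """One-hot encode an 8-residue peptide into 8 * 20 = 160 binary features."""
    if len(octapeptide) != 8:
        raise ValueError("expected an octapeptide (8 residues)")
    features = []
    for residue in octapeptide.upper():
        if residue not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue: {residue!r}")
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold keeps the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; for two classes, (sensitivity + specificity) / 2."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    vec = aabp_encode("SQNYPIVQ")        # illustrative octapeptide
    print(len(vec), sum(vec))            # 160 features, one "1" per position
    labels = [1] * 20 + [0] * 80         # toy imbalanced labels, not real data
    folds = stratified_folds(labels, k=10)
    print([sum(labels[i] for i in fold) for fold in folds])
```

Balanced accuracy is a natural companion to stratification here, because cleavage-site datasets typically contain far fewer cleaved than non-cleaved octapeptides and plain accuracy would reward always predicting the majority class.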

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/9641908/571daf5b201b/12859_2022_5017_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/9641908/093e5b9f1158/12859_2022_5017_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/9641908/b16df6cd5007/12859_2022_5017_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/9641908/6a01f9e53241/12859_2022_5017_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0c82/9641908/e5419311d2c2/12859_2022_5017_Fig5_HTML.jpg
