SAMP: Identifying antimicrobial peptides by an ensemble learning model based on proportionalized split amino acid composition.

Authors

Feng Junxi, Sun Mengtao, Liu Cong, Zhang Weiwei, Xu Changmou, Wang Jieqiong, Wang Guangshun, Wan Shibiao

Affiliations

Department of Biostatistics, School of Public Health, Harvard University, Boston, MA 02115, United States.

Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, United States.

Publication information

Brief Funct Genomics. 2024 Dec 6;23(6):879-890. doi: 10.1093/bfgp/elae046.

DOI:10.1093/bfgp/elae046

PMID:39573886

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11631067/

Abstract

It is projected that drug-resistant bacterial infections could cause 10 million deaths in 2050. Identifying new-generation antibiotics is an effective way to address this concern. Antimicrobial peptides (AMPs), a class of innate immune effectors, have received significant attention for their capacity to eliminate drug-resistant pathogens, including viruses, bacteria, and fungi. Recent years have witnessed widespread application of computational methods, especially machine learning (ML) and deep learning (DL), to AMP discovery. However, existing methods use only features derived from the compositional, physicochemical, and structural properties of peptides, which cannot fully capture the sequence information in AMPs. Here, we present SAMP, an ensemble random projection (RP) based computational model that leverages a new type of feature, proportionalized split amino acid composition (PSAAC), in addition to conventional sequence-based features for AMP prediction. With this new feature set, SAMP captures residue patterns such as sorting signals at both the N-terminus and the C-terminus, while also retaining the sequence order information from the middle peptide fragments. Benchmarking tests on different balanced and imbalanced datasets demonstrate that SAMP consistently outperforms existing state-of-the-art methods, such as iAMPpred and AMPScanner V2, in terms of accuracy, Matthews correlation coefficient (MCC), G-measure, and F1-score. In addition, by leveraging an ensemble RP architecture, SAMP scales to large-scale AMP identification and achieves further performance improvement compared with models without RP. To facilitate the use of SAMP, we have developed a Python package that is freely available at https://github.com/wan-mlab/SAMP.
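To make the split-composition idea concrete, here is a minimal sketch of how a peptide might be divided into N-terminal, middle, and C-terminal fragments whose lengths scale with the sequence length, with an amino acid composition computed for each fragment. This is not the authors' implementation, and the SAMP package may define PSAAC differently. The function names `aac` and `split_aac` and the 25% terminal fractions (`n_frac`, `c_frac`) are assumptions chosen for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(fragment):
    """Return the 20-dimensional amino acid composition of a fragment.

    Non-standard residues are ignored; an empty fragment yields all zeros.
    """
    counts = Counter(fragment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def split_aac(sequence, n_frac=0.25, c_frac=0.25):
    """Concatenate the compositions of the N-terminal, middle, and C-terminal parts.

    n_frac and c_frac are hypothetical proportions of the peptide length,
    not values taken from the paper. For very short peptides the terminal
    fragments may overlap and the middle fragment may be empty.
    """
    seq = sequence.upper()
    n_len = max(1, round(len(seq) * n_frac))
    c_len = max(1, round(len(seq) * c_frac))
    n_term = seq[:n_len]
    c_term = seq[len(seq) - c_len:]
    middle = seq[n_len:len(seq) - c_len]
    # 3 fragments x 20 residues = 60 features per peptide.
    return aac(n_term) + aac(middle) + aac(c_term)


# Example peptide: magainin 2, a well-known 23-residue AMP.
features = split_aac("GIGKFLHSAKKFGKAFVGEIMNS")
```

Because each fragment is described separately, two peptides with the same overall composition but different terminal residues receive different feature vectors, which a single whole-sequence composition cannot distinguish.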
