Protein Language Models and Machine Learning Facilitate the Identification of Antimicrobial Peptides.

Author information

Departamento de Ingeniería en Computación, Universidad de Magallanes, Punta Arenas 6210005, Chile.

Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Santiago 8370456, Chile.

Publication information

Int J Mol Sci. 2024 Aug 14;25(16):8851. doi: 10.3390/ijms25168851.

Abstract

Peptides are bioactive molecules whose functional versatility in living organisms has led to successful applications in diverse fields. In recent years, the amount of data describing peptide sequences and function collected in open repositories has substantially increased, allowing the application of more complex computational models to study the relations between the peptide composition and function. This work introduces AMP-Detector, a sequence-based classification model for the detection of peptides' functional biological activity, focusing on accelerating the discovery and de novo design of potential antimicrobial peptides (AMPs). AMP-Detector introduces a novel sequence-based pipeline to train binary classification models, integrating protein language models and machine learning algorithms. This pipeline produced 21 models targeting antimicrobial, antiviral, and antibacterial activity, achieving average precision exceeding 83%. Benchmark analyses revealed that our models outperformed existing methods for AMPs and delivered comparable results for other biological activity types. Utilizing the Peptide Atlas, we applied AMP-Detector to discover over 190,000 potential AMPs and demonstrated that it is an integrative approach with generative learning to aid in de novo design, resulting in over 500 novel AMPs. The combination of our methodology, robust models, and a generative design strategy offers a significant advancement in peptide-based drug discovery and represents a pivotal tool for therapeutic applications.
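The general shape of a sequence-based binary classification pipeline like the one the abstract describes (embed each peptide, fit a classifier, score with precision) can be sketched as follows. This is a minimal illustration under loud assumptions, not AMP-Detector itself: amino-acid frequencies stand in for a protein language model embedding, a nearest-centroid rule stands in for the machine learning algorithm, and the sequences are invented toy strings rather than real peptide data. All function names are hypothetical.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def embed(sequence):
    """Stand-in embedding (assumption): per-residue frequency vector.

    A real pipeline would call a protein language model here instead.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(sequences, labels):
    """Fit a nearest-centroid binary classifier (stand-in for the ML step)."""
    positives = [embed(s) for s, y in zip(sequences, labels) if y == 1]
    negatives = [embed(s) for s, y in zip(sequences, labels) if y == 0]
    return centroid(positives), centroid(negatives)


def predict(model, sequence):
    """Return 1 if the embedding is closer to the positive centroid, else 0."""
    positive_centroid, negative_centroid = model
    vector = embed(sequence)
    return 1 if dist(vector, positive_centroid) < dist(vector, negative_centroid) else 0


def precision(y_true, y_pred):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Invented toy strings for illustration only; not real peptide data.
toy_sequences = ["KKLLKKLLKK", "KRKRLLKKAL", "DDEEDDSSGG", "EEDDGGSSEE"]
toy_labels = [1, 1, 0, 0]
model = train(toy_sequences, toy_labels)
```

Replacing `embed` with a learned embedding and `train`/`predict` with a stronger classifier keeps the same three-stage structure; the precision function is the standard definition and does not depend on either choice.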

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/41be/11487388/d22078ec741c/ijms-25-08851-g001.jpg
