Examining evolutionary scale modeling-derived different-dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow.

Affiliations

Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico.

Cátedras CONAHCYT - Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Mexico.

Publication information

Protein Sci. 2024 Apr;33(4):e4928. doi: 10.1002/pro.4928.

DOI: 10.1002/pro.4928

PMID: 38501511

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10949403/

Abstract

Molecular features play an important role in different bio-chem-informatics tasks, such as Quantitative Structure-Activity Relationship (QSAR) modeling. Several pre-trained models have recently been created for use in downstream tasks, either by fine-tuning a specific model or by extracting features to feed traditional classifiers. In this regard, a new family of Evolutionary Scale Modeling models (termed ESM-2 models) was recently introduced, demonstrating outstanding results in protein structure prediction benchmarks. Herein, we studied the usefulness of the different-dimensional embeddings derived from the ESM-2 models for classifying antimicrobial peptides (AMPs). To this end, we built a KNIME workflow that applies the same modeling methodology across all experiments to ensure fair analyses. As a result, the 640- and 1280-dimensional embeddings derived from the 30- and 33-layer ESM-2 models, respectively, proved the most valuable, since the QSAR models built from them achieved statistically better performances. We also fused features of the different ESM-2 models and concluded that fusion yields better QSAR models than features from a single ESM-2 model. Frequency studies revealed that only a portion of the ESM-2 embeddings is valuable for modeling tasks, since between 43% and 66% of the features were never used. Comparisons with state-of-the-art deep learning (DL) models confirm that, when methodologically principled studies are performed in the prediction of AMPs, non-DL based QSAR models yield comparable-to-superior performances to DL-based QSAR models. The developed KNIME workflow is freely available at https://github.com/cicese-biocom/classification-QSAR-bioKom. This workflow can help avoid unfair comparisons when evaluating new computational methods, as well as support the proposal of new non-DL based QSAR models.
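Feeding ESM-2 embeddings to traditional classifiers requires one fixed-length vector per peptide, and the abstract compares these vectors by width (640 for the 30-layer model, 1280 for the 33-layer model). The sketch below is a minimal illustration, assuming per-peptide vectors are obtained by mean-pooling the per-residue representations of the chosen layer, as the fair-esm extraction script does with its `--include mean` option; the abstract does not state which pooling the authors used. In fair-esm, the per-residue tensor would come from `model(tokens, repr_layers=[n])["representations"][n]`, with a BOS token prepended and an EOS token appended to each sequence. `ESM2_EMBED_DIM`, `mean_pool` and `check_width` are hypothetical pure-Python helpers, not part of any library.

```python
# Hypothetical helpers (not from the paper or from fair-esm).
# Embedding width of each released ESM-2 checkpoint, keyed by its number of layers.
ESM2_EMBED_DIM = {
    6: 320,    # esm2_t6_8M_UR50D
    12: 480,   # esm2_t12_35M_UR50D
    30: 640,   # esm2_t30_150M_UR50D
    33: 1280,  # esm2_t33_650M_UR50D
    36: 2560,  # esm2_t36_3B_UR50D
    48: 5120,  # esm2_t48_15B_UR50D
}


def mean_pool(token_reprs: list[list[float]], seq_len: int) -> list[float]:
    """Average per-residue vectors into one per-peptide vector.

    token_reprs holds one vector per token: index 0 is the BOS token,
    indices 1..seq_len are the residues, and anything after is EOS/padding.
    """
    residues = token_reprs[1 : seq_len + 1]  # drop BOS, EOS and padding
    if seq_len <= 0 or len(residues) != seq_len:
        raise ValueError("need seq_len >= 1 and at least seq_len + 1 token vectors")
    dim = len(residues[0])
    return [sum(vec[j] for vec in residues) / seq_len for j in range(dim)]


def check_width(vector: list[float], num_layers: int) -> None:
    """Raise if a pooled vector does not match the width of the chosen ESM-2 model."""
    expected = ESM2_EMBED_DIM[num_layers]
    if len(vector) != expected:
        raise ValueError(f"expected {expected} features, got {len(vector)}")


if __name__ == "__main__":
    # Toy 3-residue peptide with 2-dimensional token vectors: BOS, 3 residues, EOS.
    toy = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
    print(mean_pool(toy, seq_len=3))  # [3.0, 4.0]
```

Excluding the BOS and EOS positions keeps special-token vectors from shifting the average, so a 30-layer model yields a 640-value vector and a 33-layer model a 1280-value vector regardless of peptide length.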

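The fusion and frequency analyses mentioned in the abstract can be illustrated in the same hedged way. The abstract does not specify how features of different ESM-2 models were fused or how usage was counted, so this sketch assumes (1) fusion by concatenating per-peptide vectors, with each feature name prefixed by a model tag, and (2) that a feature counts as "used" if it appears in at least one model's selected feature subset. `fuse`, `feature_usage` and `never_used_fraction` are hypothetical names chosen for illustration.

```python
from collections import Counter


def fuse(*blocks: tuple[str, list[float]]) -> tuple[list[str], list[float]]:
    """Concatenate per-peptide vectors from several models (assumed fusion scheme).

    Each block is (model_tag, vector); names like "t30_0" keep every fused
    feature traceable to the model and position it came from.
    """
    names: list[str] = []
    values: list[float] = []
    for tag, vector in blocks:
        names.extend(f"{tag}_{i}" for i in range(len(vector)))
        values.extend(vector)
    return names, values


def feature_usage(all_features: list[str], selected_subsets: list[list[str]]) -> dict[str, int]:
    """Count in how many selected subsets each feature appears (0 = never used)."""
    counts = Counter(f for subset in selected_subsets for f in set(subset))
    return {f: counts[f] for f in all_features}


def never_used_fraction(usage: dict[str, int]) -> float:
    """Share of features that no model ever selected."""
    return sum(1 for c in usage.values() if c == 0) / len(usage)


if __name__ == "__main__":
    # Toy example: a 2-feature block from one model and a 1-feature block from another.
    names, values = fuse(("t30", [0.1, 0.2]), ("t33", [0.3]))
    print(names)  # ['t30_0', 't30_1', 't33_0']
    usage = feature_usage(names, [["t30_0"], ["t30_0", "t33_0"]])
    print(usage)  # {'t30_0': 2, 't30_1': 0, 't33_0': 1}
    print(never_used_fraction(usage))  # 0.3333333333333333
```

Wrapping each subset in `set()` counts a feature at most once per model. With real data, `all_features` would list every embedding dimension considered, and the never-used share would be computed from the subsets actually retained by the modeling workflow.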
Similar articles

Predicting Antimicrobial Peptides Using ESMFold-Predicted Structures and ESM-2-Based Amino Acid Features with Graph Deep Learning.

J Chem Inf Model. 2024 May 27;64(10):4310-4321. doi: 10.1021/acs.jcim.3c02061. Epub 2024 May 13.

Free and open-source QSAR-ready workflow for automated standardization of chemical structures in support of QSAR modeling.

J Cheminform. 2024 Feb 20;16(1):19. doi: 10.1186/s13321-024-00814-3.

iNClassSec-ESM: Discovering potential non-classical secreted proteins through a novel protein language model.

Comput Struct Biotechnol J. 2025 Mar 28;27:1350-1358. doi: 10.1016/j.csbj.2025.03.043. eCollection 2025.

An automated framework for QSAR model building.

J Cheminform. 2018 Jan 16;10(1):1. doi: 10.1186/s13321-017-0256-5.

pLM4ACE: A protein language model based predictor for antihypertensive peptide screening.

Food Chem. 2024 Jan 15;431:137162. doi: 10.1016/j.foodchem.2023.137162. Epub 2023 Aug 14.

pLM4CPPs: Protein Language Model-Based Predictor for Cell Penetrating Peptides.

J Chem Inf Model. 2025 Feb 10;65(3):1128-1139. doi: 10.1021/acs.jcim.4c01338. Epub 2025 Jan 29.

Using molecular embeddings in QSAR modeling: does it make a difference?

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab365.

Do deep learning models make a difference in the identification of antimicrobial peptides?

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac094.

An analysis of protein language model embeddings for fold prediction.

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac142.

Cited by

AmpHGT: expanding prediction of antimicrobial activity in peptides containing non-canonical amino acids using multi-view constrained heterogeneous graph transformer.

BMC Biol. 2025 Jul 1;23(1):184. doi: 10.1186/s12915-025-02253-4.

Optimal Descriptor Subset Search via Chemical Information and Target Activity-Guided Algorithm for Antimicrobial Peptide Prediction.

J Chem Inf Model. 2025 Jul 14;65(13):6621-6631. doi: 10.1021/acs.jcim.5c00600. Epub 2025 Jun 18.

PAPreC: A Pipeline for Antigenicity Prediction Comparison Methods across Bacteria.

ACS Omega. 2025 Feb 3;10(6):5415-5429. doi: 10.1021/acsomega.4c07147. eCollection 2025 Feb 18.

Directed evolution of antimicrobial peptides using multi-objective zeroth-order optimization.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae715.

AI Methods for Antimicrobial Peptides: Progress and Challenges.

Microb Biotechnol. 2025 Jan;18(1):e70072. doi: 10.1111/1751-7915.70072.

PGAT-ABPp: harnessing protein language models and graph attention networks for antibacterial peptide identification with remarkable accuracy.

Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae497.
