Substrate Prediction for RiPP Biosynthetic Enzymes via Masked Language Modeling and Transfer Learning.

Authors

Clark Joseph D, Mi Xuenan, Mitchell Douglas A, Shukla Diwakar

Affiliations

School of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

ArXiv. 2024 Feb 23:arXiv:2402.15181v1.

PMID: 38463513

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10925380/

Abstract

Ribosomally synthesized and post-translationally modified peptide (RiPP) biosynthetic enzymes often exhibit promiscuous substrate preferences that cannot be reduced to simple rules. Large language models are promising tools for predicting such peptide fitness landscapes. However, state-of-the-art protein language models are trained on relatively few peptide sequences. A previous study comprehensively profiled the peptide substrate preferences of LazBF (a two-component serine dehydratase) and LazDEF (a three-component azole synthetase) from the lactazole biosynthetic pathway. We demonstrated that masked language modeling of LazBF substrate preferences produced language model embeddings that improved downstream classification models of both LazBF and LazDEF substrates. Similarly, masked language modeling of LazDEF substrate preferences produced embeddings that improved the performance of classification models of both LazBF and LazDEF substrates. Our results suggest that the models learned functional forms that are transferable between distinct enzymatic transformations that act within the same biosynthetic pathway. Our transfer learning method improved performance and data efficiency in data-scarce scenarios. We then fine-tuned models on each data set and showed that the fine-tuned models provided interpretable insight that we anticipate will facilitate the design of substrate libraries that are compatible with desired RiPP biosynthetic pathways.
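
For readers unfamiliar with the masked language modeling objective mentioned in the abstract, the following is a minimal sketch of the masking step only. It is a generic illustration and not the authors' pipeline: the function name `mask_peptide`, the `<mask>` token string, and the 15% masking rate (the common BERT convention) are all assumptions, and the abstract does not state which values the study used.

```python
import random

MASK = "<mask>"  # hypothetical mask token; real tokenizers define their own


def mask_peptide(seq, mask_prob=0.15, rng=None):
    """Build one masked-language-modeling example from a peptide sequence.

    Returns (tokens, labels). labels[i] holds the original residue at each
    masked position and None elsewhere, so a model would only be scored on
    the residues it could not see.
    """
    rng = rng or random.Random()
    tokens = list(seq)
    labels = [None] * len(tokens)
    # Mask at least one residue so every sequence yields a training signal.
    n_mask = max(1, round(mask_prob * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_mask):
        labels[i] = tokens[i]
        tokens[i] = MASK
    return tokens, labels
```

A model trained to recover the hidden residues from their context yields per-residue representations; in a transfer learning setup such as the one the abstract describes, those embeddings would then be fed to a separate classifier trained on labeled substrates.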

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/447f0fbb5f28/nihpp-2402.15181v1-f0001.jpg
