Substrate Prediction for RiPP Biosynthetic Enzymes via Masked Language Modeling and Transfer Learning.

Authors

Clark Joseph D, Mi Xuenan, Mitchell Douglas A, Shukla Diwakar

Affiliations

School of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Publication

ArXiv. 2024 Feb 23:arXiv:2402.15181v1.

PMID: 38463513
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10925380/
Abstract

Ribosomally synthesized and post-translationally modified peptide (RiPP) biosynthetic enzymes often exhibit promiscuous substrate preferences that cannot be reduced to simple rules. Large language models are promising tools for predicting such peptide fitness landscapes. However, state-of-the-art protein language models are trained on relatively few peptide sequences. A previous study comprehensively profiled the peptide substrate preferences of LazBF (a two-component serine dehydratase) and LazDEF (a three-component azole synthetase) from the lactazole biosynthetic pathway. We demonstrated that masked language modeling of LazBF substrate preferences produced language model embeddings that improved downstream classification models of both LazBF and LazDEF substrates. Similarly, masked language modeling of LazDEF substrate preferences produced embeddings that improved the performance of classification models of both LazBF and LazDEF substrates. Our results suggest that the models learned functional forms that are transferable between distinct enzymatic transformations that act within the same biosynthetic pathway. Our transfer learning method improved performance and data efficiency in data-scarce scenarios. We then fine-tuned models on each data set and showed that the fine-tuned models provided interpretable insight that we anticipate will facilitate the design of substrate libraries that are compatible with desired RiPP biosynthetic pathways.
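The abstract does not give training details. As a rough, non-authoritative illustration of what masked language modeling means for a peptide substrate library, the sketch below shows generic BERT-style input corruption: a fraction of residue positions is selected, most of them are replaced with a mask token, and the model is trained to recover the original residues at those positions only. The 15% selection rate, the 80/10/10 replacement split, the `<mask>` token, the function name `mask_peptide`, and the example peptide are all assumptions taken from the standard BERT recipe or made up for illustration; they are not the authors' settings.

```python
import random

# Assumed vocabulary: the 20 canonical amino acids plus a mask token
# (hypothetical choices, not taken from the paper).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"


def mask_peptide(seq, mask_rate=0.15, rng=None):
    """Corrupt one peptide for masked language modeling (generic BERT recipe).

    Returns (tokens, labels). `tokens` is the model input; `labels` holds the
    original residue at each selected position and None elsewhere, so the
    training loss is computed only where the model must recover a residue.
    """
    rng = rng or random.Random()
    tokens = list(seq)
    labels = [None] * len(tokens)
    if not tokens:
        return tokens, labels
    # Select roughly mask_rate of the positions (at least one).
    n_select = max(1, round(mask_rate * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_select):
        labels[i] = tokens[i]
        r = rng.random()
        if r < 0.8:
            tokens[i] = MASK                     # 80%: replace with mask token
        elif r < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)  # 10%: random residue
        # remaining 10%: leave the original residue in place
    return tokens, labels


if __name__ == "__main__":
    # Hypothetical peptide, not taken from the LazBF/LazDEF libraries.
    tokens, labels = mask_peptide("SWSCTSVSCSS", rng=random.Random(0))
    print(tokens)
    print(labels)
```

Running the module prints the corrupted token list and the matching labels; which positions are selected depends on the seed. Randomizing or keeping a share of the selected residues is the usual BERT device so that the model cannot rely on always seeing the mask token at prediction time; whether the authors used this split is not stated in the abstract.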

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/447f0fbb5f28/nihpp-2402.15181v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/667fd46bec73/nihpp-2402.15181v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/1d87ef13c649/nihpp-2402.15181v1-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/e3509abe3d4d/nihpp-2402.15181v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/0dca238bc37c/nihpp-2402.15181v1-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/befbeff15cc2/nihpp-2402.15181v1-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/6d4f176ceff6/nihpp-2402.15181v1-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ac7/10925380/5c13120d6efd/nihpp-2402.15181v1-f0008.jpg