Substrate prediction for RiPP biosynthetic enzymes via masked language modeling and transfer learning.

Author information

Clark Joseph D, Mi Xuenan, Mitchell Douglas A, Shukla Diwakar

Affiliations

School of Molecular and Cellular Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

Digit Discov. 2024 Dec 2;4(2):343-354. doi: 10.1039/d4dd00170b. eCollection 2025 Feb 12.

DOI: 10.1039/d4dd00170b
PMID: 39649639
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11622008/
Abstract

Ribosomally synthesized and post-translationally modified peptide (RiPP) biosynthetic enzymes often exhibit promiscuous substrate preferences that cannot be reduced to simple rules. Large language models are promising tools for predicting the specificity of RiPP biosynthetic enzymes. However, state-of-the-art protein language models are trained on relatively few peptide sequences. A previous study comprehensively profiled the peptide substrate preferences of LazBF (a two-component serine dehydratase) and LazDEF (a three-component azole synthetase) from the lactazole biosynthetic pathway. We demonstrated that masked language modeling of LazBF substrate preferences produced language model embeddings that improved downstream prediction of both LazBF and LazDEF substrates. Similarly, masked language modeling of LazDEF substrate preferences produced embeddings that improved prediction of both LazBF and LazDEF substrates. Our results suggest that the models learned functional forms that are transferable between distinct enzymatic transformations that act within the same biosynthetic pathway. We found that a single high-quality data set of substrates and non-substrates for a RiPP biosynthetic enzyme improved substrate prediction for distinct enzymes in data-scarce scenarios. We then fine-tuned models on each data set and showed that the fine-tuned models provided interpretable insight that we anticipate will facilitate the design of substrate libraries that are compatible with desired RiPP biosynthetic pathways.
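The abstract reports masked language modeling of LazBF and LazDEF substrate preferences but does not spell out the masking procedure. As a minimal sketch, assuming the standard BERT-style recipe (select about 15% of residues; replace about 80% of those with a mask token, about 10% with a random amino acid, and leave about 10% unchanged), one peptide could be corrupted for a training step as follows. The names `mask_peptide` and `MASK`, the 15% rate, and the 80/10/10 split are hypothetical illustrations, not the authors' code or hyperparameters.

```python
import random

# Residue vocabulary: the 20 canonical amino acids in one-letter code
# (assumption: peptides contain no nonstandard residues).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask token; real tokenizers define their own


def mask_peptide(seq, mask_rate=0.15, rng=None):
    """Corrupt a peptide for one BERT-style masked language modeling step.

    Returns (inputs, labels). `inputs` is the corrupted token list fed to the
    model; `labels` holds the original residue at each selected position and
    None elsewhere, so the loss is computed only on selected positions.
    """
    rng = rng or random.Random()
    inputs = list(seq)
    labels = [None] * len(seq)
    # Select at least one position (unless the sequence is empty) so that
    # every non-empty example contributes to the loss.
    n_select = min(len(seq), max(1, round(mask_rate * len(seq))))
    for i in rng.sample(range(len(seq)), n_select):
        labels[i] = seq[i]
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # ~80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(AMINO_ACIDS)  # ~10%: random residue
        # remaining ~10%: keep the original residue
    return inputs, labels


if __name__ == "__main__":
    # Arbitrary example sequence, not taken from the LazBF/LazDEF data sets.
    inputs, labels = mask_peptide("ACDEFGHIKLMNPQRSTVWY", rng=random.Random(0))
    print(inputs)
    print(labels)
```

Only the selected positions carry labels, so the model learns to recover each hidden residue from the surrounding sequence context. In the abstract's framing, the embeddings produced by this kind of pretraining on one enzyme's substrate data are then reused for downstream substrate prediction for both LazBF and LazDEF.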

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476c/11622008/12c42921ff6a/d4dd00170b-f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476c/11622008/2d06d2683dd6/d4dd00170b-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476c/11622008/e7c83acd8c91/d4dd00170b-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476c/11622008/fa6a0734170a/d4dd00170b-f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476c/11622008/1f13da03f16f/d4dd00170b-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476c/11622008/d26159e795da/d4dd00170b-f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476c/11622008/197c84c23324/d4dd00170b-f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/476c/11622008/655cc3b7d91f/d4dd00170b-f7.jpg

Similar articles

1
Substrate prediction for RiPP biosynthetic enzymes via masked language modeling and transfer learning.
Digit Discov. 2024 Dec 2;4(2):343-354. doi: 10.1039/d4dd00170b. eCollection 2025 Feb 12.
2
Substrate Prediction for RiPP Biosynthetic Enzymes via Masked Language Modeling and Transfer Learning.
ArXiv. 2024 Feb 23:arXiv:2402.15181v1.
3
Falls prevention interventions for community-dwelling older adults: systematic review and meta-analysis of benefits, harms, and patient values and preferences.
Syst Rev. 2024 Nov 26;13(1):289. doi: 10.1186/s13643-024-02681-3.
4
Short-Term Memory Impairment
5
Advancing the Accuracy of Anti-MRSA Peptide Prediction Through Integrating Multi-Source Protein Language Models.
Interdiscip Sci. 2025 Mar 11. doi: 10.1007/s12539-025-00696-5.
6
Exploring ribosomally synthesized and post-translationally modified peptides through SPECO-based genome mining.
Methods Enzymol. 2025;717:67-87. doi: 10.1016/bs.mie.2025.04.005. Epub 2025 May 24.
7
Surgery for epilepsy.
Cochrane Database Syst Rev. 2015 Jul 1(7):CD010541. doi: 10.1002/14651858.CD010541.pub2.
8
Carbon dioxide detection for diagnosis of inadvertent respiratory tract placement of enterogastric tubes in children.
Cochrane Database Syst Rev. 2025 Feb 19;2(2):CD011196. doi: 10.1002/14651858.CD011196.pub2.
9
Assessing substrate scope of the cyclodehydratase LynD by mRNA display-enabled machine learning models.
bioRxiv. 2024 Oct 14:2024.10.14.618330. doi: 10.1101/2024.10.14.618330.
10
Sexual Harassment and Prevention Training
