NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling.

Authors

Zhai Jingchen, Qi Xiguang, Cai Lianjin, Liu Yue, Tang Haocheng, Xie Lei, Wang Junmei

Affiliations

Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St, Pittsburgh, PA 15261, United States.

Department of Computer Science, Hunter College, The City University of New York, 695 Park Ave, New York, NY 10065, United States.

Publication information

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf212.

DOI: 10.1093/bib/bbaf212
PMID: 40370097
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12078937/

Abstract

The catalytic constant (Kcat) describes the efficiency of an enzyme-catalyzed reaction. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts substrate into product under saturating substrate conditions. However, constructing robust prediction models for this important property is challenging. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we propose a novel protocol for constructing Kcat prediction models that introduces an intermediate step to develop the substrate and protein processors separately. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor extracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to improve prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R2 value of 0.54 versus 0.50) compared with Li et al.'s model on the same dataset. Additionally, our modeling protocol enables the general-purpose Kcat model to be fine-tuned for specific enzyme categories through focused learning. Using cytochrome P450 (CYP450) enzymes as a case study, we achieved a best R2 value of 0.64 for the focused model. The performance and extensibility of the model support its broad application in enzyme engineering and drug research and development.
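
As an illustration of the two quantities the abstract relies on, the following minimal Python sketch computes a turnover number from the textbook relation Kcat = Vmax / [E]total and an R2 score as the standard coefficient of determination. It is not taken from the paper: the function names and all numeric values are hypothetical, and the abstract does not state whether R2 was computed this way or on log-transformed Kcat values.

```python
from statistics import mean


def kcat_from_vmax(vmax_molar_per_s, enzyme_total_molar):
    """Turnover number kcat = Vmax / [E]_total, in units of 1/s.

    Assumes substrate saturation (Michaelis-Menten regime).
    """
    if enzyme_total_molar <= 0:
        raise ValueError("enzyme concentration must be positive")
    return vmax_molar_per_s / enzyme_total_molar


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical measurement: Vmax = 2 uM/s with 10 nM enzyme,
    # which gives a kcat of roughly 200 per second.
    print(kcat_from_vmax(2.0e-6, 1.0e-8))

    # Hypothetical true vs. predicted values (not data from the paper).
    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 1.5, 3.5, 3.5]
    print(r_squared(measured, predicted))
```

Under this definition, a model that always predicts the mean of the measured values scores 0 and a perfect model scores 1, which gives a sense of scale for the reported 0.50, 0.54, and 0.64.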

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/7ba7f5f60e0a/bbaf212f11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/99e09ea24816/bbaf212f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/20369715ee68/bbaf212f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/0815cd511ed6/bbaf212f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/dff56150de54/bbaf212f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/25a4bd53f8db/bbaf212f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/02947bff82a9/bbaf212f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/a3124ee2895b/bbaf212f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/933dc80a4b5a/bbaf212f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/39c8e8f2b189/bbaf212f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/949e/12078937/20c379fbe476/bbaf212f10.jpg
