Prediction of enzyme function using an interpretable optimized ensemble learning framework.

Authors

Dhibar Saikat, Basak Sumon, Jana Biman

Affiliation

School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India

Publication

Chem Sci. 2025 Sep 1. doi: 10.1039/d5sc04513d.

DOI:10.1039/d5sc04513d
PMID:40951780
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12424445/
Abstract

Accurate prediction of enzyme function, particularly for newly discovered uncharacterized sequences, is immensely important for modern biological research. Recently, machine learning (ML)-based methods have shown promise. However, such tools often suffer from complex feature extraction and from limited interpretability and generalization ability. In this study, we construct a dataset for enzyme functions and present an interpretable ML method, SOLVE (Soft-Voting Optimized Learning for Versatile Enzymes), that addresses these issues by using only combinations of tokenized subsequences from the protein's primary sequence for classification. SOLVE utilizes an ensemble learning framework integrating random forest (RF), light gradient boosting machine (LightGBM) and decision tree (DT) models with an optimized weighted strategy, which enhances prediction accuracy, distinguishes enzymes from non-enzymes, and predicts enzyme commission (EC) numbers for mono- and multi-functional enzymes. The focal loss penalty in SOLVE effectively mitigates class imbalance, refining functional annotation accuracy. Additionally, SOLVE provides interpretability through Shapley analyses, identifying functional motifs at catalytic and allosteric sites of enzymes. By leveraging only primary sequence data, SOLVE streamlines high-throughput enzyme function prediction for functionally uncharacterized sequences and outperforms existing tools across all evaluation metrics on independent datasets. With its high prediction accuracy and ability to identify functional regions, SOLVE can become a promising tool in different fields of biology and therapeutic drug design.
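The abstract names two core ideas, sequence tokenization and weighted soft voting, without spelling out their exact form. The sketch below is a minimal pure-Python illustration under stated assumptions: it treats the "tokenized subsequences" as overlapping k-mers counted over the 20 standard amino acids, and it combines per-model class probabilities by a weighted average before taking the most probable class. The helper names (`kmer_counts`, `kmer_vector`, `weighted_soft_vote`), the value of k, the probabilities, and the weights are all hypothetical; SOLVE's actual tokenization scheme and its optimized weights are not given in this abstract.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids. Assumption: non-standard letters (X, U, B, ...)
# are not in the vocabulary, so they never appear in the fixed-length vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers (hypothetical 'tokenized subsequences')."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(sequence: str, k: int = 2) -> list[int]:
    """Map a sequence to a fixed-length count vector over all 20**k k-mers."""
    counts = kmer_counts(sequence, k)
    vocabulary = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [counts.get(token, 0) for token in vocabulary]


def weighted_soft_vote(
    probas: dict[str, list[float]], weights: dict[str, float]
) -> list[float]:
    """Weighted average of per-model class probabilities (weighted soft voting)."""
    total = sum(weights[name] for name in probas)
    n_classes = len(next(iter(probas.values())))
    combined = [0.0] * n_classes
    for name, p in probas.items():
        for c, value in enumerate(p):
            combined[c] += weights[name] * value / total
    return combined


if __name__ == "__main__":
    features = kmer_vector("MKTAYIAKQR", k=2)
    print(len(features), sum(features))  # 400 features, 9 overlapping dimers

    # Hypothetical class probabilities [non-enzyme, enzyme] from three fitted models.
    probas = {"rf": [0.6, 0.4], "lightgbm": [0.3, 0.7], "dt": [0.9, 0.1]}
    # Hypothetical weights; SOLVE optimizes its own, which are not reported here.
    weights = {"rf": 0.5, "lightgbm": 0.3, "dt": 0.2}
    combined = weighted_soft_vote(probas, weights)
    predicted = max(range(len(combined)), key=combined.__getitem__)
    print(combined, predicted)
```

In a real pipeline the per-model probabilities would come from fitted classifiers. scikit-learn's `VotingClassifier(estimators=..., voting="soft", weights=[...])` computes the same weighted average of `predict_proba` outputs and predicts its argmax, and it accepts `RandomForestClassifier`, `DecisionTreeClassifier`, and LightGBM's scikit-learn wrapper `LGBMClassifier` as estimators. A dense 20^k vector grows quickly (8,000 features at k = 3), which is why k-mer features are often stored sparsely.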

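The abstract credits a focal loss penalty with mitigating class imbalance across EC classes but does not give its form. The sketch below uses the widely used multi-class focal loss formulation (Lin et al., 2017), -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. How SOLVE attaches this penalty to its tree-based models (for example, as a custom boosting objective) is not stated in the abstract and is not shown here; the function names and the gamma and alpha values are illustrative assumptions.

```python
import math
from typing import Optional, Sequence


def focal_loss(
    probs: Sequence[float],
    target: int,
    gamma: float = 2.0,
    alpha: Optional[Sequence[float]] = None,
    eps: float = 1e-12,
) -> float:
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted class probabilities for the sample
    target -- index of the true class
    gamma  -- focusing parameter; gamma = 0 recovers (alpha-weighted) cross-entropy
    alpha  -- optional per-class weights, e.g. larger values for rare classes
    """
    p_t = max(probs[target], eps)  # clamp so log() never sees 0
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(
    batch_probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,
    alpha: Optional[Sequence[float]] = None,
) -> float:
    """Average the per-sample focal loss over a batch."""
    losses = [focal_loss(p, t, gamma, alpha) for p, t in zip(batch_probs, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    easy = [0.9, 0.05, 0.05]  # class-0 sample the model already gets right
    hard = [0.1, 0.6, 0.3]    # class-0 sample the model gets badly wrong
    for name, p in (("easy", easy), ("hard", hard)):
        ce = focal_loss(p, 0, gamma=0.0)  # plain cross-entropy
        fl = focal_loss(p, 0, gamma=2.0)
        print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.2f}")
```

With gamma = 2 the easy sample's cross-entropy is scaled by (1 - 0.9)^2 = 0.01, while the hard sample's is scaled by (1 - 0.1)^2 = 0.81, so the loss concentrates on poorly fit examples, which in imbalanced data are often minority-class ones. A per-class alpha adds a second, explicit re-weighting on top of that.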

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1fe/12424445/781333fcf17e/d5sc04513d-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1fe/12424445/e9d64eead00a/d5sc04513d-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1fe/12424445/9a2c176bf6ed/d5sc04513d-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1fe/12424445/1a00bef53f5f/d5sc04513d-f3.jpg

Similar articles

1
Prediction of enzyme function using an interpretable optimized ensemble learning framework.
Chem Sci. 2025 Sep 1. doi: 10.1039/d5sc04513d.
2
Prescription of Controlled Substances: Benefits and Risks.
3
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
4
Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
5
Development of an interpretable machine learning model for frailty risk prediction in older adult care institutions: a mixed-methods, cross-sectional study in China.
BMJ Open. 2025 Jul 5;15(7):e095460. doi: 10.1136/bmjopen-2024-095460.
6
AI in Medical Questionnaires: Innovations, Diagnosis, and Implications.
J Med Internet Res. 2025 Jun 23;27:e72398. doi: 10.2196/72398.
7
A Responsible Framework for Assessing, Selecting, and Explaining Machine Learning Models in Cardiovascular Disease Outcomes Among People With Type 2 Diabetes: Methodology and Validation Study.
JMIR Med Inform. 2025 Jun 27;13:e66200. doi: 10.2196/66200.
8
iACP-DPNet: a dual-pooling causal dilated convolutional network for interpretable anticancer peptide identification.
Funct Integr Genomics. 2025 Jul 4;25(1):147. doi: 10.1007/s10142-025-01641-x.
9
Establishment and validation of an interactive artificial intelligence platform to predict postoperative ambulatory status for patients with metastatic spinal disease: a multicenter analysis.
Int J Surg. 2024 May 1;110(5):2738-2756. doi: 10.1097/JS9.0000000000001169.
10
Supervised Machine Learning Models for Predicting Sepsis-Associated Liver Injury in Patients With Sepsis: Development and Validation Study Based on a Multicenter Cohort Study.
J Med Internet Res. 2025 May 26;27:e66733. doi: 10.2196/66733.
