Automatic single- and multi-label enzymatic function prediction by machine learning.

Authors

Amidi Shervine, Amidi Afshine, Vlachakis Dimitrios, Paragios Nikos, Zacharaki Evangelia I

Affiliations

Department of Applied Mathematics, Center for Visual Computing, Ecole Centrale de Paris (CentraleSupélec), Châtenay-Malabry, France.

MDAKM Group, Department of Computer Engineering and Informatics, University of Patras, Patras, Greece.

Publication information

PeerJ. 2017 Mar 29;5:e3095. doi: 10.7717/peerj.3095. eCollection 2017.

DOI: 10.7717/peerj.3095

PMID: 28367366

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5374972/

Abstract

The number of protein structures in the PDB database has increased more than 15-fold since 1999. Computational models that predict enzymatic function are of major importance because they provide the means to better understand how newly discovered enzymes behave when catalyzing chemical reactions. Until now, enzymatic function has mostly been predicted with single-label classification, which limits the application to enzymes performing a single reaction and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes perform different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and at the decision level) and assess the methodology for general enzymatic function prediction, indicated by the first digit of the Enzyme Commission (EC) code (six main classes), on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming loss) of the cases, respectively. In addition, the multi-label model predicts all possible enzymatic reactions in 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at https://figshare.com/s/a63e0bafa9b71fc7cbd7.
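The two multi-label figures in the abstract measure different things: a score based on Hamming loss counts each (enzyme, class) label slot separately, while "predicts all possible enzymatic reactions" requires the whole label set of an enzyme to be correct. The sketch below illustrates that difference on invented toy data. It is not the authors' evaluation code; the function names and label matrices are hypothetical, and it assumes that the Hamming-loss-based score is reported as one minus the Hamming loss.

```python
# Illustrative only: toy labels, not data from the paper.
# Each row is one enzyme; each column is one of the six main EC classes
# (first digit of the EC code). 1 = enzyme has that function.

def hamming_loss(y_true, y_pred):
    """Fraction of (enzyme, class) label slots that are predicted wrongly."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1
    return wrong / total


def exact_match_ratio(y_true, y_pred):
    """Fraction of enzymes whose full label set is predicted correctly."""
    hits = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = [
    [1, 0, 0, 0, 0, 0],  # single-function enzyme
    [0, 1, 1, 0, 0, 0],  # multi-functional enzyme (two classes)
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],  # multi-functional enzyme (two classes)
]
y_pred = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],  # one of the two functions is missed
    [0, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
]

loss = hamming_loss(y_true, y_pred)
label_accuracy = 1.0 - loss          # assumed form of a Hamming-loss-based score
exact = exact_match_ratio(y_true, y_pred)

print(f"Hamming loss:       {loss:.4f}")
print(f"1 - Hamming loss:   {label_accuracy:.4f}")
print(f"Exact-match ratio:  {exact:.4f}")
```

In this toy case a single missed label costs only 1 of 24 label slots under Hamming loss but makes 1 of 4 enzymes count as wrong under exact match, which is why a per-label score can sit well above an all-labels-correct score on the same predictions.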
