DEEPre: sequence-based enzyme EC number prediction by deep learning.

Author information

Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia.

Computer Science Department, Illinois Institute of Technology, Chicago, IL 60616, USA.

Publication information

Bioinformatics. 2018 Mar 1;34(5):760-769. doi: 10.1093/bioinformatics/btx680.

DOI:10.1093/bioinformatics/btx680

PMID:29069344

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6030869/

Abstract

MOTIVATION

Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resources required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number.
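
For readers unfamiliar with the prediction target: an Enzyme Commission (EC) number is a four-field hierarchical label such as 3.5.1.2, where the first field is the main class and each further field narrows the function. The following is a minimal Python sketch of how such a label might be parsed and expanded into its hierarchy levels. The names `ECNumber`, `parse_ec`, and `prefixes` are hypothetical helpers introduced here for illustration; they are not part of DEEPre.

```python
from typing import List, NamedTuple, Optional

# Main-class names from the EC nomenclature (first field of an EC number).
EC_MAIN_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


class ECNumber(NamedTuple):
    """Hypothetical container for the four EC fields; None marks an unspecified field."""
    main_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(ec: str) -> ECNumber:
    """Parse strings such as 'EC 3.5.1.2' or '3.5.-.-' into an ECNumber."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    # A dash marks a field that is left unspecified in a partial EC number.
    levels = [None if p == "-" else int(p) for p in parts]
    if levels[0] is None:
        raise ValueError("the main class must be specified")
    return ECNumber(*levels)


def prefixes(ec: ECNumber) -> List[str]:
    """Return the label at each specified level, from coarse to fine."""
    out: List[str] = []
    for level in ec:
        if level is None:
            break
        out.append(".".join(str(x) for x in ec[: len(out) + 1]))
    return out
```

For example, `prefixes(parse_ec("3.5.1.2"))` yields one label per level of the hierarchy, which is the kind of coarse-to-fine target a main-class or full-EC predictor would be scored against.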

RESULTS

We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms.
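
To make "raw sequence encoding" concrete, the sketch below shows one common way to turn an amino acid sequence into a fixed-size numeric input: one-hot encoding over the 20 standard residues, with zero-padding or truncation to a fixed length. This is an illustrative assumption only. The abstract does not specify the exact encodings DEEPre uses, and the fixed-length padding here is a simple stand-in, not the paper's feature dimensionality uniformization method. The names `one_hot_encode` and `max_len` are hypothetical.

```python
from typing import List

# The 20 standard amino acids in a fixed (alphabetical) column order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str, max_len: int) -> List[List[int]]:
    """Return a max_len x 20 matrix with one row per residue position.

    Residues outside the 20 standard letters (e.g. 'X') become all-zero rows.
    Sequences longer than max_len are truncated; shorter ones are zero-padded.
    """
    matrix: List[List[int]] = []
    for residue in sequence.upper()[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        matrix.append(row)
    # Pad with all-zero rows so every sequence has the same shape.
    while len(matrix) < max_len:
        matrix.append([0] * len(AMINO_ACIDS))
    return matrix


encoded = one_hot_encode("MKV", max_len=5)
```

A matrix of this shape can be fed to convolutional or recurrent layers, which then learn features from the raw encoding during training rather than relying on hand-crafted descriptors.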

AVAILABILITY AND IMPLEMENTATION

The server is freely accessible at http://www.cbrc.kaust.edu.sa/DEEPre.

CONTACT

xin.gao@kaust.edu.sa.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d547/6030869/112b1e672417/btx680f1.jpg
