EC number prediction of protein sequences based on combination of hierarchical and global features.

Affiliations

School of Technology, Beijing Forestry University, Beijing 100083, China.

Key Lab of State Forestry Administration for Forestry Equipment and Automation, Beijing 100083, China.

Publication information

Yi Chuan. 2024 Aug;46(8):661-669. doi: 10.16288/j.yczz.24-102.

DOI:10.16288/j.yczz.24-102

PMID:39140146

Abstract

Identifying enzyme function is crucial for understanding the mechanisms of biological activity and for advancing the life sciences. However, existing methods for predicting enzyme EC numbers do not fully exploit protein sequence information and still fall short in accuracy. To address this, we propose an EC number prediction network using hierarchical features and global features (ECPN-HFGF). The method first uses residual networks to extract generic features from protein sequences, then applies hierarchical feature extraction modules and global feature extraction modules to extract hierarchical and global features from those sequences. The predictions from the two feature types are then combined, and a multitask learning framework is used to predict enzyme EC numbers. In the experiments, ECPN-HFGF performed best at predicting EC numbers for protein sequences, achieving macro F1 and micro F1 scores of 95.5% and 99.0%, respectively. By combining hierarchical and global features of protein sequences, ECPN-HFGF enables rapid and accurate EC number prediction. Compared with commonly used methods, it offers significantly higher prediction accuracy, providing an efficient approach for enzymology research and enzyme engineering applications.
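
The abstract reports both macro F1 and micro F1. As a rough illustration of why the two can differ, the sketch below computes both for a toy single-label problem. The labels and predictions are invented for illustration and are not data from the paper, and the abstract does not say exactly how the authors computed their scores, so the single-label setting and the helper name f1_scores are assumptions.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro F1: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro F1: pool the counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy data: one frequent EC class and one rare EC class.
y_true = ["1.1.1.1"] * 8 + ["3.4.21.4"] * 2
y_pred = ["1.1.1.1"] * 8 + ["3.4.21.4", "1.1.1.1"]

macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")
```

In this toy case a single error on the rare class lowers macro F1 much more than micro F1. A micro F1 above macro F1, as in the reported 99.0% versus 95.5%, would be consistent with weaker performance on less frequent EC classes, though the abstract does not state this.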

Similar articles

EC number prediction of protein sequences based on combination of hierarchical and global features.

Yi Chuan. 2024 Aug;46(8):661-669. doi: 10.16288/j.yczz.24-102.

ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature.

BMC Bioinformatics. 2018 Sep 21;19(1):334. doi: 10.1186/s12859-018-2368-y.

Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers.

Proc Natl Acad Sci U S A. 2019 Jul 9;116(28):13996-14001. doi: 10.1073/pnas.1821905116. Epub 2019 Jun 20.

Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context.

BMC Syst Biol. 2011 Jun 20;5 Suppl 1(Suppl 1):S6. doi: 10.1186/1752-0509-5-S1-S6.

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.

PLoS Comput Biol. 2017 Jan 5;13(1):e1005324. doi: 10.1371/journal.pcbi.1005324. eCollection 2017 Jan.

MFTrans: A multi-feature transformer network for protein secondary structure prediction.

Int J Biol Macromol. 2024 May;267(Pt 1):131311. doi: 10.1016/j.ijbiomac.2024.131311. Epub 2024 Apr 9.

Enzyme Commission Number Prediction and Benchmarking with Hierarchical Dual-core Multitask Learning Framework.

Research (Wash D C). 2023 May 31;6:0153. doi: 10.34133/research.0153. eCollection 2023.

Boosting phosphorylation site prediction with sequence feature-based machine learning.

Proteins. 2020 Feb;88(2):284-291. doi: 10.1002/prot.25801. Epub 2019 Aug 22.

Relationship between global structural parameters and Enzyme Commission hierarchy: implications for function prediction.

Comput Biol Chem. 2012 Oct;40:15-9. doi: 10.1016/j.compbiolchem.2012.06.003. Epub 2012 Aug 14.

econvRBP: Improved ensemble convolutional neural networks for RNA binding protein prediction directly from sequence.

Methods. 2020 Oct 1;181-182:15-23. doi: 10.1016/j.ymeth.2019.09.008. Epub 2019 Sep 9.
