PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes.

Affiliation

College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China.

Publication information

BMC Bioinformatics. 2024 Jan 30;25(1):50. doi: 10.1186/s12859-024-05665-1.

DOI:10.1186/s12859-024-05665-1
PMID:38291384
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10829269/
Abstract

BACKGROUND

Enzymes play an irreplaceable role in sustaining living organisms. The Enzyme Commission (EC) number of an enzyme indicates its essential functions. Correctly identifying the first digit (family class) of the EC number for a given enzyme has been a hot topic for the past twenty years. Several previous methods adopted functional domain composition to represent enzymes. However, this representation suffers from the curse of dimensionality, which reduces the efficiency of those methods. In addition, most previous methods can only handle enzymes belonging to a single family class, although several enzymes in fact belong to two or more family classes.

RESULTS

In this study, a fast and efficient multi-label classifier, named PredictEFC, was designed. To construct this classifier, a novel feature extraction scheme was designed for processing the functional domain information of enzymes; it counts the distribution of each functional domain entry across the seven family classes in the training dataset. Based on this scheme, each training or test enzyme was encoded into a 7-dimensional vector by fusing its functional domain information with the above statistical results. Random k-labelsets (RAKEL) was adopted to build the classifier, with random forest selected as the base classification algorithm. Two tenfold cross-validation results on the training dataset showed that the accuracy of PredictEFC can reach 0.8493 and 0.8370. Independent tests on two datasets yielded accuracy values of 0.9118 and 0.8777.
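
The feature extraction scheme described above can be illustrated with a minimal sketch. The abstract does not state how the per-domain statistics are fused into the 7-dimensional vector, so the averaging rule in `encode` is an assumption made for illustration, not the authors' method. The function names and the domain identifiers in the toy data are likewise hypothetical placeholders.

```python
from collections import defaultdict

NUM_CLASSES = 7  # EC family classes 1..7


def domain_class_distribution(train):
    """Count how often each domain entry occurs in enzymes of each class.

    `train` is a list of (domains, classes) pairs: `domains` is a set of
    domain identifiers, `classes` a set of EC family classes in 1..7.
    Returns {domain: [fraction in class 1, ..., fraction in class 7]}.
    """
    counts = defaultdict(lambda: [0] * NUM_CLASSES)
    for domains, classes in train:
        for d in domains:
            for c in classes:
                counts[d][c - 1] += 1
    dist = {}
    for d, row in counts.items():
        total = sum(row)
        dist[d] = [n / total for n in row]
    return dist


def encode(domains, dist):
    """Encode an enzyme as a 7-dimensional vector.

    Assumed fusion rule: average the class distributions of the enzyme's
    domains that were seen in training. Unseen domains are ignored; an
    enzyme with no known domain maps to the zero vector.
    """
    known = [dist[d] for d in domains if d in dist]
    if not known:
        return [0.0] * NUM_CLASSES
    return [sum(col) / len(known) for col in zip(*known)]


# Toy training set with made-up domain identifiers.
train = [
    ({"IPR000001", "IPR000002"}, {1}),
    ({"IPR000002"}, {1, 3}),      # a multi-label enzyme
    ({"IPR000003"}, {3}),
]
dist = domain_class_distribution(train)
vector = encode({"IPR000002", "IPR000003", "IPR999999"}, dist)
```

Whatever the number of distinct domain entries, every enzyme ends up with seven features, which is where the efficiency gain over a direct functional domain composition encoding would come from. In the paper's pipeline these vectors are then passed to RAKEL with a random forest base learner, which this sketch does not cover.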

CONCLUSION

The performance of PredictEFC was slightly lower than that of the classifier directly using functional domain composition. However, its efficiency was sharply improved: the running time was less than one-tenth of that of the classifier directly using functional domain composition. In addition, the utility of PredictEFC was superior to that of the classifiers using traditional dimensionality reduction methods and of some previous methods, and the classifier can be transplanted to predict enzyme family classes of other species. Finally, a web server was set up at http://124.221.158.221/ for easy use.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/33475d8d688f/12859_2024_5665_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/3d1f21d2f429/12859_2024_5665_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/8a0109a62d27/12859_2024_5665_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/3eb76d2509f0/12859_2024_5665_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/b32c299d10fa/12859_2024_5665_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/52ec7b22f6a9/12859_2024_5665_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/5c884b8f57bd/12859_2024_5665_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/a3483dd32580/12859_2024_5665_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/d9393f2f0107/12859_2024_5665_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c119/10829269/6b33409a30a4/12859_2024_5665_Fig10_HTML.jpg
