
A top-down approach to classify enzyme functional classes and sub-classes using random forest.

Authors

Kumar Chetan, Choudhary Alok

Affiliation

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60201, USA.

Publication information

EURASIP J Bioinform Syst Biol. 2012 Feb 29;2012(1):1. doi: 10.1186/1687-4153-2012-1.

DOI: 10.1186/1687-4153-2012-1
PMID: 22376768
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3351021/

Abstract

Advances in sequencing technologies have led to an exponential rise in the number of newly discovered enzymes. Enzymes are proteins that catalyze biochemical reactions and play an important role in metabolic pathways. The function of such enzymes is commonly determined by experiments, which can be time-consuming and costly. There is therefore a need for a computational method that can distinguish enzyme sequences from non-enzyme sequences and reliably predict the function of the former. To address this problem, approaches that cluster enzymes by sequence and structural similarity have been proposed. However, these approaches are known to fail for proteins that perform the same function but are dissimilar in sequence and structure. In this article, we present a supervised machine learning model that predicts the function class and sub-class of enzymes from a set of 73 sequence-derived features. The functional classes are those defined by the International Union of Biochemistry and Molecular Biology. Using an efficient data mining algorithm called random forest, we construct a top-down three-layer model in which the top layer classifies a query protein sequence as an enzyme or non-enzyme, the second layer predicts the main function class, and the bottom layer predicts the sub-function class. The model achieved an overall classification accuracy of 94.87% for the first level, 87.7% for the second, and 84.25% for the bottom level. Our results compare very well with existing methods and in many cases show better performance. Using feature selection methods, we show the biological relevance of a few of the top-ranked attributes.
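
The top-down structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn's `RandomForestClassifier` as the base learner, uses random synthetic vectors in place of the 73 sequence-derived features, and assumes one sub-class forest per main class (the abstract does not say how the bottom layer is organized). The class name `TopDownEnzymeClassifier`, the label encoding, and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class TopDownEnzymeClassifier:
    """Hypothetical three-layer cascade: enzyme? -> main class -> sub-class."""

    def __init__(self, n_estimators=50, random_state=0):
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.layer1 = self._new_forest()  # enzyme vs non-enzyme
        self.layer2 = self._new_forest()  # main functional class
        self.layer3 = {}                  # main class -> sub-class forest

    def _new_forest(self):
        return RandomForestClassifier(
            n_estimators=self.n_estimators, random_state=self.random_state
        )

    def fit(self, X, is_enzyme, main_class, sub_class):
        X = np.asarray(X, dtype=float)
        is_enzyme = np.asarray(is_enzyme, dtype=bool)
        main_class = np.asarray(main_class)
        sub_class = np.asarray(sub_class)
        # Layer 1 sees every sequence; lower layers see enzymes only.
        self.layer1.fit(X, is_enzyme)
        self.layer2.fit(X[is_enzyme], main_class[is_enzyme])
        # Assumption: one sub-class forest per main class.
        for c in np.unique(main_class[is_enzyme]):
            mask = is_enzyme & (main_class == c)
            self.layer3[int(c)] = self._new_forest().fit(X[mask], sub_class[mask])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        results = []
        for row, flag in zip(X, self.layer1.predict(X)):
            if not flag:
                # Predicted non-enzymes stop at the top layer.
                results.append((False, None, None))
                continue
            row = row.reshape(1, -1)
            main = int(self.layer2.predict(row)[0])
            sub = str(self.layer3[main].predict(row)[0])
            results.append((True, main, sub))
        return results


# Synthetic stand-in data (NOT real enzyme features or labels).
rng = np.random.default_rng(0)
n = 300
idx = np.arange(n)
is_enzyme = idx < n // 2                          # first half are "enzymes"
main_class = np.where(is_enzyme, idx % 6 + 1, 0)  # labels 1..6, 0 = non-enzyme
sub_index = (idx // 6) % 2 + 1                    # two sub-classes per main class
sub_class = np.array(
    [f"{m}.{s}" if e else "" for m, s, e in zip(main_class, sub_index, is_enzyme)]
)
X = rng.normal(size=(n, 73))                      # 73 features, as in the abstract
X[:, 0] += 3.0 * is_enzyme                        # inject a weak label signal
X[:, 1] += main_class
X[:, 2] += 2.0 * sub_index

model = TopDownEnzymeClassifier().fit(X, is_enzyme, main_class, sub_class)
predictions = model.predict(X[140:160])
for p in predictions[:3]:
    print(p)
```

Each prediction is a tuple `(is_enzyme, main_class, sub_class)`. Because the sub-class forest is selected by the predicted main class, a sub-class label is always consistent with its parent class, but a mistake at an upper layer propagates downward. That is one plausible reason the reported accuracy falls from the first level to the bottom level.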

Similar articles

1. A top-down approach to classify enzyme functional classes and sub-classes using random forest.
   EURASIP J Bioinform Syst Biol. 2012 Feb 29;2012(1):1. doi: 10.1186/1687-4153-2012-1.
2. Enzyme classification using multiclass support vector machine and feature subset selection.
   Comput Biol Chem. 2017 Oct;70:211-219. doi: 10.1016/j.compbiolchem.2017.08.009. Epub 2017 Aug 31.
3. EzyPred: a top-down approach for predicting enzyme functional classes and subclasses.
   Biochem Biophys Res Commun. 2007 Dec 7;364(1):53-9. doi: 10.1016/j.bbrc.2007.09.098. Epub 2007 Oct 2.
4. ENZPRED-enzymatic protein class predicting by machine learning.
   Curr Top Med Chem. 2013;13(14):1674-80. doi: 10.2174/15680266113139990118.
5. Computational Approaches for Automated Classification of Enzyme Sequences.
   J Proteomics Bioinform. 2011 Aug 23;4:147-152. doi: 10.4172/jpb.1000183.
6. Alignment-Free Method to Predict Enzyme Classes and Subclasses.
   Int J Mol Sci. 2019 Oct 29;20(21):5389. doi: 10.3390/ijms20215389.
7. Fast model-based protein homology detection without alignment.
   Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
8. Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.
   Comput Biol Chem. 2015 Dec;59 Pt A:101-10. doi: 10.1016/j.compbiolchem.2015.09.011. Epub 2015 Sep 28.
9. Interpretability and Class Imbalance in Prediction Models for Pain Volatility in Manage My Pain App Users: Analysis Using Feature Selection and Majority Voting Methods.
   JMIR Med Inform. 2019 Nov 20;7(4):e15601. doi: 10.2196/15601.
10. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
    BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

Cited by

1. Machine Learning-Guided Protein Engineering.
   ACS Catal. 2023 Oct 13;13(21):13863-13895. doi: 10.1021/acscatal.3c02743. eCollection 2023 Nov 3.
2. Enzyme Promiscuity Prediction Using Hierarchy-Informed Multi-Label Classification.
   Bioinformatics. 2021 Aug 4;37(14):2017-2024. doi: 10.1093/bioinformatics/btab054. Epub 2021 Jan 30.
3. Alignment-Free Method to Predict Enzyme Classes and Subclasses.
   Int J Mol Sci. 2019 Oct 29;20(21):5389. doi: 10.3390/ijms20215389.
4. Prediction of Enzyme Function Based on Three Parallel Deep CNN and Amino Acid Mutation.
   Int J Mol Sci. 2019 Jun 11;20(11):2845. doi: 10.3390/ijms20112845.
5. Non-H3 CDR template selection in antibody modeling through machine learning.
   PeerJ. 2019 Jan 11;7:e6179. doi: 10.7717/peerj.6179. eCollection 2019.
6. ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature.
   BMC Bioinformatics. 2018 Sep 21;19(1):334. doi: 10.1186/s12859-018-2368-y.
7. EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation.
   PeerJ. 2018 May 4;6:e4750. doi: 10.7717/peerj.4750. eCollection 2018.
8. DEEPre: sequence-based enzyme EC number prediction by deep learning.
   Bioinformatics. 2018 Mar 1;34(5):760-769. doi: 10.1093/bioinformatics/btx680.
9. Automatic single- and multi-label enzymatic function prediction by machine learning.
   PeerJ. 2017 Mar 29;5:e3095. doi: 10.7717/peerj.3095. eCollection 2017.
10. Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering.
    PLoS Comput Biol. 2016 Jun 27;12(6):e1005001. doi: 10.1371/journal.pcbi.1005001. eCollection 2016 Jun.

References

1. Enzyme function prediction with interpretable models.
   Methods Mol Biol. 2009;541:373-420. doi: 10.1007/978-1-59745-243-4_17.
2. Analysis on conservation of disulphide bonds and their structural features in homologous protein domain families.
   BMC Struct Biol. 2008 Dec 26;8:55. doi: 10.1186/1472-6807-8-55.
3. Prediction of enzymes and non-enzymes from protein sequences based on sequence derived features and PSSM matrix using artificial neural network.
   Bioinformation. 2007 Dec 5;2(3):107-12. doi: 10.6026/97320630002107.
4. EzyPred: a top-down approach for predicting enzyme functional classes and subclasses.
   Biochem Biophys Res Commun. 2007 Dec 7;364(1):53-9. doi: 10.1016/j.bbrc.2007.09.098. Epub 2007 Oct 2.
5. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.
   Bioinformatics. 2006 Jul 1;22(13):1658-9. doi: 10.1093/bioinformatics/btl158. Epub 2006 May 26.
6. Gene selection and classification of microarray data using random forest.
   BMC Bioinformatics. 2006 Jan 6;7:3. doi: 10.1186/1471-2105-7-3.
7. Feature selection and the class imbalance problem in predicting protein function from sequence.
   Appl Bioinformatics. 2005;4(3):195-203. doi: 10.2165/00822942-200504030-00004.
8. Predicting functional family of novel enzymes irrespective of sequence similarity: a statistical learning approach.
   Nucleic Acids Res. 2004 Dec 7;32(21):6437-44. doi: 10.1093/nar/gkh984. Print 2004.
9. Predicting enzyme class from protein structure without alignments.
   J Mol Biol. 2005 Jan 7;345(1):187-99. doi: 10.1016/j.jmb.2004.10.024.
10. Protein function classification via support vector machine approach.
    Math Biosci. 2003 Oct;185(2):111-22. doi: 10.1016/s0025-5564(03)00096-8.