Non-Alignment Features Based Enzyme/Non-Enzyme Classification Using an Ensemble Method

Authors

Davidson Nicholas J, Wang Xueyi

Affiliation

Department of Mathematics, Boise State University, Boise, ID USA.

Publication information

Proc Int Conf Mach Learn Appl. 2010 Dec 12:546-551. doi: 10.1109/ICMLA.2010.167.

DOI:10.1109/ICMLA.2010.167

PMID:21572553

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3091888/

Abstract

As a growing number of protein structures are resolved without known functions, computational methods that predict protein function from structure are becoming increasingly important. Some computational methods predict protein function by aligning a protein to homologous proteins with known functions, but they fail when no such homology can be identified. In this paper we classify enzymes and non-enzymes using non-alignment features. We propose a new ensemble method that combines three support vector machines (SVMs) and two k-nearest neighbor (k-NN) classifiers under a simple majority voting rule. Tested on a data set of 697 enzymes and 480 non-enzymes adapted from Dobson and Doig, the method achieves 85.59% accuracy in 10-fold cross-validation and 86.49% accuracy in leave-one-out validation. This accuracy is much better than that of other methods based on non-alignment features and even slightly better than that of methods based on alignment features. To our knowledge, our method is the first to use an ensemble to classify enzymes/non-enzymes, and it is superior to a single classifier.
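The simple majority voting rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the label strings, and the five prediction lists are hypothetical stand-ins for the outputs of three SVMs and two k-NN classifiers. With five voters and two classes, a tie cannot occur.

```python
from collections import Counter


def majority_vote(votes_per_classifier):
    """Combine per-classifier label lists into one label per sample.

    votes_per_classifier: list of lists, one inner list per classifier,
    each holding that classifier's predicted label for every sample.
    """
    n_samples = len(votes_per_classifier[0])
    if any(len(v) != n_samples for v in votes_per_classifier):
        raise ValueError("every classifier must label every sample")

    combined = []
    # zip(*...) regroups the votes so each tuple holds all votes for one sample.
    for sample_votes in zip(*votes_per_classifier):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions for four proteins (illustrative values only,
# not data from the paper).
svm_a = ["enzyme", "enzyme", "non-enzyme", "non-enzyme"]
svm_b = ["enzyme", "non-enzyme", "non-enzyme", "enzyme"]
svm_c = ["enzyme", "enzyme", "enzyme", "non-enzyme"]
knn_a = ["non-enzyme", "non-enzyme", "non-enzyme", "non-enzyme"]
knn_b = ["enzyme", "enzyme", "non-enzyme", "enzyme"]

ensemble = majority_vote([svm_a, svm_b, svm_c, knn_a, knn_b])
print(ensemble)
```

An odd number of voters is what keeps the rule simple for a two-class problem: each protein always receives at least three matching votes, so no tie-breaking step is needed.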
