Non-Alignment Features Based Enzyme/Non-Enzyme Classification Using an Ensemble Method.

Authors

Davidson Nicholas J, Wang Xueyi

Affiliation

Department of Mathematics, Boise State University, Boise, ID, USA.

Publication

Proc Int Conf Mach Learn Appl. 2010 Dec 12:546-551. doi: 10.1109/ICMLA.2010.167.

Abstract

As a growing number of protein structures are resolved without known functions, computational methods that predict protein function from structure are becoming increasingly important. Some computational methods predict function by aligning a protein to homologous proteins with known functions, but they fail when no such homology can be identified. In this paper we classify enzymes and non-enzymes using non-alignment features. We propose a new ensemble method that combines three support vector machines (SVMs) and two k-nearest neighbor (k-NN) classifiers under a simple majority voting rule. On a data set of 697 enzymes and 480 non-enzymes adapted from Dobson and Doig, the method achieves 85.59% accuracy in 10-fold cross-validation and 86.49% accuracy in leave-one-out validation. This accuracy is much better than that of other methods based on non-alignment features and slightly better than that of alignment-based methods. To our knowledge, this is the first use of an ensemble method to classify enzymes and non-enzymes, and it outperforms a single classifier.
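The simple majority voting rule mentioned in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the five base classifiers are hypothetical stand-in functions rather than trained SVM or k-NN models, and the names `majority_vote`, `ensemble_predict`, and the toy feature thresholds are introduced here purely for demonstration.

```python
from collections import Counter
from typing import Callable, Sequence

# Assumption: a trained base classifier is any callable that maps a
# feature vector to a class label ("enzyme" or "non-enzyme").
Classifier = Callable[[Sequence[float]], str]


def majority_vote(votes: Sequence[str]) -> str:
    """Return the label chosen by the largest number of base classifiers."""
    if not votes:
        raise ValueError("need at least one vote")
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(classifiers: Sequence[Classifier],
                     features: Sequence[float]) -> str:
    """Collect one vote per base classifier and apply the majority rule."""
    return majority_vote([clf(features) for clf in classifiers])


def make_threshold_classifier(index: int, cutoff: float) -> Classifier:
    """Hypothetical stand-in for a trained model: thresholds one feature."""
    def classify(features: Sequence[float]) -> str:
        return "enzyme" if features[index] > cutoff else "non-enzyme"
    return classify


# Five voters, mirroring the 3 SVM + 2 k-NN layout described in the abstract.
# With an odd number of voters and two classes, a tie cannot occur.
voters = [
    make_threshold_classifier(0, 0.5),  # placeholder for SVM 1
    make_threshold_classifier(1, 0.5),  # placeholder for SVM 2
    make_threshold_classifier(2, 0.5),  # placeholder for SVM 3
    make_threshold_classifier(0, 0.8),  # placeholder for k-NN 1
    make_threshold_classifier(1, 0.2),  # placeholder for k-NN 2
]

sample = [0.6, 0.3, 0.9]  # made-up feature vector, for illustration only
label = ensemble_predict(voters, sample)
```

For the made-up `sample`, three of the five placeholder voters answer "enzyme", so the ensemble returns "enzyme". Using an odd number of base classifiers is what lets the rule stay this simple for a two-class problem.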
