Davidson Nicholas J, Wang Xueyi
Department of Mathematics, Boise State University, Boise, ID USA.
Proc Int Conf Mach Learn Appl. 2010 Dec 12:546-551. doi: 10.1109/ICMLA.2010.167.
As a growing number of protein structures are resolved without known functions, computational methods that predict protein function from structure are becoming increasingly important. Some methods predict function by aligning a protein to homologous proteins with known functions, but they fail when no such homology can be identified. In this paper we classify enzymes and non-enzymes using non-alignment features. We propose a new ensemble method that combines three support vector machines (SVMs) and two k-nearest neighbor (k-NN) classifiers under a simple majority voting rule. On a data set of 697 enzymes and 480 non-enzymes adapted from Dobson and Doig, the method achieves 85.59% accuracy in 10-fold cross-validation and 86.49% accuracy in leave-one-out validation. This accuracy is much better than that of other methods based on non-alignment features and even slightly better than that of methods based on alignment features. To our knowledge, this is the first use of an ensemble method to classify enzymes/non-enzymes, and the ensemble outperforms a single classifier.
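The simple majority voting rule over five base classifiers can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the features, kernels, or values of k, so the base classifiers here are hypothetical threshold rules (`threshold_rule`) standing in for trained SVM and k-NN models, and the feature vector is made up. Only the voting step reflects what the abstract describes.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A base classifier maps a feature vector to a label: 1 = enzyme, 0 = non-enzyme.
Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by the majority of the base classifiers."""
    if len(classifiers) % 2 == 0:
        # With two classes, an odd number of voters guarantees there is no tie.
        raise ValueError("use an odd number of binary classifiers to avoid ties")
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def threshold_rule(index: int, cutoff: float) -> Classifier:
    """Hypothetical stand-in for a trained SVM or k-NN model (assumption)."""
    return lambda x: 1 if x[index] > cutoff else 0


# Five stand-ins, mirroring the 3 SVM + 2 k-NN layout of the ensemble.
ensemble: List[Classifier] = [
    threshold_rule(0, 0.5),
    threshold_rule(0, 0.7),
    threshold_rule(1, 0.2),
    threshold_rule(1, 0.4),
    threshold_rule(2, 0.9),
]

sample = [0.6, 0.3, 0.1]  # made-up feature vector, for illustration only
prediction = majority_vote(ensemble, sample)
print(prediction)  # votes are 1, 0, 1, 0, 0, so the ensemble predicts 0
```

Using an odd number of voters on a two-class problem means every vote has a strict majority, which is presumably one reason a five-member ensemble suits a simple voting rule.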