Comparing ensemble learning methods based on decision tree classifiers for protein fold recognition.

Author information

Bardsiri Mahshid Khatibi, Eftekhari Mahdi

Publication information

Int J Data Min Bioinform. 2014;9(1):89-105. doi: 10.1504/ijdmb.2014.057776.

Abstract

In this paper, several ensemble learning methods for protein fold recognition that are built on decision tree (DT) classifiers are compared on three datasets taken from the literature. Following previously reported studies, the features of each dataset are divided into groups. For each feature group, three ensemble classifiers are then trained: random forest, rotation forest and AdaBoost.M1. Several fusion methods are also introduced for combining the ensemble classifiers obtained in this step, which produces three combined classifiers, one each for the random forest, rotation forest and AdaBoost.M1 classifiers. Finally, these three classifiers are combined into an overall classifier. Experimental results show that the overall classifier obtained with the genetic algorithm (GA) weighting fusion method outperforms the previously applied methods in classification accuracy.
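The abstract does not describe how the GA weighting fusion is encoded. The Python sketch below shows one plausible version under stated assumptions: each already-trained base classifier (for example a per-feature-group random forest, rotation forest or AdaBoost.M1 model) outputs class probabilities on a held-out validation set, the fused prediction is the argmax of a weighted sum of those probabilities, and a simple GA searches for the weights that maximize validation accuracy. The names `weighted_fusion` and `ga_weights`, the GA operators (truncation selection with elitism, arithmetic crossover, Gaussian mutation) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from typing import List

# Hypothetical data layout: probs[c][i][k] is the probability that validation
# sample i belongs to fold class k, as estimated by base classifier c.
Probs = List[List[List[float]]]


def weighted_fusion(probs: Probs, weights: List[float]) -> List[int]:
    """Predict each sample's class as the argmax of the weighted sum of probabilities."""
    n_samples, n_classes = len(probs[0]), len(probs[0][0])
    preds = []
    for i in range(n_samples):
        scores = [0.0] * n_classes
        for c, w in enumerate(weights):
            for k in range(n_classes):
                scores[k] += w * probs[c][i][k]
        preds.append(max(range(n_classes), key=lambda k: scores[k]))
    return preds


def accuracy(preds: List[int], labels: List[int]) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def ga_weights(probs: Probs, labels: List[int], pop_size: int = 30,
               generations: int = 50, mutation_sd: float = 0.1,
               seed: int = 0) -> List[float]:
    """Search non-negative fusion weights that maximize validation accuracy.

    Assumed GA design (not from the paper): keep the better half unchanged
    (truncation selection with elitism), arithmetic crossover, Gaussian mutation.
    Only the relative sizes of the weights matter for the argmax.
    """
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)
    n = len(probs)

    def fitness(w: List[float]) -> float:
        return accuracy(weighted_fusion(probs, w), labels)

    # Seed with equal weights (plain averaging) and one-hot weights (each single
    # classifier); elitism then guarantees the result is no worse on validation.
    pop = [[1.0 / n] * n] + [[float(j == c) for j in range(n)] for c in range(n)]
    pop += [[rng.random() for _ in range(n)] for _ in range(max(0, pop_size - len(pop)))]

    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # arithmetic crossover
            # Gaussian mutation, clipped so weights stay non-negative.
            child = [max(0.0, g + rng.gauss(0.0, mutation_sd)) for g in child]
            children.append(child)
        pop = elite + children

    return max(pop, key=fitness)


# Toy validation outputs for three hypothetical base classifiers (illustrative only).
LABELS = [0, 1, 2, 1, 2, 1]
PROBS = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6],
     [0.3, 0.6, 0.1], [0.1, 0.6, 0.3], [0.5, 0.4, 0.1]],
    [[0.3, 0.6, 0.1], [0.5, 0.3, 0.2], [0.1, 0.2, 0.7],
     [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.2, 0.7, 0.1]],
    [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.5, 0.4, 0.1],
     [0.4, 0.3, 0.3], [0.2, 0.2, 0.6], [0.1, 0.8, 0.1]],
]

weights = ga_weights(PROBS, LABELS)
fused_acc = accuracy(weighted_fusion(PROBS, weights), LABELS)
print(weights, fused_acc)
```

On this toy data the best single classifier reaches 4/6 validation accuracy, while the GA-selected weights reach 6/6, because plain averaging already classifies all six samples correctly and elitism keeps that candidate. Because the weights are tuned on validation accuracy, they would normally be fit on data held out from the final test set.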