A robust ensemble classification method analysis.

Author information

Department of Mathematics and Computing, University of Southern Queensland, Toowoomba, QLD, Australia.

Publication information

Adv Exp Med Biol. 2010;680:149-55. doi: 10.1007/978-1-4419-5913-3_17.

PMID:20865496

Abstract

Apart from the dimensionality problem, uncertainty about data quality is another major challenge in microarray classification. Microarray data contain varying, and often high, levels of noise, which leads to unreliable, low-accuracy analysis on top of the high-dimensionality problem. In this paper, we propose a new microarray data classification method based on diversified multiple trees. The new method (1) makes the most of the information from the abundant genes in microarray data and (2) uses a unique diversity measurement in the ensemble decision committee. The experimental results show that the proposed method (DMDT) and the well-known CS4 method, which diversifies trees by using distinct tree roots, are on average more accurate than other well-known ensemble methods, including Bagging, Boosting, and Random Forests. The experiments also indicate that the diversity measurement used in DMDT improves the accuracy of ensemble classification on microarray data.
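The idea of diversifying a committee by giving each tree a distinct root can be illustrated with a minimal sketch. This is not the DMDT or CS4 algorithm from the paper: the trees are reduced to one-level stumps, each gene is scored by training error, and the helper names (`best_threshold`, `fit_distinct_root_stumps`, `predict`) and the toy expression values are assumptions made purely for illustration.

```python
from collections import Counter


def best_threshold(values, labels):
    """Find the single split on one gene with the fewest training errors.

    Returns (errors, threshold, left_label, right_label), or None if the
    gene takes only one value and cannot be split.
    """
    best = None
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best


def fit_distinct_root_stumps(X, y, n_trees):
    """Build up to n_trees stumps, each rooted at a different gene (illustrative only)."""
    candidates = []
    for j in range(len(X[0])):
        split = best_threshold([row[j] for row in X], y)
        if split is not None:
            errors, t, left_label, right_label = split
            candidates.append((errors, j, t, left_label, right_label))
    candidates.sort()  # genes with the fewest training errors come first
    return [(j, t, l, r) for _, j, t, l, r in candidates[:n_trees]]


def predict(stumps, row):
    """Majority vote of the committee for one sample."""
    votes = Counter(l if row[j] <= t else r for j, t, l, r in stumps)
    return votes.most_common(1)[0][0]


# Toy data (made up): 6 samples, 4 genes; gene 3 is pure noise.
X = [
    [5.1, 0.2, 3.0, 1.0],
    [4.8, 0.3, 2.9, 5.0],
    [5.3, 0.1, 3.2, 2.0],
    [1.2, 0.9, 1.1, 4.0],
    [0.9, 0.8, 0.8, 1.5],
    [1.4, 0.7, 3.1, 3.0],
]
y = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

stumps = fit_distinct_root_stumps(X, y, n_trees=3)
# Gene 2 is corrupted in this sample, but the other two roots outvote it.
print(predict(stumps, [5.0, 0.2, 0.9, 2.5]))
```

Because every committee member is rooted at a different gene, a noisy reading on one gene can mislead at most one vote, which is the intuition behind using many genes and enforcing diversity on noisy microarray data.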
