Abedini Mani, Kirley Michael, Chiong Raymond
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia; IBM Research Australia, Carlton, Victoria 3053, Australia.
Australas Med J. 2013 May 30;6(5):272-9. doi: 10.4066/AMJ.2013.1641. Print 2013.
DNA microarray gene expression classification is a challenging task for machine learning. The dimensionality of gene expression data sets typically ranges from several thousand to over 10,000 genes. One potential way to address this problem is to use feature selection to reduce the dimensionality.
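To make "using feature selection to reduce the dimensionality" concrete, the sketch below scores each gene on a two-class problem and keeps only the top-k genes. The abstract does not name its feature selection method, so the scoring function (an absolute Welch t-statistic) and the names `t_scores` and `select_top_k` are illustrative assumptions, not the authors' procedure.

```python
import math
from statistics import mean, variance


def t_scores(X, y):
    """Absolute Welch t-statistic per feature for labels 0/1.

    Assumption: this scoring rule is chosen for illustration only.
    Each class needs at least two samples so that variance() is defined.
    """
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        diff = abs(mean(a) - mean(b))
        if se > 0:
            scores.append(diff / se)
        else:
            # Zero spread in both classes: perfectly separating if means differ.
            scores.append(math.inf if diff > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Feature reduction: keep the k highest-scoring features, drop the rest."""
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    reduced = [[row[j] for j in keep] for row in X]
    return keep, reduced


# Toy data: 6 samples x 3 "genes"; only gene 0 differs between the classes.
X = [[1.0, 5.0, 0.1], [1.2, 5.1, 0.3], [0.9, 4.9, 0.2],
     [3.0, 5.0, 0.2], [3.1, 5.2, 0.1], [2.9, 4.8, 0.3]]
y = [0, 0, 0, 1, 1, 1]
keep, reduced = select_top_k(X, y, 1)
```

With `k = 1` only gene 0 survives. Every discarded gene is then invisible to the downstream learner, which is the restriction the abstract later warns about.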
The aim of this paper is to investigate how we can use feature quality information to improve the precision of microarray gene expression classification tasks.
We propose two evolutionary machine learning models based on the eXtended Classifier System (XCS) and a typical feature selection methodology. The first, which we call FS-XCS, uses feature selection to reduce the number of features. The second, GRD-XCS, uses feature ranking to bias the rule discovery process of XCS.
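The difference between the two models can be sketched at the point where XCS creates a new rule. The code below is one plausible reading, not the published GRD-XCS algorithm: it assumes real-valued interval conditions (as in XCSR), treats `None` as "don't care", and maps feature-quality scores to per-feature probabilities of being specified in a rule. The names `rank_biased_probabilities` and `cover` and the parameters `p_min`, `p_max` and `max_spread` are hypothetical. Under an FS-XCS-style reduction, an unselected feature would effectively get probability 0; here every feature keeps at least `p_min`.

```python
import math
import random


def rank_biased_probabilities(scores, p_min=0.01, p_max=0.5):
    """Map finite feature-quality scores to probabilities of specifying a feature.

    Hypothetical scheme: linear rescaling into [p_min, p_max], so higher-ranked
    features are specified more often but no feature is ever excluded.
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo
    return [p_min + (p_max - p_min) * ((s - lo) / span if span > 0 else 1.0)
            for s in scores]


def cover(instance, spec_probs, max_spread=0.5, rng=random):
    """Build an interval condition that matches `instance` (XCSR-style covering).

    Each feature is specified with its own probability; otherwise it is None
    (don't care). Specified features get an interval centred on the input value.
    """
    condition = []
    for x, p in zip(instance, spec_probs):
        if rng.random() < p:
            r = rng.uniform(0.0, max_spread)
            condition.append((x - r, x + r))
        else:
            condition.append(None)
    return condition


def matches(condition, instance):
    """A rule matches when every specified interval contains the input value."""
    return all(iv is None or iv[0] <= x <= iv[1]
               for iv, x in zip(condition, instance))


rng = random.Random(0)
probs = rank_biased_probabilities([15.7, 0.2, 0.0])
instance = [1.0, 5.0, 0.1]
rule = cover(instance, probs, rng=rng)
```

Because the lowest-ranked feature still has a small non-zero probability, rule discovery is biased toward informative genes without being confined to them.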
The results indicate that feature selection/ranking methods are essential for tackling high-dimensional classification tasks such as microarray gene expression classification. However, the results also suggest that using feature ranking to bias the rule discovery process performs significantly better than feature reduction. In other words, using feature quality information to develop a smarter learning procedure is more efficient than reducing the feature set.
Our findings show that extracting feature quality information can assist the learning process and improve classification accuracy. On the other hand, relying exclusively on feature quality information can decrease classification performance (e.g., when it is used for feature reduction). We therefore recommend a hybrid approach that uses feature quality information to guide the learning process toward the more informative features, without preventing it from exploring other features.