Li Jie, Tang Xianglong
Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.
Comput Biol Med. 2007 Nov;37(11):1637-46. doi: 10.1016/j.compbiomed.2007.03.004. Epub 2007 May 7.
Classifiers have been widely used to select an optimal subset of feature genes from microarray data, both for accurate classification of cancer samples and for cancer-related studies. However, the classification rules derived from most classifiers are complex and difficult to interpret biologically. Solving this problem poses a new challenge. In this paper, a new classification model based on gene pairs is proposed to address it. Experimental results on several microarray datasets demonstrate that the proposed model is effective at finding a large number of excellent feature gene pairs. A leave-one-out cross-validation (LOOCV) classification accuracy of 100% can be achieved either with a single classification model based on the optimal feature gene pair or by combining multiple top-ranked classification models. Using the proposed method, we identified important cancer-related genes that had been validated in previous biological studies but were not discovered by the other methods.
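The abstract does not state the form of the gene-pair decision rule, so the following is only a minimal sketch of how ranking gene pairs by LOOCV accuracy might look. It assumes a simple relative-expression rule (predict a class depending on whether gene i is expressed above gene j), which is a stand-in chosen for illustration and not the authors' model. The function names (`fit_pair_rule`, `predict_pair_rule`, `loocv_accuracy`, `rank_gene_pairs`) are hypothetical, and the toy expression values are invented purely to make the example runnable; they are not results from the paper.

```python
from itertools import combinations


def fit_pair_rule(X, y, i, j):
    """Pick the label that the event x[i] > x[j] should vote for (assumed rule)."""
    # Count training samples where "gene i above gene j" coincides with label 1.
    agree = sum((row[i] > row[j]) == (label == 1) for row, label in zip(X, y))
    # Map the event to label 1 if it matches in at least half the samples, else to 0.
    return 1 if 2 * agree >= len(y) else 0


def predict_pair_rule(row, i, j, label_if_greater):
    """Apply the fitted rule to one sample."""
    return label_if_greater if row[i] > row[j] else 1 - label_if_greater


def loocv_accuracy(X, y, i, j):
    """Leave-one-out cross-validation accuracy of the rule for gene pair (i, j)."""
    correct = 0
    for k in range(len(y)):
        # Hold out sample k and fit the rule on all remaining samples.
        X_train = X[:k] + X[k + 1:]
        y_train = y[:k] + y[k + 1:]
        rule = fit_pair_rule(X_train, y_train, i, j)
        correct += predict_pair_rule(X[k], i, j, rule) == y[k]
    return correct / len(y)


def rank_gene_pairs(X, y):
    """Score every gene pair by LOOCV accuracy, best first."""
    n_genes = len(X[0])
    scored = [
        (loocv_accuracy(X, y, i, j), (i, j))
        for i, j in combinations(range(n_genes), 2)
    ]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Invented toy data: 6 samples (rows) x 3 genes (columns), binary labels.
X = [
    [5.0, 2.0, 3.0],
    [6.0, 1.0, 7.0],
    [4.5, 3.0, 1.0],
    [1.0, 4.0, 2.5],
    [2.0, 6.0, 8.0],
    [0.5, 3.5, 0.1],
]
y = [1, 1, 1, 0, 0, 0]

for accuracy, pair in rank_gene_pairs(X, y):
    print(pair, accuracy)
```

A rule of this kind is easy to read biologically (one comparison between two genes), which is the sort of interpretability the abstract argues for. Combining several top-ranked pairs, for example by majority vote, would be a natural extension, but the abstract gives no details on how the authors combine models.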