Sun Lin, Xu Jiucheng, Yin Ying
College of Computer and Information Engineering, Henan Normal University, Xinxiang, China.
Engineering Technology Research Center for Computing Intelligence and Data Mining, Henan Province, China.
Biomed Mater Eng. 2015;26 Suppl 1:S2011-7. doi: 10.3233/BME-151505.
Tumor classification is one of the important problems in the analysis of microarray gene expression data. This paper proposes a new feature selection method for tumor classification using gene expression data. Three dimensionality reduction methods, principal component analysis (PCA), factor analysis (FA) and independent component analysis (ICA), are first introduced to extract and select features for tumor classification, and the specific steps of each are given. The three algorithms are then compared experimentally on acute leukemia data sets, and it is concluded that PCA performs better than FA and ICA in terms of feature load ratio. However, PCA cannot make full use of the category information. To overcome this weakness, Fisher linear discriminant (FLD) is applied to the components obtained by PCA, and a new approach, principal component discriminant analysis (PCDA), is proposed that retains the strengths of both and classifies better than either PCA or FLD alone. Further experimental results show that the feature subsets selected by PCDA have higher classification ability than those of the other related dimensionality reduction methods, and that the proposed algorithm is efficient and feasible for tumor classification.
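The idea of combining PCA with FLD can be illustrated with a minimal sketch. It is not the authors' PCDA algorithm, whose exact steps the abstract does not give. It assumes a generic two-stage pipeline: project the samples onto the top `k` principal components, then fit a two-class Fisher discriminant in that reduced space. NumPy, the function names (`pca_fit`, `fld_fit`, `pcda_fit`, `pcda_predict`), the small ridge term, the midpoint threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def pca_fit(X, k):
    """Return the training mean and the top-k principal directions (as columns)."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def fld_fit(Z, y):
    """Two-class Fisher direction w = Sw^{-1} (m1 - m0) and a midpoint threshold."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Within-class scatter matrix.
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    Sw += 1e-6 * np.eye(Sw.shape[0])  # small ridge for stability (an assumption)
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = 0.5 * (m0 + m1) @ w   # midpoint of the projected class means
    return w, threshold

def pcda_fit(X, y, k):
    """Hypothetical PCA-then-FLD pipeline: reduce with PCA, discriminate with FLD."""
    mean, W = pca_fit(X, k)
    w, threshold = fld_fit((X - mean) @ W, y)
    return mean, W, w, threshold

def pcda_predict(model, X):
    """Project samples onto the Fisher direction and threshold to get labels 0/1."""
    mean, W, w, threshold = model
    return ((X - mean) @ W @ w > threshold).astype(int)

# Synthetic stand-in for expression data: 40 samples, 200 "genes",
# with the first 10 genes shifted in class 1 (not real leukemia data).
rng = np.random.default_rng(0)
n, p = 40, 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :10] += 3.0

model = pcda_fit(X, y, k=5)
train_acc = (pcda_predict(model, X) == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```

Reducing to `k` components first keeps the within-class scatter matrix small and invertible even when genes far outnumber samples, which is the usual reason for placing PCA ahead of FLD on microarray data. Training accuracy is reported here only to show the pipeline runs; a real evaluation would use held-out samples or cross-validation.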