Department of Mathematics, College of Basic Medical Sciences, China Medical University, and Computer Center, Affiliated Shenjing Hospital, Shenyang, China.
J Exp Clin Cancer Res. 2009 Dec 10;28(1):149. doi: 10.1186/1756-9966-28-149.
An increasing number of studies based on gene expression data have been reported in great detail; however, one major challenge for methodologists is the choice of classification method. The main purpose of this study was to compare the performance of linear discriminant analysis (LDA) and its modified methods for the classification of cancer based on gene expression data.
The classification performance of LDA and its modified methods was evaluated by applying them to six public cancer gene expression datasets. The methods were linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), shrinkage centroid regularized discriminant analysis (SCRDA), shrinkage linear discriminant analysis (SLDA) and shrinkage diagonal discriminant analysis (SDDA). All procedures were performed in R 2.80.
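As an illustration of how a shrinkage-based modification can both classify samples and select feature genes, the following is a minimal sketch of the soft-thresholding idea behind shrunken centroid methods such as PAM. It is not the code used in the study, which was run in R. The function names (`soft_threshold`, `fit_shrunken_centroids`, `predict`), the toy matrix, the threshold `delta`, and the conventions m_k = sqrt(1/n_k - 1/n) and s0 = median of the pooled standard deviations are all assumptions made for this example.

```python
import math
from statistics import median


def soft_threshold(d, delta):
    """Move d toward zero by delta; anything within delta of zero becomes 0."""
    return math.copysign(max(abs(d) - delta, 0.0), d)


def fit_shrunken_centroids(X, y, delta):
    """Fit a simplified nearest shrunken centroid model (illustrative only).

    X is a list of samples (rows) by genes (columns), y the class labels.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    counts = {k: y.count(k) for k in classes}
    overall = [sum(row[j] for row in X) / n for j in range(p)]
    centroids = {
        k: [sum(r[j] for r, lab in zip(X, y) if lab == k) / counts[k]
            for j in range(p)]
        for k in classes
    }
    # Pooled within-class standard deviation of each gene.
    s = [
        math.sqrt(sum((r[j] - centroids[lab][j]) ** 2 for r, lab in zip(X, y))
                  / (n - len(classes)))
        for j in range(p)
    ]
    s0 = median(s)  # assumed offset that stabilises low-variance genes

    shrunk_d, shrunk_centroids = {}, {}
    for k in classes:
        m_k = math.sqrt(1.0 / counts[k] - 1.0 / n)  # assumed convention
        shrunk_d[k], shrunk_centroids[k] = [], []
        for j in range(p):
            scale = m_k * (s[j] + s0)
            d = soft_threshold((centroids[k][j] - overall[j]) / scale, delta)
            shrunk_d[k].append(d)
            shrunk_centroids[k].append(overall[j] + scale * d)

    # A gene counts as selected if any class keeps a nonzero shrunken difference.
    selected = [j for j in range(p) if any(shrunk_d[k][j] != 0 for k in classes)]
    priors = {k: counts[k] / n for k in classes}
    return {"classes": classes, "centroids": shrunk_centroids,
            "scale": [sj + s0 for sj in s], "priors": priors,
            "selected": selected}


def predict(model, x):
    """Assign x to the class with the smallest standardised distance score."""
    def score(k):
        dist = sum(((xj - cj) / sc) ** 2
                   for xj, cj, sc in zip(x, model["centroids"][k], model["scale"]))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["classes"], key=score)


# Toy data (made up for illustration): gene 0 separates the classes,
# genes 1 and 2 are noise.
X = [[5.0, 1.0, 2.1], [5.2, 1.2, 1.9], [4.8, 0.9, 2.0],
     [1.0, 1.1, 2.0], [1.2, 1.0, 2.2], [0.8, 1.0, 1.8]]
y = ["A", "A", "A", "B", "B", "B"]

model = fit_shrunken_centroids(X, y, delta=2.0)
print("selected gene indices:", model["selected"])
print("predicted class:", predict(model, [4.5, 1.0, 2.0]))
```

In this sketch, a larger `delta` shrinks more genes to the overall centroid and so selects fewer genes. In practice the threshold would normally be chosen by cross-validation rather than fixed by hand.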
PAM selected fewer feature genes than the other methods on most datasets, the exception being the Brain dataset. Of the two shrinkage discriminant analysis methods, SLDA selected more genes than SDDA on most datasets, the exception being the 2-class lung cancer dataset. Compared with SCRDA, SLDA selected more genes on the 2-class lung cancer, SRBCT and Brain datasets; the opposite held for the remaining datasets. The average test error of the modified LDA methods was lower than that of LDA.
The classification performance of the modified LDA methods was superior to that of traditional LDA with respect to the average error, and there was no significant difference among the modified methods.