Shan Wen-Juan, Tong Chun-Fa, Shi Ji-Sen
The Key Laboratory of Forest Genetics and Gene Engineering of the State Administration and Jiangsu Province, Nanjing Forestry University, Nanjing 210037, China.
Yi Chuan. 2008 Dec;30(12):1640-6. doi: 10.3724/sp.j.1005.2008.01640.
The DNA microarray is a relatively new tool in biotechnology that allows the expression of thousands of genes in cells to be monitored simultaneously. The goal of differential gene expression analysis is to detect genes whose expression levels change significantly in response to experimental conditions. Although various statistical methods have been proposed for identifying differential gene expression, only a few studies have compared their performance. This paper compares statistical methods for finding differentially expressed genes (DEGs) in microarray data. Using simulated and real datasets (Populus cDNA microarray data), we compared eight methods of identifying differential gene expression. The simulated datasets were drawn from four different distributions (normal, uniform, χ², and exponential). On the simulated datasets, the eight methods performed better on uniformly distributed data than on normally distributed data, and they performed poorly on χ²-distributed and exponentially distributed data. Of the eight methods, SAM (Significance Analysis of Microarrays) and the Wilcoxon rank sum test performed well in most cases. On the real Populus cDNA microarray data, SAM, Samroc, and the regression modeling approach gave largely similar results, whereas the Wilcoxon rank sum test differed from them. Among the eight methods, Samroc and the regression modeling approach were similar to each other. For both simulated and real datasets, SAM, Samroc, and the regression modeling approach performed better than the other methods.
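The simulation design described in the abstract can be illustrated with a minimal sketch that covers only one of the eight methods, the Wilcoxon rank sum test, applied to data drawn from the four distributions. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the additive shift used to create differentially expressed genes, the distribution parameters, the replicate count, the 0.05 threshold, and the use of the normal approximation (without tie correction) for the p-value.

```python
import math
import random

# Hypothetical noise models for the four distributions named in the abstract.
# Parameters are illustrative assumptions, not the paper's settings.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(0.0, 1.0),
    "chi-square": lambda rng: rng.gammavariate(1.5, 2.0),  # chi-square, 3 df
    "exponential": lambda rng: rng.expovariate(1.0),
}


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank sum p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(dist, n_genes, n_de, n_rep, shift, rng):
    """Simulate two conditions per gene; the first n_de genes get a shift."""
    draw = SAMPLERS[dist]
    genes = []
    for g in range(n_genes):
        is_de = g < n_de
        x = [draw(rng) for _ in range(n_rep)]
        y = [draw(rng) + (shift if is_de else 0.0) for _ in range(n_rep)]
        genes.append((x, y, is_de))
    return genes


def detection_counts(genes, alpha=0.05):
    """Count true and false positives at an unadjusted p-value threshold."""
    tp = fp = 0
    for x, y, is_de in genes:
        if rank_sum_pvalue(x, y) < alpha:
            if is_de:
                tp += 1
            else:
                fp += 1
    return tp, fp


if __name__ == "__main__":
    rng = random.Random(2008)
    for dist in SAMPLERS:
        genes = simulate(dist, n_genes=1000, n_de=100, n_rep=8, shift=1.0, rng=rng)
        tp, fp = detection_counts(genes)
        print(f"{dist:12s} true positives: {tp:3d}/100  false positives: {fp:3d}/900")
```

Because the four samplers have different variances and the threshold is not corrected for multiple testing, the counts this sketch prints are not comparable to the paper's results; it only shows the shape of such a comparison.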