Department of Biostatistics, Johns Hopkins School of Public Health, Baltimore, MD 21205, USA.
Stat Methods Med Res. 2009 Dec;18(6):565-75. doi: 10.1177/0962280209351908.
Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed between two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticised for ignoring biological knowledge regarding how genes work together. Recently, a series of methods that do incorporate biological knowledge have been proposed. However, the most popular method, gene set enrichment analysis (GSEA), seems overly complicated. Furthermore, GSEA is based on a statistical test known for its lack of sensitivity. In this article, we compare the performance of a simple alternative with that of GSEA. We find that this simple solution clearly outperforms GSEA. We demonstrate this with eight different microarray datasets.