The limitations of simple gene set enrichment analysis assuming gene independence.

Author information

Pablo Tamayo, George Steinhardt, Arthur Liberzon, Jill P. Mesirov

Affiliations

The Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA.

Boston University Bioinformatics Program, Boston University, Boston, MA, USA.

Publication information

Stat Methods Med Res. 2016 Feb;25(1):472-87. doi: 10.1177/0962280212460441. Epub 2012 Oct 14.

Abstract

Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently, a simplified approach that uses a one-sample t-test score to assess enrichment and ignores gene-gene correlations was proposed by Irizarry et al. (2009) as a serious contender. That argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by carefully examining the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, because they produce significant variance inflation in the enrichment scores, and that they should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods.
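The claim that gene-gene correlation inflates the variance of a score built by averaging gene-level statistics can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' code or benchmark. It assumes that gene-level null statistics are standard normal with a single common pairwise correlation rho (an equicorrelation model chosen only for simplicity), and it uses a set score of the form sqrt(N) times the mean of the N gene-level statistics, in the spirit of the simplified approach described above. Under this model the score's null variance is 1 + (N - 1) * rho rather than 1, so comparing it against a standard normal overstates significance. All function names are hypothetical.

```python
import math
import random
import statistics


def equicorrelated_scores(n_genes: int, rho: float, rng: random.Random) -> list[float]:
    """Draw n_genes N(0, 1) gene-level scores with common pairwise correlation rho.

    Assumed one-factor model: z_i = sqrt(rho) * f + sqrt(1 - rho) * e_i,
    with f and e_i independent N(0, 1). Illustrative only; real gene sets
    have far less uniform correlation structure.
    """
    shared = rng.gauss(0.0, 1.0)
    return [
        math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        for _ in range(n_genes)
    ]


def set_score(gene_scores: list[float]) -> float:
    """sqrt(N) * mean of gene-level scores; N(0, 1) under the null only if genes are independent."""
    return math.sqrt(len(gene_scores)) * statistics.fmean(gene_scores)


def simulate_null_set_scores(n_genes: int, rho: float,
                             n_reps: int = 10000, seed: int = 0) -> list[float]:
    """Set scores for n_reps gene sets simulated under the null hypothesis."""
    rng = random.Random(seed)
    return [set_score(equicorrelated_scores(n_genes, rho, rng)) for _ in range(n_reps)]


def variance_inflation(n_genes: int, rho: float) -> float:
    """Theoretical null variance of set_score under equicorrelation: 1 + (N - 1) * rho."""
    return 1.0 + (n_genes - 1) * rho


def false_positive_rate(scores: list[float], z_crit: float = 1.96) -> float:
    """Fraction of null set scores called significant by a nominal two-sided 5% normal test."""
    return sum(abs(s) > z_crit for s in scores) / len(scores)


if __name__ == "__main__":
    for rho in (0.0, 0.05, 0.1):
        scores = simulate_null_set_scores(n_genes=50, rho=rho)
        print(f"rho={rho:.2f}  empirical var={statistics.variance(scores):.2f}  "
              f"theory={variance_inflation(50, rho):.2f}  "
              f"FPR at |Z|>1.96={false_positive_rate(scores):.3f}")
```

With N = 50 and rho = 0.1 the theoretical variance is 5.9, so roughly 40% of null gene sets would pass the nominal |Z| > 1.96 cutoff instead of 5%. Permuting sample labels, as Gene Set Enrichment Analysis does, keeps the gene-gene correlation intact in the null distribution, which is one way to absorb this inflation without modeling it explicitly.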

Similar articles

An alternative model of type A dependence in a gene set of correlated genes.
Stat Appl Genet Mol Biol. 2010;9:Article 12. doi: 10.2202/1544-6115.1525. Epub 2010 Jan 26.

A multivariate extension of the gene set enrichment analysis.
J Bioinform Comput Biol. 2007 Oct;5(5):1139-53. doi: 10.1142/s0219720007003041.

Gene set enrichment analysis made simple.
Stat Methods Med Res. 2009 Dec;18(6):565-75. doi: 10.1177/0962280209351908.

A modified F-test for hypothesis testing in large-scale data.
J Biopharm Stat. 2018;28(6):1078-1089. doi: 10.1080/10543406.2018.1436557. Epub 2018 Feb 12.

References

De-correlating expression in gene-set analysis.
Bioinformatics. 2010 Sep 15;26(18):i511-6. doi: 10.1093/bioinformatics/btq380.

ROAST: rotation gene set tests for complex microarray experiments.
Bioinformatics. 2010 Sep 1;26(17):2176-82. doi: 10.1093/bioinformatics/btq401. Epub 2010 Jul 7.

MYC regulation of a "poor-prognosis" metastatic cancer cell state.
Proc Natl Acad Sci U S A. 2010 Feb 23;107(8):3698-703. doi: 10.1073/pnas.0914203107. Epub 2010 Feb 4.

Gene set enrichment analysis made simple.
Stat Methods Med Res. 2009 Dec;18(6):565-75. doi: 10.1177/0962280209351908.
