The limitations of simple gene set enrichment analysis assuming gene independence.

Authors

Tamayo Pablo, Steinhardt George, Liberzon Arthur, Mesirov Jill P

Affiliations

The Eli and Edythe L. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA, USA

Boston University Bioinformatics Program, Boston University, Boston, MA, USA.

Publication information

Stat Methods Med Res. 2016 Feb;25(1):472-87. doi: 10.1177/0962280212460441. Epub 2012 Oct 14.

DOI:10.1177/0962280212460441
PMID:23070592
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3758419/
Abstract

Since its first publication in 2003, the Gene Set Enrichment Analysis method, based on the Kolmogorov-Smirnov statistic, has been heavily used, modified, and also questioned. Recently, Irizarry et al. (2009) proposed a simplified approach as a serious contender: it uses a one-sample t-test score to assess enrichment and ignores gene-gene correlations. Their argument criticizes Gene Set Enrichment Analysis's nonparametric nature and its use of an empirical null distribution as unnecessary and hard to compute. We refute these claims by careful consideration of the assumptions of the simplified method and its results, including a comparison with Gene Set Enrichment Analysis on a large benchmark set of 50 datasets. Our results provide strong empirical evidence that gene-gene correlations cannot be ignored, because of the significant variance inflation they produce in the enrichment scores, and that they should be taken into account when estimating gene set enrichment significance. In addition, we discuss the challenges that the complex correlation structure and multi-modality of gene sets pose more generally for gene set enrichment methods.
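The variance inflation mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' benchmark or code. It assumes a toy model in which every pair of genes in a set shares the same correlation `mean_corr`, and a simple set statistic of the form sqrt(n) times the mean gene-level score. Under those assumptions the null variance of the statistic is 1 + (n - 1) * rho rather than 1. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def variance_inflation_factor(set_size: int, mean_corr: float) -> float:
    """Null variance of sqrt(n) * mean(scores) for n unit-variance scores
    that share one pairwise correlation rho: 1 + (n - 1) * rho."""
    return 1.0 + (set_size - 1) * mean_corr


def simulate_null_set_scores(set_size, mean_corr, n_sims, rng):
    """Simulate the simple set statistic under the null (no true enrichment).

    Assumed toy model: each gene score is a mix of one factor shared by the
    whole set and independent noise, which gives unit variance per gene and
    pairwise correlation equal to mean_corr.
    """
    shared_weight = math.sqrt(mean_corr)
    own_weight = math.sqrt(1.0 - mean_corr)
    set_stats = []
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)  # component common to all genes in the set
        scores = [
            shared_weight * shared + own_weight * rng.gauss(0.0, 1.0)
            for _ in range(set_size)
        ]
        set_stats.append(math.sqrt(set_size) * statistics.fmean(scores))
    return set_stats


def naive_false_positive_rate(set_stats, threshold=1.96):
    """Fraction of null statistics called significant when they are compared
    against a standard normal, i.e. when correlation is ignored."""
    return sum(abs(z) > threshold for z in set_stats) / len(set_stats)


if __name__ == "__main__":
    rng = random.Random(0)
    for rho in (0.0, 0.1):
        stats = simulate_null_set_scores(set_size=50, mean_corr=rho, n_sims=4000, rng=rng)
        print(
            f"rho={rho}: theoretical variance={variance_inflation_factor(50, rho):.2f}, "
            f"empirical variance={statistics.pvariance(stats):.2f}, "
            f"naive false-positive rate={naive_false_positive_rate(stats):.3f}"
        )
```

With 50 genes and a mean pairwise correlation of 0.1, the theoretical factor is 1 + 49 × 0.1 = 5.9, so a test that assumes unit variance would flag far more than the nominal 5% of null gene sets. An empirical null distribution avoids this because it does not rely on the independence assumption.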

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/126d/3758419/60c227fb1d08/nihms466579f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/126d/3758419/e021be4931e2/nihms466579f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/126d/3758419/54b355b8f6c0/nihms466579f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/126d/3758419/9bfec04e8316/nihms466579f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/126d/3758419/94846d3d7307/nihms466579f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/126d/3758419/25f5c6af8818/nihms466579f6.jpg

References

1
Analyzing the large number of variables in biomedical imagery: a brief review.
J Biopharm Stat. 2011 Nov;21(6):1094-9. doi: 10.1080/10543406.2011.607772.
2
Pathway analysis of expression data: deciphering functional building blocks of complex diseases.
PLoS Comput Biol. 2011 May;7(5):e1002053. doi: 10.1371/journal.pcbi.1002053. Epub 2011 May 26.
3
Multiple testing for gene sets from microarray experiments.
BMC Bioinformatics. 2011 May 26;12:209. doi: 10.1186/1471-2105-12-209.
4
Loss of the tumor suppressor Snf5 leads to aberrant activation of the Hedgehog-Gli pathway.
Nat Med. 2010 Dec;16(12):1429-33. doi: 10.1038/nm.2251. Epub 2010 Nov 14.
5
Heading down the wrong pathway: on the influence of correlation within gene sets.
BMC Genomics. 2010 Oct 18;11:574. doi: 10.1186/1471-2164-11-574.
6
De-correlating expression in gene-set analysis.
Bioinformatics. 2010 Sep 15;26(18):i511-6. doi: 10.1093/bioinformatics/btq380.
7
ROAST: rotation gene set tests for complex microarray experiments.
Bioinformatics. 2010 Sep 1;26(17):2176-82. doi: 10.1093/bioinformatics/btq401. Epub 2010 Jul 7.
8
Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM.
Bioinformatics. 2010 Jun 15;26(12):i237-45. doi: 10.1093/bioinformatics/btq182.
9
MYC regulation of a "poor-prognosis" metastatic cancer cell state.
Proc Natl Acad Sci U S A. 2010 Feb 23;107(8):3698-703. doi: 10.1073/pnas.0914203107. Epub 2010 Feb 4.
10
Gene set enrichment analysis made simple.
Stat Methods Med Res. 2009 Dec;18(6):565-75. doi: 10.1177/0962280209351908.