Ensuring the statistical soundness of competitive gene set approaches: gene filtering and genome-scale coverage are essential.

Affiliation

Computational Biology and Machine Learning Laboratory, Center for Cancer Research and Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, 97 Lisburn Road, Belfast BT9 7BL, UK.

Publication information

Nucleic Acids Res. 2013 Apr;41(7):e82. doi: 10.1093/nar/gkt054. Epub 2013 Feb 6.

DOI:10.1093/nar/gkt054
PMID:23389952
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3627569/
Abstract

In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data in a way that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case inexpressive. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide a genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise not to use GSEA, GSEArot and GAGE for such data sets.
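
The dependence of a competitive test on its background, and the idea of estimating power by bootstrapping, can be illustrated with a small sketch. This is not the authors' procedure and not the GSEA, GSEArot or GAGE implementation: it assumes a deliberately simplified gene-sampling test (mean absolute group difference of the set versus random sets of equal size), simulated Gaussian data, and hypothetical function names (`gene_scores`, `competitive_p_value`, `bootstrap_power`, `simulate`).

```python
import random
import statistics

def gene_scores(expr, labels):
    """Per-gene score: absolute difference of the two group means."""
    scores = {}
    for gene, values in expr.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(statistics.mean(group1) - statistics.mean(group0))
    return scores

def competitive_p_value(scores, gene_set, n_perm, rng):
    """Gene-sampling p-value; the background is every gene present in `scores`."""
    background = sorted(scores)
    members = [g for g in background if g in gene_set]
    observed = statistics.mean([scores[g] for g in members])
    hits = 0
    for _ in range(n_perm):
        # Random gene set of the same size, drawn from the current background.
        draw = rng.sample(background, len(members))
        if statistics.mean([scores[g] for g in draw]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def bootstrap_power(expr, labels, gene_set, alpha=0.05, n_boot=30, n_perm=200, seed=0):
    """Fraction of bootstrap replicates (samples resampled within group) with p < alpha."""
    rng = random.Random(seed)
    idx0 = [i for i, lab in enumerate(labels) if lab == 0]
    idx1 = [i for i, lab in enumerate(labels) if lab == 1]
    rejections = 0
    for _ in range(n_boot):
        # Resample sample indices with replacement, keeping the group sizes fixed.
        idx = [rng.choice(idx0) for _ in idx0] + [rng.choice(idx1) for _ in idx1]
        boot_expr = {g: [vals[i] for i in idx] for g, vals in expr.items()}
        boot_labels = [labels[i] for i in idx]
        scores = gene_scores(boot_expr, boot_labels)
        if competitive_p_value(scores, gene_set, n_perm, rng) < alpha:
            rejections += 1
    return rejections / n_boot

def simulate(n_shifted, n_noise, shift=1.5, n_per_group=10, seed=1):
    """Toy data (an assumption): `n_shifted` genes differ between groups, `n_noise` do not."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    expr = {}
    for j in range(n_shifted + n_noise):
        delta = shift if j < n_shifted else 0.0
        expr[f"g{j}"] = [rng.gauss(delta * lab, 1.0) for lab in labels]
    return expr, labels

if __name__ == "__main__":
    gene_set = {f"g{j}" for j in range(10)}  # 10 of the 50 shifted genes
    # Same seed, so the first 50 genes are identical in both data sets;
    # the second one only adds 200 genes that carry no group difference.
    filtered, labels = simulate(n_shifted=50, n_noise=0)
    unfiltered, _ = simulate(n_shifted=50, n_noise=200)
    print("power, shifted genes only:", bootstrap_power(filtered, labels, gene_set))
    print("power, plus 200 noise genes:", bootstrap_power(unfiltered, labels, gene_set))
```

In this toy setting the gene set is no more strongly shifted than the other 50 genes, so any gain in estimated power on the second data set comes only from the added noise genes changing the background. That is the kind of filtering dependence the abstract describes, although the size of the effect here depends entirely on the assumed simulation parameters.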

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5748/3627569/5e72a5f15814/gkt054f1p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5748/3627569/eb66b6318cf6/gkt054f2p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5748/3627569/ff3fcd9a59bb/gkt054f3p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5748/3627569/80934a2b12b3/gkt054f4p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5748/3627569/4d9e68a7aa66/gkt054f5p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5748/3627569/da039124303f/gkt054f6p.jpg
