Comparative study of gene set enrichment methods.

Affiliation

Istituto di Studi sui Sistemi Intelligenti per l'Automazione, CNR, Bari, Italy.

Publication information

BMC Bioinformatics. 2009 Sep 2;10:275. doi: 10.1186/1471-2105-10-275.

DOI: 10.1186/1471-2105-10-275
PMID: 19725948
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2746222/

Abstract

BACKGROUND

The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited.
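
For orientation, the simplest of the four methods, Fisher's exact test, can be sketched as a hypergeometric tail probability. The sketch below is illustrative only and is not taken from the paper: the function name, its parameters, and the toy counts are assumptions, and it presumes the genes have already been split into "differentially expressed" and "not differentially expressed" by some threshold.

```python
from math import comb


def fisher_enrichment_pvalue(n_universe, n_set, n_de, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= n_overlap), where X is the number of gene-set members
    among n_de differentially expressed genes drawn without replacement
    from a universe of n_universe genes, n_set of which are in the set.
    All argument names are hypothetical.
    """
    max_overlap = min(n_set, n_de)
    total = comb(n_universe, n_de)
    # Sum the hypergeometric probabilities of the observed overlap or larger.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy counts (made up): 10,000 genes, a 50-gene set, 200 genes called
# differentially expressed, 6 of which belong to the set.
print(fisher_enrichment_pvalue(10_000, 50, 200, 6))
```

Because the test only sees a thresholded gene list, it discards the size of each gene's expression change, which is one plausible reading of why the abstract reports it as markedly worse than the other three methods.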

RESULTS

The simulation study highlights that none of the three methods consistently outperforms the others. GSEA and RS are able to detect weak signals of deregulation, and they perform differently when genes in a gene set are both up- and down-regulated. GLAPA is more conservative: large differences between the two phenotypes are required for the method to detect differential deregulation in gene sets. This is because the enrichment statistic in GLAPA is prediction error, which is a stronger criterion than the classical two-sample statistics used in RS and GSEA. This was reflected in the analysis of real data sets, where GSEA and RS were significant for particular gene sets while GLAPA was not, suggesting a small effect size. We find that the ranking of gene set enrichment induced by GLAPA is more similar to that of RS than to that of GSEA. More importantly, the rankings of the three methods share significant overlap.
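
The contrast between an associative statistic and a predictive one can be made concrete with a small sketch. It is not the paper's implementation: the Welch t-statistic stands in for a generic two-sample statistic, and the leave-one-out nearest-centroid classifier is an assumed stand-in for a predictive criterion, since the abstract does not say which classifier GLAPA uses. All names and data values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Associative view: two-sample Welch t-statistic for one gene."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def loo_nearest_centroid_error(samples, labels):
    """Predictive view: leave-one-out error of a nearest-centroid classifier.

    samples: expression vectors restricted to the genes of one gene set.
    labels:  0/1 phenotype labels; each class needs at least two samples.
    """
    errors = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        dist = {}
        for c in (0, 1):
            # Class centroid computed without the held-out sample i.
            rows = [s for j, (s, l) in enumerate(zip(samples, labels))
                    if l == c and j != i]
            centroid = [mean(col) for col in zip(*rows)]
            dist[c] = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))
        predicted = 0 if dist[0] <= dist[1] else 1
        errors += predicted != y
    return errors / len(samples)


# Made-up expression values: a two-gene set, three samples per phenotype.
samples = [[0.1, 1.0], [0.4, 1.2], [0.3, 0.9],
           [0.9, 1.6], [1.1, 1.4], [0.8, 1.8]]
labels = [0, 0, 0, 1, 1, 1]

group0 = [s for s, l in zip(samples, labels) if l == 0]
group1 = [s for s, l in zip(samples, labels) if l == 1]
t_stats = [welch_t(a, b) for a, b in zip(zip(*group0), zip(*group1))]

print("per-gene t-statistics:", t_stats)
print("leave-one-out error:", loo_nearest_centroid_error(samples, labels))
```

A mean shift can produce a large t-statistic once enough samples are available even when the two phenotypes overlap heavily, whereas the prediction error stays high until individual samples become separable. That is consistent with the abstract's description of GLAPA as the more conservative method.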

CONCLUSION

The three methods considered in our study recover relevant gene sets known to be deregulated in the experimental conditions and pathologies analyzed. There are differences between the three methods, and GSEA seems to be more consistent in finding enriched gene sets, although no method uniformly dominates over all data sets. Our analysis highlights the deep difference between associative and predictive methods for detecting enrichment, and supports using both to better interpret the results of pathway analysis. We close with suggestions for users of gene set methods.
