Ranking metrics in gene set enrichment analysis: do they matter?

Author information

Zyla Joanna, Marczyk Michal, Weiner January, Polanska Joanna

Affiliations

Data Mining Group, Institute of Automatic Control, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, Gliwice, 44-100, Poland.

Max Planck Institute for Infection Biology, Charitéplatz 1, Berlin, 10117, Germany.

Publication information

BMC Bioinformatics. 2017 May 12;18(1):256. doi: 10.1186/s12859-017-1674-0.

DOI:10.1186/s12859-017-1674-0

PMID:28499413

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5427619/

Abstract

BACKGROUND

There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results.
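To see why the ranking metric can change the outcome, recall that GSEA walks down the gene list in ranked order: each gene in the set raises a running sum in proportion to |metric|^p, each gene outside it lowers the sum by a uniform step, and the enrichment score is the maximum deviation from zero. The metric therefore controls both the order of the genes and the size of the steps. Below is a minimal sketch of that weighted running-sum score, assuming weight exponent p = 1 and omitting the permutation test used for significance; the function name `enrichment_score` and the toy genes A to E are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """GSEA-style weighted running-sum enrichment score (sketch).

    ranked   -- list of (gene, metric) pairs sorted by decreasing metric value
    gene_set -- gene identifiers forming the set S
    p        -- exponent applied to |metric| when weighting hits (p = 1 assumed)
    """
    members = set(gene_set)
    # Normaliser for hits: total |metric|^p over set members present in the list.
    hit_weight = sum(abs(m) ** p for g, m in ranked if g in members)
    n_miss = sum(1 for g, _ in ranked if g not in members)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running = 0.0
    best = 0.0
    for gene, metric in ranked:
        if gene in members:
            running += abs(metric) ** p / hit_weight  # hit: step up by its weighted share
        else:
            running -= 1.0 / n_miss                   # miss: uniform step down
        if abs(running) > abs(best):
            best = running                            # keep the maximum deviation from zero
    return best


ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
print(enrichment_score(ranked, {"A", "B"}))  # set at the top of the list: about 1.0
print(enrichment_score(ranked, {"D", "E"}))  # set at the bottom of the list: about -1.0
```

With an absolute-value metric (as in several of the metrics evaluated here), every score is non-negative, so genes are ordered by effect magnitude regardless of the direction of change.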

METHODS AND RESULTS

In this work, 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics, including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using the k-means clustering algorithm, a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established, i.e. the absolute value of the Moderated Welch Test statistic, Minimum Significant Difference, the absolute value of the Signal-To-Noise ratio and the Baumgartner-Weiss-Schindler test statistic. In terms of false positive rate, all selected ranking metrics were robust with respect to sample size. In terms of sensitivity, the absolute value of the Moderated Welch Test statistic and the absolute value of the Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample sizes. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB; it is available at https://github.com/ZAEDPolSl/MrGSEA.
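Two of the four selected metrics are absolute values of two-group statistics computed per gene. The sketch below computes |SNR| as |mean_A - mean_B| / (sd_A + sd_B) and, as a simplified stand-in for the Moderated Welch Test, the plain (unmoderated) Welch t statistic; the moderated version additionally shrinks each gene's variance estimate, which is not reproduced here. Sample standard deviations are assumed, and `rank_genes`, the toy expression table and the group indices are hypothetical.

```python
import math
from statistics import mean, variance


def abs_snr(a, b):
    """|signal-to-noise ratio| = |mean_a - mean_b| / (sd_a + sd_b), using sample SDs."""
    return abs(mean(a) - mean(b)) / (math.sqrt(variance(a)) + math.sqrt(variance(b)))


def abs_welch_t(a, b):
    """|Welch t| = |mean_a - mean_b| / sqrt(var_a/n_a + var_b/n_b).

    Unmoderated: a moderated Welch statistic would also shrink the variances.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


def rank_genes(expr, group_a, group_b, metric):
    """Score every gene with `metric` and sort by decreasing score.

    expr maps gene -> list of expression values; group_a / group_b are column indices.
    Genes with zero variance in both groups would divide by zero and are not handled.
    """
    scores = []
    for gene, values in expr.items():
        a = [values[i] for i in group_a]
        b = [values[i] for i in group_b]
        scores.append((gene, metric(a, b)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


expr = {
    "X": [1.0, 2.0, 3.0, 0.0, 1.0, 2.0],
    "Y": [0.2, 2.2, 4.2, 0.9, 1.0, 1.1],
}
group_a, group_b = [0, 1, 2], [3, 4, 5]
print(rank_genes(expr, group_a, group_b, abs_snr))      # Y ranked above X
print(rank_genes(expr, group_a, group_b, abs_welch_t))  # X ranked above Y
```

In this toy table, gene Y has the larger mean difference but very unequal group variances, so |SNR| places it first while the Welch statistic places X first. A reordering like this changes the input to the enrichment step and can therefore change which gene sets are reported.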

CONCLUSIONS

Choosing a ranking metric in Gene Set Enrichment Analysis has a critical impact on the results of pathway enrichment analysis. The absolute value of the Moderated Welch Test statistic has the best overall sensitivity, and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using the Baumgartner-Weiss-Schindler test statistic gives better outcomes. It also finds more enriched pathways than the other tested metrics, which may lead to new biological discoveries.
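The Baumgartner-Weiss-Schindler (BWS) statistic works on pooled ranks rather than on means and variances, so it makes no normality assumption, which fits the observation above about non-normally distributed genes. Below is a minimal sketch of the two-sample statistic B = (B_x + B_y) / 2 in its commonly quoted form, computing only the value used for ranking (no p-value). The name `bws_statistic` is hypothetical, and giving tied values their average rank is an assumption, since the statistic is derived for continuous data.

```python
def bws_statistic(x, y):
    """Two-sample Baumgartner-Weiss-Schindler statistic B = (B_x + B_y) / 2.

    B_x = (1/n) * sum_i (R_i - N/n * i)^2 / [ i/(n+1) * (1 - i/(n+1)) * m*N/n ]
    where R_1 < ... < R_n are the pooled ranks of x, n = len(x), m = len(y),
    N = n + m; B_y is the same expression with the roles of x and y swapped.
    """
    n, m = len(x), len(y)
    total_n = n + m
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    # Assign ranks 1..N; tied values share their average rank (assumption).
    ranks = [0.0] * total_n
    start = 0
    while start < total_n:
        end = start
        while end + 1 < total_n and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        for k in range(start, end + 1):
            ranks[k] = (start + end) / 2 + 1
        start = end + 1
    # Pooled ranks of each sample, already in ascending order.
    rx = [r for r, (_, g) in zip(ranks, pooled) if g == 0]
    ry = [r for r, (_, g) in zip(ranks, pooled) if g == 1]

    def half(r, size, other):
        # One of the two terms (B_x or B_y): standardised squared rank deviations.
        acc = 0.0
        for i, ri in enumerate(r, start=1):
            q = i / (size + 1)
            expected = total_n / size * i
            scale = q * (1 - q) * other * total_n / size
            acc += (ri - expected) ** 2 / scale
        return acc / size

    return (half(rx, n, m) + half(ry, m, n)) / 2


print(bws_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # fully separated: 71/27, about 2.63
print(bws_statistic([1.0, 3.0, 5.0], [2.0, 4.0, 6.0]))  # interleaved: 11/27, about 0.41
```

Larger values indicate stronger separation between the two groups, so genes would be ranked by decreasing B; the statistic is symmetric in the two samples.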

Similar articles

Ranking metrics in gene set enrichment analysis: do they matter?

BMC Bioinformatics. 2017 May 12;18(1):256. doi: 10.1186/s12859-017-1674-0.

The Baumgartner-Weiss-Schindler test for the detection of differentially expressed genes in replicated microarray experiments.

Bioinformatics. 2004 Dec 12;20(18):3553-64. doi: 10.1093/bioinformatics/bth442. Epub 2004 Jul 29.

Novel learning framework (knockoff technique) to evaluate metric ranking algorithms to describe human response to injury.

Traffic Inj Prev. 2018;19(sup2):S121-S126. doi: 10.1080/15389588.2018.1519805. Epub 2018 Dec 20.

Sensitivity analysis of gene ranking methods in phenotype prediction.

J Biomed Inform. 2016 Dec;64:255-264. doi: 10.1016/j.jbi.2016.10.012. Epub 2016 Oct 26.

Metric for measuring the effectiveness of clustering of DNA microarray expression.

BMC Bioinformatics. 2006 Sep 6;7 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-7-S2-S5.

Comparative study of gene set enrichment methods.

BMC Bioinformatics. 2009 Sep 2;10:275. doi: 10.1186/1471-2105-10-275.

Improving Gene-Set Enrichment Analysis of RNA-Seq Data with Small Replicates.

PLoS One. 2016 Nov 9;11(11):e0165919. doi: 10.1371/journal.pone.0165919. eCollection 2016.

Detecting discordance enrichment among a series of two-sample genome-wide expression data sets.

BMC Genomics. 2017 Jan 25;18(Suppl 1):1050. doi: 10.1186/s12864-016-3265-2.

BubbleGUM: automatic extraction of phenotype molecular signatures and comprehensive visualization of multiple Gene Set Enrichment Analyses.

BMC Genomics. 2015 Oct 19;16:814. doi: 10.1186/s12864-015-2012-4.

GSEA-InContext: identifying novel and common patterns in expression experiments.

Bioinformatics. 2018 Jul 1;34(13):i555-i564. doi: 10.1093/bioinformatics/bty271.

Cited by

Replicability of bulk RNA-Seq differential expression and enrichment analysis results for small cohort sizes.

PLoS Comput Biol. 2025 May 5;21(5):e1011630. doi: 10.1371/journal.pcbi.1011630. eCollection 2025 May.

Proteomic Profile of Ischemic Heart Disease in Heart Failure: A Community Study.

Mayo Clin Proc. 2025 Jul;100(7):1112-1126. doi: 10.1016/j.mayocp.2024.12.016. Epub 2025 Mar 31.

Variable Gene Copy Number in Cancer-Related Pathways Is Associated With Cancer Prevalence Across Mammals.

Mol Biol Evol. 2025 Mar 5;42(3). doi: 10.1093/molbev/msaf056.

Gene Set Enrichment Analysis in Zebrafish Embryos Is Susceptible to False-Positive Results in the Absence of Differentially Expressed Genes.

Bioinform Biol Insights. 2025 Mar 4;19:11779322251321071. doi: 10.1177/11779322251321071. eCollection 2025.

Assessing the impact of transcriptomics data analysis pipelines on downstream functional enrichment results.

Nucleic Acids Res. 2024 Aug 12;52(14):8100-8111. doi: 10.1093/nar/gkae552.

Clinical and CSF single-cell profiling of post-COVID-19 cognitive impairment.

Cell Rep Med. 2024 May 21;5(5):101561. doi: 10.1016/j.xcrm.2024.101561. Epub 2024 May 13.

Generalized reporter score-based enrichment analysis for omics data.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae116.

Chromatin activity identifies differential gene regulation across human ancestries.

Genome Biol. 2024 Jan 15;25(1):21. doi: 10.1186/s13059-024-03165-2.

Delving into gene-set multiplex networks facilitated by a k-nearest neighbor-based measure of similarity.

Comput Struct Biotechnol J. 2023 Oct 11;21:4988-5002. doi: 10.1016/j.csbj.2023.09.042. eCollection 2023.

D-Allulose Ameliorates Dysregulated Macrophage Function and Mitochondrial NADH Homeostasis, Mitigating Obesity-Induced Insulin Resistance.

Nutrients. 2023 Sep 29;15(19):4218. doi: 10.3390/nu15194218.
