Urgent need for consistent standards in functional enrichment analysis.

Affiliations

Deakin University, School of Life and Environmental Sciences, Geelong, Australia.

College of Health and Medical Technology, Middle Technical University, Baghdad, Iraq.

Publication information

PLoS Comput Biol. 2022 Mar 9;18(3):e1009935. doi: 10.1371/journal.pcbi.1009935. eCollection 2022 Mar.

DOI:10.1371/journal.pcbi.1009935

PMID:35263338

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8936487/

Abstract

Gene set enrichment tests (a.k.a. functional enrichment analysis) are among the most frequently used methods in computational biology. Despite this popularity, there are concerns that these methods are being applied incorrectly and the results of some peer-reviewed publications are unreliable. These problems include the use of inappropriate background gene lists, lack of false discovery rate correction and lack of methodological detail. To ascertain the frequency of these issues in the literature, we performed a screen of 186 open-access research articles describing functional enrichment results. We find that 95% of analyses using over-representation tests did not implement an appropriate background gene list or did not describe this in the methods. Failure to perform p-value correction for multiple tests was identified in 43% of analyses. Many studies lacked detail in the methods section about the tools and gene sets used. An extension of this survey showed that these problems are not associated with journal or article level bibliometrics. Using seven independent RNA-seq datasets, we show misuse of enrichment tools alters results substantially. In conclusion, most published functional enrichment studies suffered from one or more major flaws, highlighting the need for stronger standards for enrichment analysis.
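
As an illustration of the two checks the abstract highlights (an appropriate background gene list for over-representation tests, and p-value correction for multiple tests), the following is a minimal Python sketch, not the authors' code. It assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment; the function names, gene identifiers and gene sets are hypothetical toy values chosen only for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the set."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def over_representation(hits, background, gene_sets):
    """Map each gene set name to (raw p-value, BH-adjusted p-value)."""
    background = set(background)
    hits = set(hits) & background  # only genes that could have been detected
    names, pvals = [], []
    for name, members in gene_sets.items():
        members = set(members) & background  # restrict the set to the background
        overlap = len(hits & members)
        names.append(name)
        pvals.append(
            hypergeom_upper_tail(overlap, len(background), len(members), len(hits))
        )
    return dict(zip(names, zip(pvals, benjamini_hochberg(pvals))))


# Toy data (hypothetical gene IDs, not taken from the paper).
detected = [f"g{i}" for i in range(1, 21)]    # genes detected in the assay
genome = [f"g{i}" for i in range(1, 1001)]    # every annotated gene
hits = ["g1", "g2", "g3", "g4", "g5"]         # e.g. differentially expressed genes
gene_sets = {
    "setA": ["g1", "g2", "g3", "g4"],
    "setB": ["g10", "g11", "g12", "g13"],
}

with_detected = over_representation(hits, detected, gene_sets)
with_genome = over_representation(hits, genome, gene_sets)
for name in gene_sets:
    print(name, with_detected[name], with_genome[name])
```

In this toy case the whole-genome background gives a far smaller p-value for setA than the detected-gene background does, which illustrates how the choice of background alone can change an over-representation result.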

Similar articles

ALGAEFUN with MARACAS, microALGAE FUNctional enrichment tool for MicroAlgae RnA-seq and Chip-seq AnalysiS.

BMC Bioinformatics. 2022 Mar 31;23(1):113. doi: 10.1186/s12859-022-04639-5.

GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists.

BMC Bioinformatics. 2009 Feb 3;10:48. doi: 10.1186/1471-2105-10-48.

Seten: a tool for systematic identification and comparison of processes, phenotypes, and diseases associated with RNA-binding proteins from condition-specific CLIP-seq profiles.

RNA. 2017 Jun;23(6):836-846. doi: 10.1261/rna.059089.116. Epub 2017 Mar 23.

Bias in microRNA functional enrichment analysis.

Bioinformatics. 2015 May 15;31(10):1592-8. doi: 10.1093/bioinformatics/btv023. Epub 2015 Jan 20.

NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis.

BMC Syst Biol. 2017 Sep 21;11(Suppl 4):75. doi: 10.1186/s12918-017-0456-7.

Comparing gene annotation enrichment tools for functional modeling of agricultural microarray data.

BMC Bioinformatics. 2009 Oct 8;10 Suppl 11(Suppl 11):S9. doi: 10.1186/1471-2105-10-S11-S9.

OneStopRNAseq: A Web Application for Comprehensive and Efficient Analyses of RNA-Seq Data.

Genes (Basel). 2020 Oct 2;11(10):1165. doi: 10.3390/genes11101165.

Detecting discordance enrichment among a series of two-sample genome-wide expression data sets.

BMC Genomics. 2017 Jan 25;18(Suppl 1):1050. doi: 10.1186/s12864-016-3265-2.

Comparative study of gene set enrichment methods.

BMC Bioinformatics. 2009 Sep 2;10:275. doi: 10.1186/1471-2105-10-275.

Cited by

A Critical Evaluation of Background Gene Omission in Imaging Transcriptomics.

Biol Psychiatry Glob Open Sci. 2025 Jul 18;5(6):100568. doi: 10.1016/j.bpsgos.2025.100568. eCollection 2025 Nov.

Pathway Analysis Interpretation in the Multi-Omic Era.

BioTech (Basel). 2025 Jul 29;14(3):58. doi: 10.3390/biotech14030058.

SomaModules: a pathway enrichment approach tailored to SomaScan data.

bioRxiv. 2025 Aug 2:2025.07.30.667673. doi: 10.1101/2025.07.30.667673.

RNA-seq analysis of blood from cave- and surface-dwelling morphs reveal diverse transcriptomic responses to normoxic rearing.

Front Physiol. 2025 Jul 17;16:1617136. doi: 10.3389/fphys.2025.1617136. eCollection 2025.

ScGOclust: leveraging gene ontology to find functionally analogous cell types between distant species.

Bioinformatics. 2025 Jul 1;41(Supplement_1):i571-i579. doi: 10.1093/bioinformatics/btaf195.

Mitochondrial fatty acid synthesis and MECR regulate CD4+ T cell function and oxidative metabolism.

J Immunol. 2025 May 1;214(5):958-976. doi: 10.1093/jimmun/vkaf034.

Application of genomic tools to study and potentially improve the upper thermal tolerance of farmed Atlantic salmon (Salmo salar).

BMC Genomics. 2025 Mar 24;26(1):294. doi: 10.1186/s12864-025-11482-4.

Unraveling cell-cell communication with NicheNet by inferring active ligands from transcriptomics data.

Nat Protoc. 2025 Mar 4. doi: 10.1038/s41596-024-01121-9.

The mutational landscape and its longitudinal dynamics in relapsed and refractory classic Hodgkin lymphoma.

Ann Hematol. 2025 Mar;104(3):1721-1733. doi: 10.1007/s00277-025-06274-5. Epub 2025 Feb 24.

RNA Isoform Diversity in Human Neurodegenerative Diseases.

eNeuro. 2024 Dec 27;11(12). doi: 10.1523/ENEURO.0296-24.2024. Print 2024 Dec.
