
Pathway size matters: the influence of pathway granularity on over-representation (enrichment analysis) statistics.

Affiliations

Bioinformatics Research Group, SRI International, 333 Ravenswood Drive, Menlo Park, 94025, CA, USA.

Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, 140 Gortner Lab, 1479 Gortner Ave, Saint Paul, 55198, MN, USA.

Publication information

BMC Genomics. 2021 Mar 16;22(1):191. doi: 10.1186/s12864-021-07502-8.

DOI: 10.1186/s12864-021-07502-8
PMID: 33726670
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7967953/
Abstract

BACKGROUND

Enrichment or over-representation analysis is a common method used in bioinformatics studies of transcriptomics, metabolomics, and microbiome datasets. The key idea behind enrichment analysis is: given a set of significantly expressed genes (or metabolites), use that set to infer a smaller set of perturbed biological pathways or processes, in which those genes (or metabolites) play a role. Enrichment computations rely on collections of defined biological pathways and/or processes, which are usually drawn from pathway databases. Although practitioners of enrichment analysis take great care to employ statistical corrections (e.g., for multiple testing), they appear unaware that enrichment results are quite sensitive to the pathway definitions that the calculation uses.
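Enrichment tools commonly score each pathway with a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test). The sketch below is a minimal, standard-library version under stated assumptions: `N` is the gene universe, `K` the pathway size, `n` the number of significantly expressed genes and `k` how many of them fall in the pathway. The function name `ora_pvalue` and the example counts are hypothetical and are not taken from the paper.

```python
from math import comb


def ora_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided over-representation p-value, P(X >= k), for X ~ Hypergeometric(N, K, n).

    N -- size of the gene universe (all genes that could have been called significant)
    K -- number of universe genes annotated to the pathway
    n -- number of significantly expressed genes
    k -- number of significant genes that fall in the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    # Sum the hypergeometric probabilities of observing k or more pathway genes,
    # using exact integers so nothing underflows before the final division.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical numbers: 4,000 genes measured, 200 significant,
    # 6 of them in a 15-gene pathway.
    print(f"{ora_pvalue(N=4000, K=15, n=200, k=6):.2e}")
```

A production pipeline would more likely call a library routine such as SciPy's hypergeometric distribution; the exact-integer version is shown only because it makes the counting explicit.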

RESULTS

We show that alternative pathway definitions can alter enrichment p-values by up to nine orders of magnitude, whereas statistical corrections typically alter enrichment p-values by only two orders of magnitude. We present multiple examples where the smaller pathway definitions used in the EcoCyc database produce stronger enrichment p-values than the much larger pathway definitions used in the KEGG database; we demonstrate that to attain a given enrichment p-value, KEGG-based enrichment analyses require 1.3-2.0 times as many significantly expressed genes as do EcoCyc-based enrichment analyses. The large pathways in KEGG are problematic for another reason: they blur together multiple (as many as 21) biological processes. When such a KEGG pathway receives a strong (that is, small) enrichment p-value, it is unclear which of its component processes is perturbed, and thus the biological conclusions drawn from enrichment of large pathways are also in question.
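The dependence on pathway size can be seen directly in the hypergeometric tail. The sketch below uses invented numbers, not the paper's EcoCyc or KEGG data: it counts the same 8 perturbed genes once in a hypothetical 12-gene pathway and once in a hypothetical 90-gene pathway that merges that process with unrelated ones, then finds the smallest number of in-pathway hits needed to reach p <= 1e-6 for each size. `min_hits_for` is an illustrative helper, not the procedure the authors used to derive the 1.3-2.0 ratio.

```python
import math
from math import comb


def ora_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): one-sided over-representation p-value."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def min_hits_for(alpha, N, K, n):
    """Hypothetical helper: smallest k with ora_pvalue(N, K, n, k) <= alpha, or None."""
    for k in range(min(K, n) + 1):
        if ora_pvalue(N, K, n, k) <= alpha:
            return k
    return None


# Hypothetical experiment: 4,000 genes measured, 200 called significant (5%).
N, n = 4000, 200

# One perturbed process with 8 significant genes, under two pathway definitions:
#   fine-grained: a 12-gene pathway that contains all 8 hits
#   coarse: a 90-gene pathway merging this process with unrelated ones, which
#           picks up 4 more hits at roughly the 5% background rate (78 genes x 5%)
p_small = ora_pvalue(N, K=12, n=n, k=8)
p_large = ora_pvalue(N, K=90, n=n, k=12)
print(f"12-gene pathway: p = {p_small:.1e}")
print(f"90-gene pathway: p = {p_large:.1e}")
print(f"shift: {math.log10(p_large / p_small):.1f} orders of magnitude")

# The number of in-pathway hits needed to reach a fixed threshold grows with size.
for K in (12, 90):
    print(f"K = {K}: {min_hits_for(1e-6, N, K, n)} hits needed for p <= 1e-6")
```

Only the pathway definition differs between the two calls; the hits from the perturbed process are identical. The size of the shift depends entirely on the invented counts.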

CONCLUSIONS

The choice of pathway database used in enrichment analyses can have a much stronger effect on the enrichment results than the statistical corrections used in these analyses.
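For comparison, a Bonferroni correction multiplies each p-value by the number of pathways tested, m, so it shifts a p-value by at most log10(m) orders of magnitude (two orders of magnitude for 100 pathways). The Benjamini-Hochberg adjustment never exceeds the Bonferroni value, so its shift is no larger. A minimal sketch of both, assuming a hypothetical set of 150 pathway p-values; the helper names are illustrative.

```python
import math


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    # Walk from the largest p-value down, keeping adjusted values monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def orders_of_magnitude(p_before, p_after):
    """How many powers of ten a correction moved a p-value."""
    return math.log10(p_after / p_before)


if __name__ == "__main__":
    # Hypothetical: 150 pathways tested; the strongest raw p-value is 1e-8.
    raw = [1e-8] + [0.5] * 149
    print(orders_of_magnitude(raw[0], bonferroni(raw)[0]))          # log10(150), about 2.18
    print(orders_of_magnitude(raw[0], benjamini_hochberg(raw)[0]))  # same here (rank 1)
```

Set against the shifts produced by changing pathway size alone, this bounded adjustment illustrates why the database choice can dominate.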

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a19/7967953/0e52bcabe4ca/12864_2021_7502_Fig1_HTML.jpg

Similar articles

1
Pathway size matters: the influence of pathway granularity on over-representation (enrichment analysis) statistics.
BMC Genomics. 2021 Mar 16;22(1):191. doi: 10.1186/s12864-021-07502-8.
2
Evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data.
BMC Bioinformatics. 2018 Jan 2;19(1):1. doi: 10.1186/s12859-017-2006-0.
3
Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.
4
Metabolomic network analysis of estrogen-stimulated MCF-7 cells: a comparison of overrepresentation analysis, quantitative enrichment analysis and pathway analysis versus metabolite network analysis.
Arch Toxicol. 2017 Jan;91(1):217-230. doi: 10.1007/s00204-016-1695-x. Epub 2016 Apr 2.
5
The outcomes of pathway database computations depend on pathway ontology.
Nucleic Acids Res. 2006 Aug 7;34(13):3687-97. doi: 10.1093/nar/gkl438. Print 2006.
6
IPAD: the Integrated Pathway Analysis Database for Systematic Enrichment Analysis.
BMC Bioinformatics. 2012;13 Suppl 15(Suppl 15):S7. doi: 10.1186/1471-2105-13-S15-S7. Epub 2012 Sep 11.
7
Pathway analysis in metabolomics: Recommendations for the use of over-representation analysis.
PLoS Comput Biol. 2021 Sep 7;17(9):e1009105. doi: 10.1371/journal.pcbi.1009105. eCollection 2021 Sep.
8
RaMP: A Comprehensive Relational Database of Metabolomics Pathways for Pathway Enrichment Analysis of Genes and Metabolites.
Metabolites. 2018 Feb 22;8(1):16. doi: 10.3390/metabo8010016.
9
Redundancy control in pathway databases (ReCiPa): an application for improving gene-set enrichment analysis in Omics studies and "Big data" biology.
OMICS. 2013 Aug;17(8):414-22. doi: 10.1089/omi.2012.0083. Epub 2013 Jun 11.
10
FELLA: an R package to enrich metabolomics data.
BMC Bioinformatics. 2018 Dec 22;19(1):538. doi: 10.1186/s12859-018-2487-5.

Cited by

1
A blood-based DNA damage signature in patients with Parkinson's disease is associated with disease progression.
Nat Aging. 2025 Sep 5. doi: 10.1038/s43587-025-00926-x.
2
Pathway Analysis Interpretation in the Multi-Omic Era.
BioTech (Basel). 2025 Jul 29;14(3):58. doi: 10.3390/biotech14030058.
3
Spectral divergence prioritizes key classes, genes, and pathways shared between substance use disorders and cardiovascular disease.
Front Neurosci. 2025 Jul 22;19:1572243. doi: 10.3389/fnins.2025.1572243. eCollection 2025.
4
Cluefish: mining the dark matter of transcriptional data series with over-representation analysis enhanced by aggregated biological prior knowledge.
NAR Genom Bioinform. 2025 Jul 30;7(3):lqaf103. doi: 10.1093/nargab/lqaf103. eCollection 2025 Sep.
5
Comparison of popular enrichment methods for untargeted in vitro metabolomics.
Metabolomics. 2025 Jul 27;21(4):103. doi: 10.1007/s11306-025-02309-0.
6
Low TAS1R2 Sweet Taste Receptor Expression in Skeletal Muscle of Genetically Diverse BXD Mice Mirrors Transcriptomic Signatures of Loss-of-Function Mice.
Nutrients. 2025 Jun 3;17(11):1918. doi: 10.3390/nu17111918.
7
A workflow for human health hazard evaluation using transcriptomic data and Key Characteristics-based gene sets.
Toxicol Sci. 2025 Jun 1;205(2):310-325. doi: 10.1093/toxsci/kfaf036.
8
Quantifying liver-toxic responses from dose-dependent chemical exposures using a rat genome-scale metabolic model.
Toxicol Sci. 2025 Apr 1;204(2):154-168. doi: 10.1093/toxsci/kfaf005.
9
High-throughput gene expression analysis with TempO-LINC sensitively resolves complex brain, lung and kidney heterogeneity at single-cell resolution.
Sci Rep. 2024 Dec 28;14(1):31285. doi: 10.1038/s41598-024-82736-6.
10
Two subtle problems with overrepresentation analysis.
Bioinform Adv. 2024 Oct 21;4(1):vbae159. doi: 10.1093/bioadv/vbae159. eCollection 2024.