Using predictive specificity to determine when gene set analysis is biologically meaningful.

Affiliations

Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Woodbury, NY 11797, USA.

Department of Psychiatry and Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.

Publication information

Nucleic Acids Res. 2017 Feb 28;45(4):e20. doi: 10.1093/nar/gkw957.

DOI: 10.1093/nar/gkw957

PMID: 28204549

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5389513/

Abstract

Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper, we evaluate the robustness and uniqueness of enrichment results as a means of assessing methods even where correctness is unknown. We show that heavily annotated (‘multifunctional’) genes are likely to appear in genomics study results and drive the generation of biologically non-specific enrichment results as well as highly fragile significances. By providing a means of determining where enrichment analyses report non-specific and non-robust findings, we are able to assess where we can be confident in their use. We find significant progress in recent bias correction methods for enrichment and provide our own software implementation. Our approach can be readily adapted to any pre-existing package.
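The abstract states the idea but not the procedure. As a minimal sketch only, the two checks it describes (is a result specific, and is it robust) might look like this in Python. Several pieces are assumptions made for illustration rather than anything the paper confirms: a one-sided hypergeometric over-representation test, a multifunctionality score equal to each gene's annotation count, a same-size "baseline" hit list built purely from the most heavily annotated genes, and leave-one-gene-out removal as the robustness probe. Names such as `multifunctionality_baseline`, `is_nonspecific` and `fragility` are hypothetical. They are not the authors' software or API, and the gene sets are invented toy data.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """One-sided hypergeometric tail P(X >= k)."""
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(population, draws)


def enrichment_pvalue(hits, gene_set, universe):
    """Over-representation p-value of gene_set among hits (both restricted to universe)."""
    hits = set(hits) & universe
    gene_set = set(gene_set) & universe
    overlap = len(hits & gene_set)
    return hypergeom_sf(overlap, len(universe), len(gene_set), len(hits))


def multifunctionality(annotations):
    """Hypothetical score: number of gene sets each gene is annotated to."""
    counts = {}
    for genes in annotations.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return counts


def multifunctionality_baseline(annotations, universe, size):
    """Gene list built from annotation counts alone (no experimental data)."""
    counts = multifunctionality(annotations)
    # Most heavily annotated genes first; ties broken by gene name for determinism.
    ranked = sorted(universe, key=lambda g: (-counts.get(g, 0), g))
    return ranked[:size]


def is_nonspecific(term, annotations, hits, universe, alpha=0.05):
    """True if the term is also enriched in a same-size multifunctionality-only list."""
    n_hits = len(set(hits) & universe)
    baseline = multifunctionality_baseline(annotations, universe, n_hits)
    return enrichment_pvalue(baseline, annotations[term], universe) <= alpha


def fragility(hits, gene_set, universe, alpha=0.05):
    """Fraction of leave-one-gene-out hit lists in which the term loses significance."""
    hits = sorted(set(hits) & universe)
    if not hits:
        return 0.0
    flips = sum(
        enrichment_pvalue([h for h in hits if h != g], gene_set, universe) > alpha
        for g in hits
    )
    return flips / len(hits)


# Toy data (invented for illustration): 30 genes, two overlapping "hub" terms.
UNIVERSE = {f"G{i:02d}" for i in range(1, 31)}
ANNOTATIONS = {
    "hub_A": {"G01", "G02", "G03", "G04"},
    "hub_B": {"G01", "G02", "G03", "G05"},
    "specific": {"G20", "G21", "G22"},
}
HITS = ["G01", "G02", "G03", "G20", "G21", "G22"]

if __name__ == "__main__":
    for term, genes in ANNOTATIONS.items():
        p = enrichment_pvalue(HITS, genes, UNIVERSE)
        print(term, round(p, 4),
              is_nonspecific(term, ANNOTATIONS, HITS, UNIVERSE),
              fragility(HITS, genes, UNIVERSE))
```

On this toy data all three terms pass p < 0.05 for `HITS`. However, `hub_A` and `hub_B` are also enriched in the list built from annotation counts alone, so the sketch flags them as non-specific, while `specific` is not flagged. Every term also loses significance when certain single genes are dropped (fragility 0.5), which shows how easily a small hit list produces fragile significance.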

Similar articles

Using predictive specificity to determine when gene set analysis is biologically meaningful.

Nucleic Acids Res. 2017 Feb 28;45(4):e20. doi: 10.1093/nar/gkw957.

Visual annotation display (VLAD): a tool for finding functional themes in lists of genes.

Mamm Genome. 2015 Oct;26(9-10):567-73. doi: 10.1007/s00335-015-9570-2. Epub 2015 Jun 6.

snpGeneSets: An R Package for Genome-Wide Study Annotation.

G3 (Bethesda). 2016 Dec 7;6(12):4087-4095. doi: 10.1534/g3.116.034694.

BUSCO: Assessing Genome Assembly and Annotation Completeness.

Methods Mol Biol. 2019;1962:227-245. doi: 10.1007/978-1-4939-9173-0_14.

Gene set enrichment analysis.

Methods Mol Biol. 2009;563:99-121. doi: 10.1007/978-1-60761-175-2_6.

Large-scale gene co-expression network as a source of functional annotation for cattle genes.

BMC Genomics. 2016 Nov 2;17(1):846. doi: 10.1186/s12864-016-3176-2.

Equivalent change enrichment analysis: assessing equivalent and inverse change in biological pathways between diverse experiments.

BMC Genomics. 2020 Feb 24;21(1):180. doi: 10.1186/s12864-020-6589-x.

Statistical approach for selection of biologically informative genes.

Gene. 2018 May 20;655:71-83. doi: 10.1016/j.gene.2018.02.044. Epub 2018 Feb 16.

SeqTools: visual tools for manual analysis of sequence alignments.

BMC Res Notes. 2016 Jan 22;9:39. doi: 10.1186/s13104-016-1847-3.

Extensive complementarity between gene function prediction methods.

Bioinformatics. 2016 Dec 1;32(23):3645-3653. doi: 10.1093/bioinformatics/btw532. Epub 2016 Aug 13.

Cited by

Pool PaRTI: a PageRank-based pooling method for identifying critical residues and enhancing protein sequence representations.

Bioinformatics. 2025 Jun 2;41(6). doi: 10.1093/bioinformatics/btaf330.

Identifying reproducible transcription regulator coexpression patterns with single cell transcriptomics.

PLoS Comput Biol. 2025 Apr 21;21(4):e1012962. doi: 10.1371/journal.pcbi.1012962. eCollection 2025 Apr.

Pool PaRTI: A PageRank-Based Pooling Method for Identifying Critical Residues and Enhancing Protein Sequence Representations.

bioRxiv. 2025 Mar 17:2024.10.04.616701. doi: 10.1101/2024.10.04.616701.

A compendium of human gene functions derived from evolutionary modelling.

Nature. 2025 Apr;640(8057):146-154. doi: 10.1038/s41586-025-08592-0. Epub 2025 Feb 26.

To Tweak or Not to Tweak. How Exploiting Flexibilities in Gene Set Analysis Leads to Overoptimism.

Biom J. 2025 Feb;67(1):e70016. doi: 10.1002/bimj.70016.

Identifying Reproducible Transcription Regulator Coexpression Patterns with Single Cell Transcriptomics.

bioRxiv. 2025 Feb 3:2024.02.15.580581. doi: 10.1101/2024.02.15.580581.

Gene Set Summarization Using Large Language Models.

ArXiv. 2024 Jul 4:arXiv:2305.13338v3.

Culture-Associated DNA Methylation Changes Impact on Cellular Function of Human Intestinal Organoids.

Cell Mol Gastroenterol Hepatol. 2022;14(6):1295-1310. doi: 10.1016/j.jcmgh.2022.08.008. Epub 2022 Aug 28.

On the influence of several factors on pathway enrichment analysis.

Brief Bioinform. 2022 May 13;23(3). doi: 10.1093/bib/bbac143.

The blood transcriptome prior to ovarian cancer diagnosis: A case-control study in the NOWAC postgenome cohort.

PLoS One. 2021 Aug 27;16(8):e0256442. doi: 10.1371/journal.pone.0256442. eCollection 2021.

References

Positive and negative forms of replicability in gene network analysis.

Bioinformatics. 2016 Apr 1;32(7):1065-73. doi: 10.1093/bioinformatics/btv734. Epub 2015 Dec 14.

Annotation enrichment analysis: an alternative method for evaluating the functional properties of gene sets.

Sci Rep. 2014 Feb 26;4:4191. doi: 10.1038/srep04191.

Large-scale gene function analysis with the PANTHER classification system.

Nat Protoc. 2013 Aug;8(8):1551-66. doi: 10.1038/nprot.2013.092. Epub 2013 Jul 18.

Neurocarta: aggregating and sharing disease-gene relations for the neurosciences.

BMC Genomics. 2013 Feb 26;14:129. doi: 10.1186/1471-2164-14-129.

The UCSC Genome Browser database: extensions and updates 2013.

Nucleic Acids Res. 2013 Jan;41(Database issue):D64-9. doi: 10.1093/nar/gks1048. Epub 2012 Nov 15.

GO-Elite: a flexible solution for pathway and ontology over-representation.

Bioinformatics. 2012 Aug 15;28(16):2209-10. doi: 10.1093/bioinformatics/bts366. Epub 2012 Jun 27.

Down-weighting overlapping genes improves gene set analysis.

BMC Bioinformatics. 2012 Jun 19;13:136. doi: 10.1186/1471-2105-13-136.

"Guilt by association" is the exception rather than the rule in gene networks.

PLoS Comput Biol. 2012;8(3):e1002444. doi: 10.1371/journal.pcbi.1002444. Epub 2012 Mar 29.

An environmental analysis of genes associated with schizophrenia: hypoxia and vascular factors as interacting elements in the neurodevelopmental model.

Mol Psychiatry. 2012 Dec;17(12):1194-205. doi: 10.1038/mp.2011.183. Epub 2012 Jan 31.

The UniProt-GO Annotation database in 2011.

Nucleic Acids Res. 2012 Jan;40(Database issue):D565-70. doi: 10.1093/nar/gkr1048. Epub 2011 Nov 28.
