Enrichment analysis in high-throughput genomics - accounting for dependency in the NULL.

Author information

Gold David L, Coombes Kevin R, Wang Jing, Mallick Bani

Affiliation

Department of Statistics, Texas A&M University, 3134 TAMU, College Station, TX 77843-3143, USA.

Publication information

Brief Bioinform. 2007 Mar;8(2):71-7. doi: 10.1093/bib/bbl019. Epub 2006 Oct 31.

DOI:10.1093/bib/bbl019

PMID:17077137

Abstract

Translating the overwhelming amount of data generated in high-throughput genomics experiments into biologically meaningful evidence, which may, for example, point to a series of biomarkers or hint at a relevant pathway, is of great current interest in bioinformatics. It is hypothesized that genes showing similar experimental profiles share biological mechanisms which, if understood, could provide clues to the molecular processes leading to pathological events. Whether, and how, a priori information about known genes can explain coexpression remains a topic for further study. One popular method of knowledge discovery in high-throughput genomics experiments, enrichment analysis (EA), seeks to infer whether an interesting collection of genes is 'enriched' for a particular set of a priori Gene Ontology Consortium (GO) classes. For the purposes of statistical testing, the conventional methods offered in EA software implicitly assume independence between GO classes. Genes may be annotated to more than one biological classification, so the enrichment test statistics for different GO classes can be highly dependent when the overlapping gene sets are relatively large. There is therefore a need to determine formally whether conventional EA results are robust to the independence assumption. Using well-known statistical theory, we relax the independence assumption and derive the exact null distribution for testing enrichment of GO classes. In applications to publicly available data sets, our test results are similar to those of the conventional approach, which assumes independence. We argue that the independence assumption is not detrimental.
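
The dependence described above can be illustrated with a minimal Python sketch. This is not the authors' derivation. It assumes that the conventional per-class test is the one-sided hypergeometric (Fisher's exact) test, and that under the null the interesting gene list is a simple random sample of n out of N genes. The function names and toy gene sets are hypothetical. `hypergeom_sf` computes the per-class p-value one GO class at a time, ignoring all other classes. `null_count_correlation` gives the exact null correlation between the overlap counts of two classes, (kN - ab) / sqrt(a(N - a) b(N - b)), where a = |A|, b = |B| and k = |A ∩ B|. `simulated_correlation` checks that formula by Monte Carlo.

```python
import math
import random


def hypergeom_sf(x: int, N: int, a: int, n: int) -> float:
    """P(X >= x): n of N genes drawn at random, a of the N belong to the class.

    One-sided hypergeometric enrichment p-value for a single GO class,
    computed without reference to any other class (the independence view).
    """
    hi = min(a, n)
    # math.comb(m, j) returns 0 when j > m, so impossible terms vanish.
    numerator = sum(math.comb(a, i) * math.comb(N - a, n - i)
                    for i in range(max(x, 0), hi + 1))
    return numerator / math.comb(N, n)


def null_count_correlation(N: int, a: int, b: int, k: int) -> float:
    """Exact null correlation between the list-overlap counts X_A and X_B.

    Assumes the list is a simple random sample of the N genes; a = |A|,
    b = |B|, k = |A & B|. The result does not depend on the list size n.
    """
    return (k * N - a * b) / math.sqrt(a * (N - a) * b * (N - b))


def simulated_correlation(N: int, A: set, B: set, n: int,
                          reps: int, seed: int = 0) -> float:
    """Monte Carlo Pearson correlation of X_A and X_B under random gene lists."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(reps):
        chosen = rng.sample(range(N), n)  # one null "interesting" list
        xs.append(sum(g in A for g in chosen))
        ys.append(sum(g in B for g in chosen))
    mx, my = sum(xs) / reps, sum(ys) / reps
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    N, n = 1000, 50
    A = set(range(0, 200))    # hypothetical class A: 200 genes
    B = set(range(100, 300))  # hypothetical class B: 200 genes, 100 shared with A
    print(hypergeom_sf(15, N, len(A), n))  # per-class p-value for 15 hits in A
    print(null_count_correlation(N, len(A), len(B), len(A & B)))  # 0.375
    print(simulated_correlation(N, A, B, n, reps=20000))
```

In this toy example, two 200-gene classes that share 100 genes have a null correlation of 0.375 whatever the list size, and fully disjoint classes are negatively correlated. This is the kind of dependence between per-class tests that the independence assumption sets aside.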

Similar articles

SEGS: search for enriched gene sets in microarray data.

J Biomed Inform. 2008 Aug;41(4):588-601. doi: 10.1016/j.jbi.2007.12.001. Epub 2007 Dec 15.

springScape: visualisation of microarray and contextual bioinformatic data using spring embedding and an 'information landscape'.

Bioinformatics. 2006 Jul 15;22(14):e99-107. doi: 10.1093/bioinformatics/btl205.

Significance analysis of groups of genes in expression profiling studies.

Bioinformatics. 2007 Aug 15;23(16):2104-12. doi: 10.1093/bioinformatics/btm310. Epub 2007 Jun 6.

MMG: a probabilistic tool to identify submodules of metabolic pathways.

Bioinformatics. 2008 Apr 15;24(8):1078-84. doi: 10.1093/bioinformatics/btn066. Epub 2008 Feb 21.

Analysis of sample set enrichment scores: assaying the enrichment of sets of genes for individual samples in genome-wide expression profiles.

Bioinformatics. 2006 Jul 15;22(14):e108-16. doi: 10.1093/bioinformatics/btl231.

Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics.

Prog Brain Res. 2006;158:83-108. doi: 10.1016/S0079-6123(06)58004-5.

Enrichment or depletion of a GO category within a class of genes: which test?

Bioinformatics. 2007 Feb 15;23(4):401-7. doi: 10.1093/bioinformatics/btl633. Epub 2006 Dec 20.

Extending pathways based on gene lists using InterPro domain signatures.

BMC Bioinformatics. 2008 Jan 4;9:3. doi: 10.1186/1471-2105-9-3.

How to decide which are the most pertinent overly-represented features during gene set enrichment analysis.

BMC Bioinformatics. 2007 Sep 11;8:332. doi: 10.1186/1471-2105-8-332.

Cited by

A new molecular classification to drive precision treatment strategies in primary Sjögren's syndrome.

Nat Commun. 2021 Jun 10;12(1):3523. doi: 10.1038/s41467-021-23472-7.

Listeriomics: an Interactive Web Platform for Systems Biology of .

mSystems. 2017 Mar 14;2(2). doi: 10.1128/mSystems.00186-16. eCollection 2017 Mar-Apr.

A flexible bayesian model for testing for transmission ratio distortion.

Genetics. 2014 Dec;198(4):1357-67. doi: 10.1534/genetics.114.169607. Epub 2014 Sep 29.

Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets.

BMC Genomics. 2014;15 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2164-15-S1-S6. Epub 2014 Jan 24.

ROS production and NF-κB activation triggered by RAC1 facilitate WNT-driven intestinal stem cell proliferation and colorectal cancer initiation.

Cell Stem Cell. 2013 Jun 6;12(6):761-73. doi: 10.1016/j.stem.2013.04.006. Epub 2013 May 9.

A novel dynamic impact approach (DIA) for functional analysis of time-course omics studies: validation using the bovine mammary transcriptome.

PLoS One. 2012;7(3):e32455. doi: 10.1371/journal.pone.0032455. Epub 2012 Mar 16.

Comparison of lists of genes based on functional profiles.

BMC Bioinformatics. 2011 Oct 16;12:401. doi: 10.1186/1471-2105-12-401.

Robust and accurate data enrichment statistics via distribution function of sum of weights.

Bioinformatics. 2010 Nov 1;26(21):2752-9. doi: 10.1093/bioinformatics/btq511. Epub 2010 Sep 8.

Comparing gene annotation enrichment tools for functional modeling of agricultural microarray data.

BMC Bioinformatics. 2009 Oct 8;10 Suppl 11(Suppl 11):S9. doi: 10.1186/1471-2105-10-S11-S9.

Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists.

Nucleic Acids Res. 2009 Jan;37(1):1-13. doi: 10.1093/nar/gkn923. Epub 2008 Nov 25.
