Enrichment analysis in high-throughput genomics - accounting for dependency in the NULL.

Authors

Gold David L, Coombes Kevin R, Wang Jing, Mallick Bani

Affiliation

Department of Statistics, Texas A&M University, 3134 TAMU, College Station, TX 77843-3143, USA.

Publication information

Brief Bioinform. 2007 Mar;8(2):71-7. doi: 10.1093/bib/bbl019. Epub 2006 Oct 31.

Abstract

Translating the overwhelming amount of data generated in high-throughput genomics experiments into biologically meaningful evidence, which may for example point to a series of biomarkers or hint at a relevant pathway, is a matter of great interest in bioinformatics these days. Genes showing similar experimental profiles, it is hypothesized, share biological mechanisms that if understood could provide clues to the molecular processes leading to pathological events. It is the topic of further study to learn if or how a priori information about the known genes may serve to explain coexpression. One popular method of knowledge discovery in high-throughput genomics experiments, enrichment analysis (EA), seeks to infer if an interesting collection of genes is 'enriched' for a particular set of a priori Gene Ontology Consortium (GO) classes. For the purposes of statistical testing, the conventional methods offered in EA software implicitly assume independence between the GO classes. Genes may be annotated for more than one biological classification, and therefore the resulting test statistics of enrichment between GO classes can be highly dependent if the overlapping gene sets are relatively large. There is a need to formally determine if conventional EA results are robust to the independence assumption. We derive the exact null distribution for testing enrichment of GO classes by relaxing the independence assumption using well-known statistical theory. In applications with publicly available data sets, our test results are similar to the conventional approach, which assumes independence. We argue that the independence assumption is not detrimental.
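
For readers unfamiliar with EA, the following is a minimal sketch of a per-class enrichment test. It assumes a one-sided hypergeometric test, which is a common choice in EA tools; the abstract does not say which conventional test the authors compared against. It is not the exact null distribution derived in the paper. The function names, gene identifiers, and the two toy classes are hypothetical. The toy classes share three genes, which is the kind of overlap that makes the per-class test statistics dependent.

```python
from math import comb


def hypergeom_enrichment_pvalue(n_universe, n_class, n_selected, n_overlap):
    """Upper-tail probability P(X >= n_overlap) for a hypergeometric X.

    X counts how many of n_selected genes, drawn without replacement from
    n_universe genes, fall in a class containing n_class genes.
    """
    upper = min(n_class, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_class, k) * comb(n_universe - n_class, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def class_pvalue(universe, selected, go_class):
    """Test one class on its own, ignoring every other class (assumed setup)."""
    members = go_class & universe
    chosen = selected & universe
    return hypergeom_enrichment_pvalue(
        len(universe), len(members), len(chosen), len(members & chosen)
    )


# Hypothetical toy data: 20 genes, 5 of them flagged as "interesting".
universe = {f"g{i}" for i in range(20)}
selected = {"g0", "g1", "g2", "g3", "g4"}

# Two hypothetical classes that share g1, g2 and g3.
class_a = {"g0", "g1", "g2", "g3", "g10"}
class_b = {"g1", "g2", "g3", "g11", "g12"}

for name, go_class in (("class_a", class_a), ("class_b", class_b)):
    print(name, class_pvalue(universe, selected, go_class))
```

Each class is tested separately here, so the two p-values are computed as if the classes were unrelated even though most of their members coincide.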
