A Bayesian extension of the hypergeometric test for functional enrichment analysis.

Author information

Cao Jing, Zhang Song

Affiliations

Department of Statistical Science, Southern Methodist University, Dallas, Texas 75275, U.S.A.

Publication information

Biometrics. 2014 Mar;70(1):84-94. doi: 10.1111/biom.12122. Epub 2013 Dec 9.

Abstract

Functional enrichment analysis is conducted on high-throughput data to provide functional interpretation for a list of genes or proteins that share a common property, such as being differentially expressed (DE). The hypergeometric P-value has been widely used to investigate whether genes from pre-defined functional terms, for example, Gene Ontology (GO), are enriched in the DE genes. The hypergeometric P-value has three limitations: (1) computed independently for each term, thus neglecting biological dependence; (2) subject to a size constraint that leads to the tendency of selecting less-specific terms; (3) repeated use of information due to overlapping annotations by the true-path rule. We propose a Bayesian approach based on the non-central hypergeometric model. The GO dependence structure is incorporated through a prior on non-centrality parameters. The likelihood function does not include overlapping information. The inference about enrichment is based on posterior probabilities that do not have a size constraint. This method can detect moderate but consistent enrichment signals and identify sets of closely related and biologically meaningful functional terms rather than isolated terms. We also describe the basic ideas of assumption and implementation of different methods to provide some theoretical insights, which are demonstrated via a simulation study. A real application is presented.
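As a rough illustration of the quantity the abstract starts from, the following sketch computes the upper-tail hypergeometric P-value for a single term, and shows how a non-centrality (odds) parameter changes that tail. It is not the authors' Bayesian method: there is no prior over GO terms and no posterior inference here. The function name `enrichment_tail`, its argument names, and the example counts are hypothetical, and the non-central case assumes Fisher's form of the non-central hypergeometric distribution, which the abstract does not specify.

```python
from fractions import Fraction
from math import comb


def enrichment_tail(k, K, n, N, omega=1):
    """P(X >= k) for the number of DE genes X annotated to one term.

    N: genes in the background, K: genes annotated to the term,
    n: DE genes, k: DE genes annotated to the term.
    omega: odds parameter (assumed Fisher's non-central form);
    omega = 1 gives the classical hypergeometric P-value.
    """
    w = Fraction(omega)
    lo = max(0, n - (N - K))   # smallest feasible overlap
    hi = min(K, n)             # largest feasible overlap
    # Unnormalised probability weight of each feasible overlap x,
    # kept as exact fractions so large counts do not overflow.
    weights = {
        x: comb(K, x) * comb(N - K, n - x) * w ** x
        for x in range(lo, hi + 1)
    }
    total = sum(weights.values())
    tail = sum(v for x, v in weights.items() if x >= k)
    return float(tail / total)


if __name__ == "__main__":
    # Hypothetical counts: 20000 genes, 500 DE, a term with 100 genes,
    # 8 of which are DE.
    print("classical P-value:", enrichment_tail(8, 100, 500, 20000))
    # Same tail if the term's genes had twice the odds of being DE.
    print("tail with omega=2:", enrichment_tail(8, 100, 500, 20000, omega=2))
```

Because each term is evaluated on its own counts, this calculation has exactly the first limitation listed above: nothing in it links a term to its parents or children in the GO hierarchy.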
