A probabilistic generative model for GO enrichment analysis.

Author information

Lu Yong, Rosenfeld Roni, Simon Itamar, Nau Gerard J, Bar-Joseph Ziv

Affiliation

Computer Science Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213 USA.

Publication information

Nucleic Acids Res. 2008 Oct;36(17):e109. doi: 10.1093/nar/gkn434. Epub 2008 Aug 1.

DOI:10.1093/nar/gkn434

PMID:18676451

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2553574/

Abstract

The Gene Ontology (GO) is extensively used to analyze all types of high-throughput experiments. However, researchers still face several challenges when using GO and other functional annotation databases. One problem is the large number of hypotheses that are tested in each study. In addition, categories often overlap with both their direct parents and descendants and with other, more distant categories in the hierarchical structure. This makes it hard to determine whether the significant categories identified represent different functional outcomes or a redundant view of the same biological processes. To overcome these problems, we developed a generative probabilistic model that identifies a (small) subset of categories that, together, explain the selected gene set. Our model accommodates noise and errors in both the selected gene set and GO. Using controlled GO data, our method correctly recovered most of the selected categories, leading to dramatic improvements over current methods for GO analysis. When used with microarray expression data and ChIP-chip data from yeast and human, our method correctly identified both general and specific enriched categories that were overlooked by other methods.
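
The abstract gives no equations, so the following is only a minimal sketch of the general idea, not the authors' model or code. It assumes a toy noise model in which genes covered by an "active" category are selected with probability 1 - p, uncovered genes are selected with probability q, and each active category costs a penalty alpha. The parameter names `p`, `q`, `alpha`, the greedy toggle search, and the toy categories and gene names are all hypothetical choices made for illustration.

```python
import math

def log_likelihood(active, annotations, selected, universe,
                   p=0.1, q=0.01, alpha=3.0):
    """Score a candidate set of active categories under a toy noise model.

    p     -- assumed chance that a gene covered by an active category is NOT selected
    q     -- assumed chance that a gene covered by no active category IS selected
    alpha -- assumed per-category penalty that favors small explanations
    """
    covered = set()
    for cat in active:
        covered |= annotations[cat]
    score = -alpha * len(active)
    for gene in universe:
        if gene in covered:
            score += math.log(1 - p) if gene in selected else math.log(p)
        else:
            score += math.log(q) if gene in selected else math.log(1 - q)
    return score

def greedy_search(annotations, selected, universe, **params):
    """Toggle one category at a time for as long as the score improves."""
    active = set()
    best = log_likelihood(active, annotations, selected, universe, **params)
    improved = True
    while improved:
        improved = False
        for cat in sorted(annotations):
            candidate = active ^ {cat}  # add cat if absent, remove it if present
            score = log_likelihood(candidate, annotations, selected,
                                   universe, **params)
            if score > best + 1e-12:
                active, best, improved = candidate, score, True
    return active, best

# Hypothetical toy data: 20 genes, one parent category overlapping its child.
universe = {f"g{i}" for i in range(1, 21)}
annotations = {
    "ribosome": {f"g{i}" for i in range(1, 6)},      # child category
    "translation": {f"g{i}" for i in range(1, 11)},  # parent, superset of child
    "stress": {f"g{i}" for i in range(11, 15)},
}
# Selected set: all ribosome genes, three of four stress genes, one stray gene.
selected = {"g1", "g2", "g3", "g4", "g5", "g11", "g12", "g13", "g20"}

active, score = greedy_search(annotations, selected, universe)
print(sorted(active), round(score, 3))
```

On this toy input the search keeps the specific child category and rejects its overlapping parent, because activating the parent would have to explain five unselected genes as misses. That is the kind of redundancy a joint model can resolve and a per-category test cannot.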
