Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model.

Affiliation

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

BMC Bioinformatics. 2010 May 20;11:272. doi: 10.1186/1471-2105-11-272.

DOI:10.1186/1471-2105-11-272

PMID:20487560

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2885378/

Abstract

BACKGROUND

Large-scale genomic studies often produce large gene lists, for example, sets of genes that share an expression pattern. These lists are generally interpreted by extracting the concepts that are overrepresented in them. This analysis often depends on manual annotation of genes with controlled vocabularies, in particular Gene Ontology (GO). However, annotating genes is labor-intensive, and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered.
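For orientation, conventional annotation-based overrepresentation analysis is commonly framed as a one-sided hypergeometric test. The sketch below illustrates that general idea only; it is not taken from this paper, and the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """Upper-tail hypergeometric p-value P(X >= hits).

    population: total number of genes considered (hypothetical background)
    annotated:  genes in the background that carry the concept (e.g. a GO term)
    drawn:      size of the gene list under study
    hits:       genes in the list that carry the concept
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability of observing `hits` or more annotated genes by chance.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers, assumed for illustration: 10 genes, 5 annotated,
# a list of 5 genes that are all annotated.
p = hypergeom_enrichment_p(10, 5, 5, 5)
print(p)  # 1/252, about 0.00397
```

A small p-value suggests the concept appears in the list more often than random sampling would explain. The test can only be applied to concepts that already exist as annotations, which is the limitation the abstract points to.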

RESULTS

We propose a statistical method that performs overrepresentation analysis using the primary literature, i.e., free text, as its source. The method is based on a mixture-model statistical framework and addresses methodological flaws in several existing programs. We implemented it within a literature mining system, BeeSpace, taking advantage of its analysis environment, and added features that facilitate interactive analysis of gene sets. In experiments with several datasets, our program effectively summarized the important conceptual themes of large gene sets, even when traditional GO-based analysis did not yield informative results.
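The abstract does not spell out the model, so the following is only a generic sketch of what fitting a Poisson mixture can look like: a two-component mixture estimated by expectation-maximization (EM) on counts of a term across documents. The function names, the initialization, and the toy counts are assumptions for illustration and should not be read as the authors' formulation.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass at count k with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_two_poisson_mixture(counts, n_iter=200):
    """EM for a two-component Poisson mixture (generic sketch).

    Returns (weights, rates); component 0 is initialized with the lower rate.
    """
    mean = sum(counts) / len(counts)
    rates = [max(0.5 * mean, 1e-6), max(1.5 * mean, 1e-3)]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from each component.
        resp = []
        for k in counts:
            joint = [w * math.exp(poisson_logpmf(k, lam))
                     for w, lam in zip(weights, rates)]
            total = sum(joint)
            if total == 0.0:
                resp.append([0.5, 0.5])  # numerical underflow: stay neutral
            else:
                resp.append([j / total for j in joint])
        # M-step: re-estimate mixing weights and rates from the soft assignments.
        for c in range(2):
            mass = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = mass / len(counts)
            weighted_sum = sum(r[c] * k for r, k in zip(resp, counts))
            rates[c] = max(weighted_sum / mass, 1e-6)
    return weights, rates


# Toy term counts (assumed): most documents mention the term rarely,
# a few mention it often.
toy_counts = [0, 1, 0, 2, 1, 0, 1, 0, 9, 11, 10, 12]
weights, rates = fit_two_poisson_mixture(toy_counts)
print(weights, rates)
```

In a sketch like this, the low-rate component plays the role of background usage and the high-rate component captures documents where the term is unusually frequent. How such components are tied to gene sets and turned into an overrepresentation score is specific to the paper and is not reproduced here.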

CONCLUSIONS

We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp.
