Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model.

Affiliation

Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

BMC Bioinformatics. 2010 May 20;11:272. doi: 10.1186/1471-2105-11-272.


DOI:10.1186/1471-2105-11-272
PMID:20487560
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2885378/
Abstract

BACKGROUND: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process; and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered.

RESULTS: We propose a statistical method that uses the primary literature, i.e. free-text, as the source to perform overrepresentation analysis. The method is based on a statistical framework of mixture model and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment and added features that facilitate the interactive analysis of gene sets. Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results.

CONCLUSIONS: We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp.
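The abstract does not spell out the model, so the following is only a minimal sketch of what a Poisson mixture for term overrepresentation could look like, not the authors' implementation. It assumes (hypothetically) that each document linked to the gene list contributes a count of one term and a document length, and that the count comes either from a fixed background Poisson rate or from a list-specific rate estimated by EM. The function names, the two-component structure, and the fixed `background_rate` are all illustrative assumptions.

```python
import math


def poisson_logpmf(k, mean):
    """Log-probability of observing k events under a Poisson with the given mean (> 0)."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def fit_term_mixture(counts, lengths, background_rate, n_iter=200):
    """EM for a two-component Poisson mixture with a fixed background rate.

    counts[i]  : occurrences of the term in document i (hypothetical input)
    lengths[i] : length of document i in words (hypothetical input)
    Returns (weight, fg_rate): the estimated share of documents generated by
    the list-specific component and that component's per-word rate.
    """
    # Start the foreground rate above the background so the components differ.
    fg_rate = max(2.0 * background_rate, (sum(counts) + 1.0) / sum(lengths))
    weight = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each document is foreground.
        resp = []
        for c, n in zip(counts, lengths):
            log_fg = math.log(weight) + poisson_logpmf(c, fg_rate * n)
            log_bg = math.log(1.0 - weight) + poisson_logpmf(c, background_rate * n)
            m = max(log_fg, log_bg)  # subtract the max for numerical stability
            e_fg, e_bg = math.exp(log_fg - m), math.exp(log_bg - m)
            resp.append(e_fg / (e_fg + e_bg))
        # M-step: update the mixing weight and the foreground rate.
        weight = min(max(sum(resp) / len(resp), 1e-6), 1.0 - 1e-6)
        num = sum(r * c for r, c in zip(resp, counts))
        den = sum(r * n for r, n in zip(resp, lengths))
        if num > 0 and den > 0:
            fg_rate = num / den
    return weight, fg_rate


if __name__ == "__main__":
    # Toy data: six documents near the background rate, six with many more hits.
    counts = [1, 0, 2, 1, 1, 0, 9, 11, 10, 12, 8, 10]
    lengths = [100] * 12
    weight, fg_rate = fit_term_mixture(counts, lengths, background_rate=0.01)
    print(f"foreground weight={weight:.2f}, foreground rate={fg_rate:.3f}")
```

In this toy setup, a term would be a candidate "overrepresented concept" when the fitted foreground rate sits well above the background rate and the foreground weight is not negligible; how the published method actually scores and ranks terms is not stated in the abstract.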

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0baa/2885378/7a8dbaa631e8/1471-2105-11-272-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0baa/2885378/fb1c8d35a21e/1471-2105-11-272-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0baa/2885378/6d0853eaece8/1471-2105-11-272-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0baa/2885378/f6a49ac74234/1471-2105-11-272-3.jpg
