Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction.

Author information

Gobeill Julien, Tbahriti Imad, Ehrler Frédéric, Mottaz Anaïs, Veuthey Anne-Lise, Ruch Patrick

Affiliation

University and Hospitals of Geneva, Geneva, Switzerland.

Publication information

BMC Bioinformatics. 2008 Apr 11;9 Suppl 3(Suppl 3):S9. doi: 10.1186/1471-2105-9-S3-S9.

DOI:10.1186/1471-2105-9-S3-S9

PMID:18426554

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2352866/

Abstract

BACKGROUND

This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference Into Function), as defined in ENTREZ-Gene, from a MEDLINE record. The task takes two inputs: a gene and a pointer to a MEDLINE reference. The suggested approach merges two independent sentence extraction strategies. The first strategy (LASt) uses argumentative features inspired by discourse-analysis models. The second (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all possible candidate GeneRiFs. The combination of the two approaches also aims to reduce the size of the selected segment by filtering out non-content-bearing rhetorical phrases.
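As an illustration of how an argumentative score and a Gene Ontology density score might be merged into one sentence ranking, the following is a minimal sketch and not the authors' implementation. The class weights, the toy term list, the linear mixing parameter `alpha`, and all function names are assumptions made for this example; in particular, the keyword matcher merely stands in for the automatic text categorizer mentioned above.

```python
from dataclasses import dataclass

# Hypothetical weights per argumentative class (illustrative, not from the paper).
ARGUMENT_WEIGHTS = {"PURPOSE": 0.6, "METHODS": 0.1, "RESULTS": 0.4, "CONCLUSION": 1.0}

# Toy stand-in for a Gene Ontology categorizer: a handful of GO-like terms.
TOY_GO_TERMS = {"apoptosis", "kinase", "transcription", "binding", "phosphorylation"}


@dataclass
class Sentence:
    text: str
    argument_class: str  # assumed to be assigned upstream by a discourse classifier


def go_density(sentence: Sentence) -> float:
    """Fraction of tokens matching a toy GO term (stand-in for a real categorizer)."""
    tokens = [t.strip(".,;()").lower() for t in sentence.text.split()]
    if not tokens:
        return 0.0
    return sum(t in TOY_GO_TERMS for t in tokens) / len(tokens)


def combined_score(sentence: Sentence, alpha: float = 0.5) -> float:
    """Linear mix of the argumentative weight and the GO density (an assumption)."""
    argument_score = ARGUMENT_WEIGHTS.get(sentence.argument_class, 0.0)
    return alpha * argument_score + (1 - alpha) * go_density(sentence)


def rank_candidates(sentences, alpha: float = 0.5):
    """Return sentences ordered from best to worst GeneRiF candidate."""
    return sorted(sentences, key=lambda s: combined_score(s, alpha), reverse=True)


abstract = [
    Sentence("We used western blotting.", "METHODS"),
    Sentence("BRCA1 regulates transcription and apoptosis.", "CONCLUSION"),
]
best = rank_candidates(abstract)[0]
print(best.text)
```

A linear mix is only one possible way to combine the two signals; the sketch says nothing about how the published system weights them or how it trims rhetorical phrases.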

RESULTS

On the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy alone is already competitive (52.78%). The combined approach clearly improves extraction, achieving a Dice score of over 57% (+10%).
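For readers unfamiliar with the metric, the classic Dice coefficient between a candidate sentence and a reference GeneRiF can be sketched as follows. This is a minimal illustration assuming lowercase whitespace tokenization over unique words; the function name `dice` is chosen for this example, and the exact Dice variant and preprocessing used in the evaluation may differ.

```python
def dice(candidate: str, reference: str) -> float:
    """Classic Dice coefficient 2|A ∩ B| / (|A| + |B|) over unique lowercase tokens."""
    a = set(candidate.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 0.0  # convention chosen here for two empty strings
    return 2 * len(a & b) / (len(a) + len(b))


score = dice("p53 induces apoptosis", "p53 induces cell apoptosis")
print(f"{score:.4f}")
```

The score is 1.0 when both token sets are identical and 0.0 when they share no token.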

CONCLUSIONS

Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.
