Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction.

Author information

Gobeill Julien, Tbahriti Imad, Ehrler Frédéric, Mottaz Anaïs, Veuthey Anne-Lise, Ruch Patrick

Affiliation

University and Hospitals of Geneva, Geneva, Switzerland.

Publication information

BMC Bioinformatics. 2008 Apr 11;9 Suppl 3(Suppl 3):S9. doi: 10.1186/1471-2105-9-S3-S9.

Abstract

BACKGROUND

This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference Into Function), as defined in ENTREZ-Gene, from a MEDLINE record. The inputs for this task are a gene and a pointer to a MEDLINE reference. The suggested approach merges two independent sentence extraction strategies. The first strategy (LASt) uses argumentative features inspired by discourse-analysis models. The second extraction scheme (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all possible candidate GeneRiFs. A combination of the two approaches is proposed, which also aims at reducing the size of the selected segment by filtering out non-content-bearing rhetorical phrases.
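The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's GOEx uses an automatic text categorizer, whereas this sketch swaps in simple lexicon matching as a stand-in for density estimation. The vocabulary `GO_TERMS`, the argumentative labels and weights in `ARG_WEIGHTS`, the mixing parameter `alpha`, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy vocabulary standing in for Gene Ontology terms.
GO_TERMS: Set[str] = {"apoptosis", "transcription", "binding", "kinase"}

# Hypothetical weights per argumentative class; not taken from the paper.
ARG_WEIGHTS: Dict[str, float] = {
    "PURPOSE": 0.6,
    "METHODS": 0.1,
    "RESULTS": 0.4,
    "CONCLUSION": 1.0,
}


def go_density(sentence: str) -> float:
    """Fraction of tokens in the sentence that match the toy GO vocabulary."""
    tokens = [t.strip(".,;:()").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in GO_TERMS for t in tokens) / len(tokens)


def rank_sentences(
    sentences: List[Tuple[str, str]], alpha: float = 0.5
) -> List[Tuple[float, str]]:
    """Rank (label, text) pairs by a weighted sum of both scores.

    alpha weights the argumentative score; (1 - alpha) weights GO density.
    """
    scored = [
        (alpha * ARG_WEIGHTS.get(label, 0.0) + (1 - alpha) * go_density(text), text)
        for label, text in sentences
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_sentences` on a list of labelled sentences returns them ordered from the most to the least likely candidate; the top-ranked sentence would be the proposed GeneRiF in this simplified setting.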

RESULTS

Based on the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy is already competitive (52.78%). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 57% (+10%).
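For readers unfamiliar with the metric, the following sketch shows a classic Dice coefficient computed over word sets. It assumes whitespace tokenization and lowercasing; the exact Dice variant and preprocessing used in the evaluation are not specified in this abstract, and the function name `dice` is illustrative.

```python
def dice(candidate: str, reference: str) -> float:
    """Classic Dice coefficient between the word sets of two strings."""
    a = set(candidate.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        # Two empty strings are treated as identical by convention here.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

A score of 1.0 means the candidate and the reference GeneRiF share exactly the same words, and 0.0 means they share none.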

CONCLUSIONS

Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea72/2352866/d5c97b85c189/1471-2105-9-S3-S9-1.jpg
