GeneRIF indexing: sentence selection based on machine learning.

Affiliation

National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

BMC Bioinformatics. 2013 May 31;14:171. doi: 10.1186/1471-2105-14-171.

Abstract

BACKGROUND

A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function.

RESULTS

We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine whether a sentence should be selected for GeneRIF indexing. Features are taken directly from the sentences or produced by mechanisms that augment the information they provide, for example by assigning a discourse label with a previously trained model. We show that machine learning approaches with specific feature combinations achieve results close to those of one of the annotators. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it.
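The feature combination named above (sentence location, discourse label, functional terminology) can be illustrated with a small Naïve Bayes sketch. Everything in it is an assumption made for illustration: the term list `FUNCTIONAL_TERMS`, the helper `extract_features`, the class `CategoricalNaiveBayes`, the position buckets and the six toy training sentences are all invented, and the discourse labels are passed in by hand rather than assigned by a trained model. It is not the authors' implementation, feature set or data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical functional vocabulary; the abstract does not list the real terms.
FUNCTIONAL_TERMS = {"regulates", "inhibits", "activates", "binds", "mediates"}


def extract_features(sentence, index, total, discourse_label):
    """Map a sentence to three categorical features (illustrative only)."""
    relative = index / max(total - 1, 1)  # 0.0 = first sentence, 1.0 = last
    if relative < 0.34:
        position = "start"
    elif relative < 0.67:
        position = "middle"
    else:
        position = "end"
    words = {w.strip(".,;:()").lower() for w in sentence.split()}
    return {
        "position": position,
        "discourse": discourse_label,  # assumed to come from an external labeller
        "has_functional_term": bool(words & FUNCTIONAL_TERMS),
    }


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, feature_dicts, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.feature_values = defaultdict(set)    # feature -> values seen in training
        for feats, label in zip(feature_dicts, labels):
            for name, value in feats.items():
                self.value_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def log_scores(self, feats):
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for name, value in feats.items():
                seen = self.value_counts[(label, name)][value]
                n_values = len(self.feature_values[name])
                score += math.log((seen + self.alpha) / (count + self.alpha * n_values))
            scores[label] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


# Toy data: (sentence, index in abstract, sentences in abstract, discourse label), selected?
train = [
    (("BRCA1 regulates DNA repair in breast cells.", 8, 10, "CONCLUSIONS"), True),
    (("We conclude that TP53 inhibits tumour growth.", 9, 10, "CONCLUSIONS"), True),
    (("These results show that MYC activates transcription.", 7, 10, "RESULTS"), True),
    (("Breast cancer is a common disease.", 0, 10, "BACKGROUND"), False),
    (("Samples were collected from 40 patients.", 3, 10, "METHODS"), False),
    (("Cells were cultured for 48 hours.", 4, 10, "METHODS"), False),
]

features = [extract_features(*args) for args, _ in train]
labels = [label for _, label in train]
model = CategoricalNaiveBayes().fit(features, labels)

candidate = extract_features(
    "Our data indicate that EGFR mediates cell migration.", 9, 10, "CONCLUSIONS"
)
print(model.predict(candidate))
```

With only three coarse features, a late-abstract sentence in a concluding section that contains a functional verb is scored as a GeneRIF candidate, while early background or methods sentences are not. A real system would need far more training data and a properly evaluated feature set.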

CONCLUSIONS

The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bf2/3687823/467b03a69a57/1471-2105-14-171-1.jpg
