GeneRIF indexing: sentence selection based on machine learning.

Affiliation

National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

BMC Bioinformatics. 2013 May 31;14:171. doi: 10.1186/1471-2105-14-171.

DOI:10.1186/1471-2105-14-171

PMID:23725347

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3687823/

Abstract

BACKGROUND

A Gene Reference Into Function (GeneRIF) describes novel functionality of genes. GeneRIFs are available from the National Center for Biotechnology Information (NCBI) Gene database. GeneRIF indexing is performed manually, and the intention of our work is to provide methods to support creating the GeneRIF entries. The creation of GeneRIF entries involves the identification of the genes mentioned in MEDLINE® citations and the sentences describing a novel function.

RESULTS

We have compared several learning algorithms and several features extracted or derived from MEDLINE sentences to determine if a sentence should be selected for GeneRIF indexing. Features are derived from the sentences or using mechanisms to augment the information provided by them: assigning a discourse label using a previously trained model, for example. We show that machine learning approaches with specific feature combinations achieve results close to one of the annotators. We have evaluated different feature sets and learning algorithms. In particular, Naïve Bayes achieves better performance with a selection of features similar to one used in related work, which considers the location of the sentence, the discourse of the sentence and the functional terminology in it.
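As an illustration of the kind of classifier described above, the following is a minimal sketch of a categorical Naïve Bayes model over three features: sentence location, discourse label, and presence of functional terminology. It is not the authors' implementation. The term list, the discourse label names, the toy sentences, and all function and class names are assumptions made for this example. The discourse label is passed in directly, whereas the abstract describes assigning it with a previously trained model.

```python
import math
from collections import Counter, defaultdict

# Hypothetical functional-term list; the terminology used in the paper is not given here.
FUNCTIONAL_TERMS = {"regulates", "inhibits", "activates", "binds", "phosphorylates"}


def extract_features(sentence, index, total, discourse_label):
    """Map a sentence to three categorical features: position, discourse, functional term."""
    if index == 0:
        position = "first"
    elif index == total - 1:
        position = "last"
    else:
        position = "middle"
    tokens = {tok.strip(".,;:()").lower() for tok in sentence.split()}
    return {
        "position": position,
        "discourse": discourse_label,  # assumed to come from an upstream labeler
        "functional_term": bool(tokens & FUNCTIONAL_TERMS),
    }


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.feature_values = defaultdict(set)    # feature -> values seen in training
        for row, label in zip(rows, labels):
            for name, value in row.items():
                self.value_counts[(label, name)][value] += 1
                self.feature_values[name].add(value)
        return self

    def log_scores(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for name, value in row.items():
                count = self.value_counts[(label, name)][value]
                k = len(self.feature_values[name] | {value})
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Invented toy data: (sentence, index in abstract, abstract length, discourse label, selected?)
TRAIN = [
    ("BRCA1 is a tumor suppressor gene.", 0, 4, "BACKGROUND", False),
    ("Cells were transfected with siRNA.", 1, 4, "METHODS", False),
    ("Expression was measured by PCR.", 2, 4, "METHODS", False),
    ("BRCA1 regulates DNA repair in breast cells.", 3, 4, "CONCLUSIONS", True),
    ("TP53 is frequently mutated in cancer.", 0, 3, "BACKGROUND", False),
    ("Samples were collected from patients.", 1, 3, "METHODS", False),
    ("TP53 activates apoptosis under stress.", 2, 3, "CONCLUSIONS", True),
]

rows = [extract_features(s, i, n, d) for s, i, n, d, _ in TRAIN]
labels = [y for *_, y in TRAIN]
model = CategoricalNaiveBayes().fit(rows, labels)

candidate = extract_features("AKT1 inhibits apoptosis in tumor cells.", 3, 4, "CONCLUSIONS")
non_candidate = extract_features("Cells were lysed in buffer.", 1, 4, "METHODS")
print(model.predict(candidate), model.predict(non_candidate))
```

With only seven toy sentences the predictions say nothing about real performance; the sketch is meant only to show how location, discourse, and terminology features can feed a single probabilistic decision per sentence.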

CONCLUSIONS

The current performance is at a level similar to human annotation and it shows that machine learning can be used to automate the task of sentence selection for GeneRIF annotation. The current experiments are limited to the human species. We would like to see how the methodology can be extended to other species, specifically the normalization of gene mentions in other species.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7bf2/3687823/467b03a69a57/1471-2105-14-171-1.jpg
