Extracting biomedical events from pairs of text entities.

Author information

Liu Xiao, Bordes Antoine, Grandvalet Yves

Publication information

BMC Bioinformatics. 2015;16 Suppl 10(Suppl 10):S8. doi: 10.1186/1471-2105-16-S10-S8. Epub 2015 Jul 13.

DOI:10.1186/1471-2105-16-S10-S8

PMID:26201478

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4511465/

Abstract

BACKGROUND

Huge amounts of electronic biomedical documents, such as molecular biology reports or genomic papers, are generated daily. Nowadays, these documents are mainly available as unstructured free text, which requires heavy processing before it can be registered in organized databases. This organization is instrumental for information retrieval, making it possible to answer the advanced queries of researchers and practitioners in biology, medicine, and related fields. Hence, the massive data flow calls for efficient automatic text-mining methods that extract high-level information, such as biomedical events, from biomedical text. The usual computational tools of Natural Language Processing cannot be readily applied to extract these biomedical events, due to the peculiarities of the domain. Indeed, biomedical documents contain highly domain-specific jargon and syntax. These documents also describe distinctive dependencies, making text-mining in molecular biology a specific discipline.

RESULTS

We address biomedical event extraction as the classification of pairs of text entities into the classes corresponding to event types. The candidate pairs of text entities are recursively provided to a multiclass classifier relying on Support Vector Machines. This recursive process extracts events involving other events as arguments. Compared to joint models based on Markov Random Fields, our model simplifies inference and hence requires shorter training and prediction times along with less memory. Compared to usual pipeline approaches, our model bypasses a complex intermediate problem, while making more extensive use of sophisticated joint features between text entities. Our method focuses on the core event extraction of the Genia task of the BioNLP challenges, yielding the best result reported so far on the 2013 edition.
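
The recursive pair-classification idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give features, training details, or the exact recursion, so everything here is an assumption made for illustration. In particular, `toy_classifier` is a hypothetical rule-based stand-in for the multiclass SVM, and the names `Event`, `extract_events`, and `max_depth` are invented for this example. The sketch only shows how events found in one round can become candidate arguments in the next.

```python
from typing import Callable, List, NamedTuple, Optional


class Event(NamedTuple):
    """An extracted event: a typed (trigger, argument) pair."""
    event_type: str
    trigger: str
    argument: object  # a protein name (str) or a nested Event


def toy_classifier(trigger: str, argument: object) -> Optional[str]:
    """Hypothetical stand-in for the multiclass SVM.

    Returns an event type for the pair, or None for "no event".
    A real system would score the pair from learned features.
    """
    simple_types = {
        "expression": "Gene_expression",
        "phosphorylation": "Phosphorylation",
    }
    if isinstance(argument, str):
        # Pair of (trigger, protein): a simple event, if any.
        return simple_types.get(trigger)
    if trigger == "induces":
        # Pair of (trigger, event): an event taking an event as argument.
        return "Positive_regulation"
    return None


def extract_events(
    triggers: List[str],
    proteins: List[str],
    classify: Callable[[str, object], Optional[str]],
    max_depth: int = 3,
) -> List[Event]:
    """Classify candidate pairs round by round.

    Round 1 pairs triggers with proteins; each later round pairs
    triggers with the events found in the previous round.
    """
    events: List[Event] = []
    arguments: List[object] = list(proteins)
    for _ in range(max_depth):
        new_events: List[Event] = []
        for trigger in triggers:
            for arg in arguments:
                # Do not let a trigger take its own event as an argument.
                if isinstance(arg, Event) and arg.trigger == trigger:
                    continue
                label = classify(trigger, arg)
                if label is None:
                    continue
                event = Event(label, trigger, arg)
                if event not in events and event not in new_events:
                    new_events.append(event)
        if not new_events:
            break
        events.extend(new_events)
        arguments = list(new_events)
    return events


if __name__ == "__main__":
    for ev in extract_events(["expression", "induces"], ["IL-2"], toy_classifier):
        print(ev)
```

With the triggers `expression` and `induces` and the protein `IL-2`, the first round yields a simple event on the protein, and the second round yields an event whose argument is that first event. Swapping `toy_classifier` for a trained multiclass classifier with the same signature would leave the loop unchanged.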

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6d6b/4511465/c27ea6c9eb81/1471-2105-16-S10-S8-1.jpg
