Detection and categorization of bacteria habitats using shallow linguistic analysis.

Author information

Karadeniz İlknur, Özgür Arzucan

Publication information

BMC Bioinformatics. 2015;16 Suppl 10(Suppl 10):S5. doi: 10.1186/1471-2105-16-S10-S5. Epub 2015 Jul 13.

DOI:10.1186/1471-2105-16-S10-S5

PMID:26201262

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4511461/

Abstract

BACKGROUND

Information regarding bacteria biotopes is important for several research areas including health sciences, microbiology, and food processing and preservation. One of the challenges for scientists in these domains is the huge amount of information buried in the text of electronic resources. Developing methods to automatically extract bacteria habitat relations from the text of these electronic resources is crucial for facilitating research in these areas.

METHODS

We introduce a linguistically motivated, rule-based approach that uses an ontology to recognize and normalize names of bacteria habitats in biomedical text. The approach relies on shallow syntactic analysis of the text, including sentence segmentation, part-of-speech (POS) tagging, partial parsing, and lemmatization. In addition, we propose two methods for identifying bacteria habitat localization relations. The first method assumes that the discourse changes with each new paragraph, so it operates on a paragraph basis. The second method performs a more fine-grained analysis of the text and operates on a sentence basis. We also develop a novel anaphora resolution method for bacteria coreferences and integrate it into the sentence-based relation extraction approach.
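The following is a minimal sketch of how a rule-based normalizer might map chunked noun phrases to ontology concepts, using lemmas and the head-final structure of English noun phrases. It is not the authors' rule set. The mini-ontology and its HAB: IDs are hypothetical, and so are `naive_lemma` (a crude stand-in for a real lemmatizer) and `normalize_np`. The token lists stand in for the output of a partial parser.

```python
"""Sketch of lemma-based habitat normalization against a (hypothetical) ontology."""
from __future__ import annotations

# Hypothetical mini-ontology: lemmatized term -> concept ID.
# A real system would load OntoBiotope; these IDs are invented for illustration.
ONTOLOGY = {
    "soil": "HAB:0001",
    "agricultural soil": "HAB:0002",
    "intestine": "HAB:0003",
    "human intestine": "HAB:0004",
    "milk": "HAB:0005",
}


def naive_lemma(token: str) -> str:
    """Crude stand-in for a real lemmatizer: lowercase and drop a plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize_np(tokens: list[str]) -> tuple[str, str] | None:
    """Map a noun phrase (token list from a chunker) to an ontology concept.

    English noun phrases are head-final, so try the full lemmatized phrase
    first, then drop leading modifiers one at a time until only the head
    noun is left. The longest matching suffix wins; None means no match.
    """
    lemmas = [naive_lemma(t) for t in tokens]
    for start in range(len(lemmas)):
        candidate = " ".join(lemmas[start:])
        if candidate in ONTOLOGY:
            return candidate, ONTOLOGY[candidate]
    return None
```

For example, `normalize_np(["human", "intestines"])` returns `("human intestine", "HAB:0004")`. By contrast, `normalize_np(["contaminated", "soils"])` falls back to the head noun and returns `("soil", "HAB:0001")`. Preferring the longest match keeps the more specific concept whenever the ontology has one.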

RESULTS

We participated in the Bacteria Biotope (BB) Task of the BioNLP Shared Task 2013. Our system (Boun) achieved the second-best performance in Sub-task 1 (Entity Detection and Categorization), with a Slot Error Rate (SER) of 68%, and ranked third in Sub-task 2 (Localization Event Extraction), with an F-score of 27%. This paper describes the system implemented for the shared task, including the novel methods developed and the improvements obtained after the official evaluation. The extensions include expanding the OntoBiotope ontology with the Sub-task 1 training set and, for Sub-task 2, a novel sentence-based relation extraction method combined with anaphora resolution. These extensions yielded promising results for Sub-task 1, with an SER of 68%, and state-of-the-art performance for Sub-task 2, with an F-score of 53%.
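The two measures reported above point in opposite directions: lower SER is better, and higher F-score is better. The sketch below gives their textbook forms. SER is computed as (S + D + I) / N, where S counts substitutions, D deletions, I insertions, and N the slots in the reference. F1 is the harmonic mean of precision and recall. This is a plain-count illustration with hypothetical function names. It does not reproduce any partial-credit weighting the official BB 2013 scorer may apply, although the caller can pass a fractional substitution count to approximate one.

```python
"""Sketch of the two evaluation measures (plain-count versions)."""


def slot_error_rate(substitutions: float, deletions: int, insertions: int,
                    reference_slots: int) -> float:
    """SER = (S + D + I) / N, with N the number of slots in the reference.

    Lower is better. Unlike an error fraction, SER can exceed 1.0 when the
    system produces many spurious (inserted) slots. Substitutions may be a
    float so that a caller can give partial credit for near misses.
    """
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots


def f_score(true_positives: int, predicted: int, gold: int) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if predicted == 0 or gold == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / gold
    return 2 * precision * recall / (precision + recall)
```

With made-up counts, `slot_error_rate(3, 2, 1, 20)` gives 0.3. The call `f_score(6, 10, 8)` (precision 0.6, recall 0.75) gives about 0.667.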

CONCLUSIONS

Our results show that a linguistically oriented approach based on shallow syntactic analysis of the text is as effective as machine learning approaches for detecting habitat entities and normalizing them against an ontology. Furthermore, the newly developed sentence-based relation extraction system with the anaphora resolution module significantly outperforms the paragraph-based one, as well as the other systems that participated in the BB Shared Task 2013.
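To make the contrast between the two relation extraction strategies concrete, here is a simplified sketch of the idea, not the authors' implementation. The paragraph-based variant pairs every bacterium with every habitat in the same paragraph. The sentence-based variant pairs only mentions within one sentence, after replacing each anaphor (e.g. "this bacterium") with the most recent bacterium mentioned earlier in the paragraph. The `Mention` class, the kind labels, and both functions are hypothetical. The nearest-antecedent rule is an assumed stand-in for the paper's anaphora resolution method.

```python
"""Sketch: paragraph- vs sentence-based pairing with a naive anaphora heuristic."""
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    kind: str  # "Bacterium", "Habitat", or "Anaphor" (hypothetical labels)
    text: str


Sentence = list[Mention]
Paragraph = list[Sentence]


def paragraph_pairs(paragraphs: list[Paragraph]) -> set[tuple[str, str]]:
    """Pair every bacterium with every habitat in the same paragraph."""
    pairs = set()
    for para in paragraphs:
        mentions = [m for sent in para for m in sent]
        bacteria = [m.text for m in mentions if m.kind == "Bacterium"]
        habitats = [m.text for m in mentions if m.kind == "Habitat"]
        pairs.update(product(bacteria, habitats))
    return pairs


def sentence_pairs(paragraphs: list[Paragraph]) -> set[tuple[str, str]]:
    """Pair bacteria and habitats within a sentence, resolving each anaphor
    to the most recent bacterium seen earlier in the same paragraph."""
    pairs = set()
    for para in paragraphs:
        # Reset per paragraph, echoing the assumption that discourse
        # changes with a new paragraph.
        last_bacterium = None
        for sent in para:
            bacteria = []
            for m in sent:
                if m.kind == "Bacterium":
                    last_bacterium = m.text
                    bacteria.append(m.text)
                elif m.kind == "Anaphor" and last_bacterium is not None:
                    bacteria.append(last_bacterium)
            habitats = [m.text for m in sent if m.kind == "Habitat"]
            pairs.update(product(bacteria, habitats))
    return pairs
```

Consider a paragraph with three sentences. The first mentions Bacillus subtilis and soil. The second reads "This bacterium ... plant roots". The third mentions Escherichia coli and intestine. For this input, `paragraph_pairs` returns all six bacterium-habitat combinations. `sentence_pairs` returns only the three co-sentential pairs, with the anaphor resolved to Bacillus subtilis. This illustrates why a finer-grained unit can raise precision.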
