Department of Computer and Information Sciences, University of Delaware, 18 Amstel Ave, Newark, DE 19716, USA.
BMC Bioinformatics. 2014 Aug 23;15(1):285. doi: 10.1186/1471-2105-15-285.
Text mining is increasingly used in the biomedical domain because of its ability to automatically gather information from large numbers of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in the literature. A high-performance relation extraction system is expensive to develop because of the substantial time and effort required for its design and implementation. Here, we report a novel framework to facilitate the development of a pattern-based biomedical relation extraction system. It has several unique design features: (1) leveraging the syntactic variations possible in a language to generate extraction patterns automatically and systematically, (2) applying sentence simplification to improve the coverage of extraction patterns, and (3) identifying referential relations between a syntactic argument of a predicate and the actual target expected in the relation extraction task.
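To make design features (1) and (2) concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes a single hypothetical trigger entry with three surface forms, uses regular expressions in place of real syntactic patterns, and reduces sentence simplification to dropping parenthetical asides. All names (`TRIGGER`, `generate_patterns`, `simplify`, `extract`) are illustrative assumptions.

```python
import re

# Hypothetical trigger entry: the surface forms of one trigger word.
TRIGGER = {
    "active": "phosphorylates",
    "participle": "phosphorylated",
    "noun": "phosphorylation",
}


def generate_patterns(trigger):
    """Expand one trigger into patterns covering three syntactic variations."""
    agent = r"(?P<agent>[\w-]+)"
    theme = r"(?P<theme>[\w-]+)"
    templates = [
        # Active voice: "A phosphorylates B"
        rf"{agent} {trigger['active']} {theme}",
        # Passive voice: "B is phosphorylated by A"
        rf"{theme} (?:is|was) {trigger['participle']} by {agent}",
        # Nominalization: "phosphorylation of B by A"
        rf"{trigger['noun']} of {theme} by {agent}",
    ]
    return [re.compile(t) for t in templates]


def simplify(sentence):
    """Toy simplification (an assumption): remove parenthetical asides."""
    return re.sub(r"\s*\([^)]*\)", "", sentence)


def extract(sentence, patterns):
    """Return (agent, theme) from the first matching pattern, else None."""
    for pattern in patterns:
        match = pattern.search(sentence)
        if match:
            return match.group("agent"), match.group("theme")
    return None


patterns = generate_patterns(TRIGGER)
raw = "ERK (also known as MAPK1) is phosphorylated by MEK"
without_simplification = extract(raw, patterns)
with_simplification = extract(simplify(raw), patterns)
```

In this toy setting the parenthetical blocks the passive pattern on the raw sentence, while the simplified sentence matches without any additional pattern, which is the coverage effect that feature (2) targets.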
A relation extraction system derived using the proposed framework achieved overall F-scores of 72.66% for the Simple events and 55.57% for the Binding events on the BioNLP-ST 2011 GE test set, comparing favorably with the top performing systems that participated in the BioNLP-ST 2011 GE task. We obtained similar results on the BioNLP-ST 2013 GE test set (80.07% and 60.58%, respectively). We conducted additional experiments on the training and development sets to provide a more detailed analysis of the system and its individual modules. This analysis indicates that without increasing the number of patterns, simplification and referential relation linking play a key role in the effective extraction of biomedical relations.
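For readers unfamiliar with the metric, an F-score is commonly computed as the harmonic mean of precision and recall. The sketch below assumes this balanced (F1) form; the function name `f_score` and the input values are illustrative only and are not figures from the evaluation.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Defined as zero when nothing is extracted correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only, not results reported for the system.
balanced = f_score(0.5, 0.5)
skewed = f_score(1.0, 0.5)
```

Because the harmonic mean penalizes imbalance, a system with perfect precision but 50% recall scores about 0.67 rather than 0.75.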
In this paper, we present a novel framework for the fast development of relation extraction systems. The framework requires only a list of triggers as input and does not need information from an annotated corpus. Thus, we reduce the involvement of domain experts, who would otherwise have to provide manual annotations and help design hand-crafted patterns. We demonstrate how our framework is used to develop a system that achieves state-of-the-art performance on a public benchmark corpus.