National Centre for Text Mining, Manchester Interdisciplinary Biocentre, University of Manchester, 131 Princess Street, Manchester M1 7DN, UK.
BMC Bioinformatics. 2013 Jan 16;14:14. doi: 10.1186/1471-2105-14-14.
Negation occurs frequently in scientific literature, especially in biomedical literature. It has previously been reported that around 13% of sentences found in biomedical research articles contain negation. Historically, the main motivation for identifying negated events has been to ensure their exclusion from lists of extracted interactions. More recently, however, growing interest in negative results has led to negation detection being identified as a key challenge in biomedical relation extraction. In this article, we focus on the problem of identifying negated bio-events, given gold standard event annotations.
We have conducted a detailed analysis of three open access bio-event corpora containing negation information (i.e., GENIA Event, BioInfer and BioNLP'09 ST), and have identified the main types of negated bio-events. We have analysed the key aspects of a machine learning solution to the problem of detecting negated events, including selection of negation cues, feature engineering and the choice of learning algorithm. Combining the best solutions for each aspect of the problem, we propose a novel framework for the identification of negated bio-events. We have evaluated our system on each of the three open access corpora mentioned above. The performance of the system significantly surpasses the best results previously reported on the BioNLP'09 ST corpus, and achieves even better results on the GENIA Event and BioInfer corpora, both of which contain more varied and complex events.
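To make the cue selection and feature engineering steps concrete, the following is a minimal sketch of how cue-based features might be derived for a single event. It is an illustration only, not the feature set or cue list used in this work: the `NEGATION_CUES` set, the `negation_features` function and the example sentence are all hypothetical, and the event is assumed to be identified simply by the position of its trigger token.

```python
from typing import Dict, List

# Hypothetical cue list for illustration; not the cue set selected in the paper.
NEGATION_CUES = {"not", "no", "neither", "nor", "without",
                 "unable", "failed", "absence", "lack"}


def negation_features(tokens: List[str], trigger_index: int) -> Dict[str, object]:
    """Build a small feature dictionary for one event, located by its trigger token."""
    # Positions of every token that matches a cue (case-insensitive).
    cue_positions = [i for i, tok in enumerate(tokens)
                     if tok.lower() in NEGATION_CUES]
    if not cue_positions:
        return {"has_cue": False, "cue": None,
                "distance": None, "cue_before_trigger": None}
    # Use the cue closest to the event trigger, measured in tokens.
    nearest = min(cue_positions, key=lambda i: abs(i - trigger_index))
    return {
        "has_cue": True,
        "cue": tokens[nearest].lower(),
        "distance": abs(nearest - trigger_index),
        "cue_before_trigger": nearest < trigger_index,
    }


# Made-up example sentence; the event trigger is assumed to be "induce".
tokens = "IL-2 did not induce expression of the gene".split()
features = negation_features(tokens, trigger_index=tokens.index("induce"))
print(features)
```

Feature dictionaries of this kind could then be passed to a learning algorithm of choice. Surface token distance is used here only for simplicity.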
The development and enhancement of event-based systems have recently received significant interest in the field of biomedical text mining. The ability to identify negated events is a key performance element for these systems. We have conducted the first detailed study on the analysis and identification of negated bio-events. Our proposed framework can be integrated with state-of-the-art event extraction systems. The resulting systems will be able to extract bio-events with attached polarities from textual documents, which can serve as the foundation for more elaborate systems that are able to detect mutually contradicting bio-events.