Linguistic scope-based and biological event-based speculation and negation annotations in the BioScope and Genia Event corpora.

Author information

Vincze Veronika, Szarvas György, Móra György, Ohta Tomoko, Farkas Richárd

Affiliations

Research Group on Artificial Intelligence, Hungarian Academy of Sciences, Szeged, Hungary.

Publication information

J Biomed Semantics. 2011 Oct 6;2 Suppl 5(Suppl 5):S8. doi: 10.1186/2041-1480-2-S5-S8.

Abstract

BACKGROUND

The treatment of negation and hedging in natural language processing has received much interest recently, especially in the biomedical domain. However, open access corpora annotated for negation and/or speculation are scarce for training and testing applications, and those that exist sometimes follow different design principles. In this paper, the annotation principles of the two largest corpora annotated for negation and speculation, BioScope and Genia Event, are compared. BioScope marks linguistic cues and their scopes for negation and hedging, while Genia marks biological events for uncertainty and/or negation.

RESULTS

Differences between the annotations of the two corpora are categorized thematically, and the frequency of each category is estimated. We found that most of the differences arise because scopes, which cover text spans, apply to the key events, and each argument of these events (including events within events) falls under the scope as well. In contrast, Genia treats the modality of events within events independently.

CONCLUSIONS

The analysis of multiple layers of annotation (linguistic scopes and biological events) showed that the detection of negation/hedge keywords and their scopes can contribute to determining the modality of key events (denoted by the main predicate). On the other hand, for the detection of the negation and speculation status of events within events, additional syntax-based rules investigating the dependency path between the modality cue and the event cue have to be employed.
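
The contrast between the two annotation styles can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy sentence, the hand-written dependency heads, the scope boundaries, and the `max_distance=1` rule are hypothetical and are not taken from BioScope, Genia Event, or the rules used in the paper. The sketch only shows why a span-based scope flags a nested event that a dependency-path rule can leave unmarked.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    name: str
    trigger: int  # token index of the event trigger word


def events_in_scope(events: List[Event], start: int, end: int) -> List[str]:
    """Span-based reading: flag every event whose trigger lies inside the scope."""
    return [e.name for e in events if start <= e.trigger <= end]


def path_to_root(token: int, heads: Dict[int, Optional[int]]) -> List[int]:
    """Return the token followed by its chain of heads up to the root."""
    path = [token]
    while heads[path[-1]] is not None:
        path.append(heads[path[-1]])
    return path


def dependency_distance(a: int, b: int, heads: Dict[int, Optional[int]]) -> int:
    """Number of edges on the tree path between tokens a and b."""
    path_a = path_to_root(a, heads)
    path_b = path_to_root(b, heads)
    depth_in_b = {tok: i for i, tok in enumerate(path_b)}
    for i, tok in enumerate(path_a):
        if tok in depth_in_b:  # first common ancestor
            return i + depth_in_b[tok]
    raise ValueError("tokens are not in the same tree")


def events_near_cue(events: List[Event], cue: int,
                    heads: Dict[int, Optional[int]],
                    max_distance: int = 1) -> List[str]:
    """Hypothetical rule: flag an event only if its trigger is close to the cue."""
    return [e.name for e in events
            if dependency_distance(cue, e.trigger, heads) <= max_distance]


# Toy sentence (hypothetical): the cue "suggest" governs "inhibits",
# and "expression" is an event nested inside the "inhibits" event.
tokens = ["These", "results", "suggest", "that", "X",
          "inhibits", "expression", "of", "Y"]
heads = {0: 1, 1: 2, 2: None, 3: 5, 4: 5, 5: 2, 6: 5, 7: 8, 8: 6}
events = [Event("inhibits", 5), Event("expression", 6)]
cue = 2                       # index of "suggest"
scope = (2, len(tokens) - 1)  # assumed scope: from the cue to the sentence end

scope_based = events_in_scope(events, *scope)
path_based = events_near_cue(events, cue, heads)
print("scope-based:", scope_based)
print("path-based: ", path_based)
```

Under these assumptions the scope-based reading marks both the key event and the nested event as speculated, while the path-based rule marks only the key event, which mirrors the kind of mismatch the abstract describes.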
