Linguistic scope-based and biological event-based speculation and negation annotations in the BioScope and Genia Event corpora.

Authors

Vincze Veronika, Szarvas György, Móra György, Ohta Tomoko, Farkas Richárd

Affiliation

Research Group on Artificial Intelligence, Hungarian Academy of Sciences, Szeged, Hungary.

Publication information

J Biomed Semantics. 2011 Oct 6;2 Suppl 5(Suppl 5):S8. doi: 10.1186/2041-1480-2-S5-S8.

DOI:10.1186/2041-1480-2-S5-S8

PMID:22166355

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3239308/

Abstract

BACKGROUND

The treatment of negation and hedging in natural language processing has received much interest recently, especially in the biomedical domain. However, open access corpora annotated for negation and/or speculation are hardly available for training and testing applications, and even if they are, they sometimes follow different design principles. In this paper, the annotation principles of the two largest corpora containing annotation for negation and speculation - BioScope and Genia Event - are compared. BioScope marks linguistic cues and their scopes for negation and hedging while in Genia biological events are marked for uncertainty and/or negation.

RESULTS

Differences between the annotations of the two corpora are categorized thematically, and the frequency of each category is estimated. We found that most differences arise because scopes, which cover text spans, address the key events, so each argument of these events (including events within events) falls under the scope as well. In contrast, Genia treats the modality of events within events independently.
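
The contrast described above can be illustrated with a minimal sketch. It is not the annotation format of either corpus: the `Span` and `Event` types, the toy sentence, the scope boundaries, and the per-event labels are all hypothetical, chosen only to show how a text-span scope covers nested events while event-level labels are assigned one event at a time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset

    def contains(self, other: "Span") -> bool:
        return self.start <= other.start and other.end <= self.end


@dataclass(frozen=True)
class Event:
    name: str
    trigger: Span


def events_under_scope(scope: Span, events: list[Event]) -> set[str]:
    """Scope-style reading: every event whose trigger lies inside the
    cue's scope is treated as modified by that cue."""
    return {e.name for e in events if scope.contains(e.trigger)}


def span_of(text: str, word: str) -> Span:
    """Span of the first occurrence of `word` in `text`."""
    start = text.index(word)
    return Span(start, start + len(word))


# Hypothetical toy sentence, not taken from BioScope or Genia Event.
text = "These results suggest that X inhibits the expression of Y."

# Assumed scope: from the hedge cue "suggest" up to the final period.
scope = Span(text.index("suggest"), len(text) - 1)

events = [
    Event("inhibition", span_of(text, "inhibits")),    # key event
    Event("expression", span_of(text, "expression")),  # event within the event
]

# Scope-style reading marks both events as speculated.
scope_based = events_under_scope(scope, events)

# Event-style labels are assigned per event; this labeling is invented
# for illustration: only the key event is marked as speculated.
event_based = {"inhibition"}

# Events on which the two readings disagree.
disagreement = scope_based - event_based
print(sorted(disagreement))
```

In this toy case the nested event is the only point of disagreement, which mirrors the category of differences the abstract reports as the most frequent.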

CONCLUSIONS

The analysis of multiple layers of annotation (linguistic scopes and biological events) showed that the detection of negation/hedge keywords and their scopes can contribute to determining the modality of key events (denoted by the main predicate). On the other hand, for the detection of the negation and speculation status of events within events, additional syntax-based rules investigating the dependency path between the modality cue and the event cue have to be employed.
