No means 'No': a non-improper modeling approach, with embedded speculative context.

Affiliation

Saama AI Research Lab, Pune 411057, India.

Publication information

Bioinformatics. 2022 Oct 14;38(20):4790-4796. doi: 10.1093/bioinformatics/btac593.

DOI: 10.1093/bioinformatics/btac593

PMID: 36040145

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9563701/

Abstract

MOTIVATION

Medical data are complex because the same term can appear in a record in different contexts. In this article, we investigate how well the embeddings of several biomedical models (BioBERT, BioELECTRA and PubMedBERT) capture 'negation and speculation context', and we find that these models are unable to differentiate 'negated context' from 'non-negated context'. To measure the models' understanding, we use cosine similarity scores between the embeddings of negated sentences and those of their non-negated counterparts. To improve these models, we introduce a generic super-tuning approach that enhances the embeddings on 'negation and speculation context' by utilizing a synthesized dataset.
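The similarity measurement described above can be illustrated with a minimal sketch. It is not the authors' code: the mean-pooling step, the function names and the three-dimensional token vectors are all assumptions made for illustration, and a real experiment would take token embeddings from a model such as BioBERT instead.

```python
import math


def mean_pool(token_vectors):
    """Average equal-length token vectors into one sentence vector.

    Mean pooling is an assumption here; the abstract does not say how
    sentence embeddings were derived from token embeddings.
    """
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(tok[i] for tok in token_vectors) / n for i in range(dim)]


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical token embeddings, invented for illustration only.
affirmed = [[0.9, 0.1, 0.2], [0.8, 0.3, 0.1]]                  # e.g. "has fever"
negated = [[0.9, 0.1, 0.2], [0.8, 0.3, 0.1], [0.7, 0.2, 0.3]]  # e.g. "has no fever"

score = cosine_similarity(mean_pool(affirmed), mean_pool(negated))
print(f"cosine similarity: {score:.3f}")
```

A score close to 1 for such a pair would mean the two sentence embeddings are nearly indistinguishable, which is the behaviour the abstract reports for the models before super-tuning.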

RESULTS

After super-tuning, the models' embeddings capture negative and speculative contexts much better. Furthermore, we fine-tuned the super-tuned models on various tasks and found that they outperform previous models and achieve state-of-the-art results on negation, speculation cue and scope detection tasks on the BioScope abstracts and Sherlock datasets. We also confirmed that super-tuning incurs only a very minimal trade-off in the models' performance on other tasks, such as natural language inference.

AVAILABILITY AND IMPLEMENTATION

The source code, data and the models are available at: https://github.com/comprehend/engg-ai-research/tree/uncertainty-super-tuning.
