
No means 'No': a non-improper modeling approach, with embedded speculative context.

Affiliation

Saama AI Research Lab, Pune 411057, India.

Publication

Bioinformatics. 2022 Oct 14;38(20):4790-4796. doi: 10.1093/bioinformatics/btac593.

Abstract

MOTIVATION

Medical data are complex in nature: the same terms often appear in records in very different contexts. In this article, we investigate the embeddings of several biomedical language models (BioBERT, BioELECTRA and PubMedBERT) for their understanding of negation and speculation context, and we find that these models fail to differentiate negated from non-negated contexts. To measure this understanding, we compute cosine similarity scores between the embeddings of negated sentences and their non-negated counterparts. To improve these models, we introduce a generic super-tuning approach that enhances the embeddings' handling of negation and speculation context using a synthesized dataset.
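As a concrete illustration of this probe, the sketch below embeds a negated/non-negated sentence pair with an off-the-shelf biomedical encoder and scores the pair with cosine similarity. The checkpoint name and the mean-pooling choice are assumptions for illustration, not necessarily the paper's exact setup (see the linked repository for the authors' code).

```python
# Minimal sketch of the evaluation idea: if an encoder "ignores" negation,
# the two sentences below get near-identical embeddings (cosine ~ 1.0).
# The checkpoint and mean pooling are assumptions, not the paper's exact code.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def embed(sentence: str) -> torch.Tensor:
    """Mean-pool the last hidden states into a single sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)    # mask out padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

pos = embed("The patient shows signs of pneumonia.")
neg = embed("The patient shows no signs of pneumonia.")

score = torch.nn.functional.cosine_similarity(pos, neg).item()
print(f"cosine similarity (negated vs. non-negated): {score:.4f}")
```

A score close to 1.0 for such pairs is the symptom the abstract describes: the embedding space treats a sentence and its negation as nearly the same, which super-tuning is meant to correct.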

RESULTS

After super-tuning, the models' embeddings capture negated and speculative contexts much better. Furthermore, we fine-tuned the super-tuned models on downstream tasks and found that they outperform previous models, achieving state-of-the-art results on negation and speculation cue and scope detection on the BioScope abstracts and the Sherlock dataset. We also confirmed that super-tuning incurs only a minimal performance trade-off on other tasks, such as natural language inference.

AVAILABILITY AND IMPLEMENTATION

The source code, data and the models are available at: https://github.com/comprehend/engg-ai-research/tree/uncertainty-super-tuning.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/b9f61a3a88f0/btac593f1.jpg
