

No means 'No': a non-improper modeling approach, with embedded speculative context.

Affiliations

Saama AI Research Lab, Pune 411057, India.

Publication information

Bioinformatics. 2022 Oct 14;38(20):4790-4796. doi: 10.1093/bioinformatics/btac593.

PMID: 36040145
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9563701/
Abstract

MOTIVATION

Medical data are complex in nature because the same terms often appear in records in different contexts. In this article, we investigate the embeddings of several biomedical language models (BioBERT, BioELECTRA and PubMedBERT) with respect to their understanding of negation and speculation context, and we find that these models are unable to differentiate a negated context from a non-negated one. To measure the models' understanding, we use cosine similarity scores between the embeddings of negated and non-negated sentence pairs. To improve the models, we introduce a generic super-tuning approach that enhances the embeddings' handling of negation and speculation context by utilizing a synthesized dataset.
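The diagnostic described above compares a negated sentence's embedding with its non-negated counterpart via cosine similarity. A minimal sketch of that comparison, using hypothetical embedding vectors rather than real model outputs (the vectors and sentences below are illustrative, not taken from the paper):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sentence embeddings for a negated / non-negated pair,
# e.g. "No evidence of pneumonia." vs "Evidence of pneumonia."
emb_negated = np.array([0.12, -0.40, 0.88, 0.05])
emb_affirmed = np.array([0.10, -0.35, 0.90, 0.07])

score = cosine_similarity(emb_negated, emb_affirmed)
# A score close to 1.0 for such a pair would suggest the model's
# embeddings barely separate the negated from the non-negated context.
```

In practice the vectors would come from a model's sentence-level representation (e.g. a pooled transformer output); a high similarity across many such pairs is the symptom the authors report for the untuned models.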

RESULTS

After super-tuning, the models' embeddings capture negative and speculative contexts much better. Furthermore, we fine-tuned the super-tuned models on various downstream tasks and found that they outperform previous models, achieving state-of-the-art results on negation and speculation cue detection and scope detection on the BioScope abstracts and the Sherlock dataset. We also confirmed that super-tuning incurs only a minimal trade-off in the model's performance on other tasks, such as natural language inference.
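Cue detection, one of the downstream tasks mentioned above, amounts to labeling which tokens in a sentence signal negation or speculation. The paper uses fine-tuned transformer models for this; the sketch below is only a toy rule-based baseline (the cue lexicon and sentence are illustrative) to make the task format concrete:

```python
# Toy lexicon of common clinical negation cues (illustrative, not exhaustive).
NEGATION_CUES = {"no", "not", "without", "denies", "absence"}

def tag_cues(sentence: str) -> list[tuple[str, str]]:
    """Label each whitespace token as CUE (negation cue) or O (outside)."""
    return [
        (tok, "CUE" if tok.lower().strip(".,;") in NEGATION_CUES else "O")
        for tok in sentence.split()
    ]

tags = tag_cues("Patient denies chest pain without shortness of breath.")
# "denies" and "without" are tagged CUE; all other tokens are tagged O.
```

Scope detection then extends this labeling to mark which tokens fall under each cue's influence; a learned model replaces the lexicon lookup with contextual token classification.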

AVAILABILITY AND IMPLEMENTATION

The source code, data and the models are available at: https://github.com/comprehend/engg-ai-research/tree/uncertainty-super-tuning.

Figures 1-9 (PMC image links):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/b9f61a3a88f0/btac593f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/80189cfdc61e/btac593f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/a12302895086/btac593f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/310a172e3c02/btac593f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/ecae411e60b2/btac593f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/1e6d5f51f5df/btac593f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/320e8f7914a2/btac593f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/afd6bf007b22/btac593f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d39f/9563701/36184fef9e32/btac593f9.jpg

