Unsupervised SapBERT-based bi-encoders for medical concept annotation of clinical narratives with SNOMED CT.

Authors

Abdulnazar Akhila, Roller Roland, Schulz Stefan, Kreuzthaler Markus

Affiliations

Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

CBmed GmbH - Center for Biomarker Research in Medicine, Graz, Austria.

Publication

Digit Health. 2024 Oct 21;10:20552076241288681. doi: 10.1177/20552076241288681. eCollection 2024 Jan-Dec.

DOI: 10.1177/20552076241288681
PMID: 39493636
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11531008/
Abstract

OBJECTIVE

Clinical narratives provide comprehensive patient information. Achieving interoperability involves mapping relevant details to standardized medical vocabularies. Typically, natural language processing divides this task into named entity recognition (NER) and medical concept normalization (MCN). State-of-the-art results require supervised setups with abundant training data. However, the limited availability of annotated data due to sensitivity and time constraints poses challenges. This study addressed the need for unsupervised medical concept annotation (MCA) to overcome these limitations and support the creation of annotated datasets.

METHOD

We use an unsupervised SapBERT-based bi-encoder model to analyze n-grams from narrative text and measure their similarity to SNOMED CT concepts. Finally, we apply a syntactic re-ranker. For evaluation, we use the semantic tags of the SNOMED CT candidates to assess the NER phase and their concept IDs to assess the MCN phase. The approach is evaluated on both English and German narratives.
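The candidate-generation idea above (encode every n-gram and every concept term independently, then link each n-gram to its most similar concept) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation: the hypothetical `embed` function uses bags of character trigrams as a stand-in for SapBERT embeddings, and the names `annotate`, `threshold=0.6`, `max_n=3` and the three-entry concept list with placeholder IDs (`C1` to `C3`) are all assumptions for the example. The syntactic re-ranker is not modeled, so overlapping candidates are left in the output.

```python
import math
import re
from collections import Counter


# HYPOTHETICAL stand-in for a SapBERT encoder: a bag of character trigrams.
# A real bi-encoder would return a dense transformer embedding instead.
def embed(text: str) -> Counter:
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ngrams(tokens, max_n):
    # Yield every contiguous token window of length 1..max_n with its offsets.
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield i, i + n, " ".join(tokens[i:i + n])


def annotate(text, concepts, threshold=0.6, max_n=3):
    # Concept side: encode each concept term once, independently of the text.
    index = [(cid, tag, embed(term)) for cid, term, tag in concepts]
    tokens = re.findall(r"\w+", text.lower())
    candidates = []
    for start, end, span in ngrams(tokens, max_n):
        vec = embed(span)  # mention side: encode the n-gram on its own
        scored = [(cosine(vec, cvec), cid, tag) for cid, tag, cvec in index]
        score, cid, tag = max(scored)  # nearest concept by cosine similarity
        if score >= threshold:
            candidates.append((start, end, span, cid, tag, round(score, 3)))
    # Overlapping candidates remain; the paper's syntactic re-ranker is not modeled here.
    return candidates


# Toy concept list; the IDs are placeholders, NOT real SNOMED CT identifiers.
CONCEPTS = [
    ("C1", "diabetes mellitus", "disorder"),
    ("C2", "fever", "finding"),
    ("C3", "hypertension", "disorder"),
]

results = annotate("Patient has diabetes mellitus and fever.", CONCEPTS)
for cand in results:
    print(cand)
```

Because both sides are encoded separately, the concept vectors can be computed once and reused for every document; that independence is what distinguishes a bi-encoder from a cross-encoder, which would score each (mention, concept) pair jointly.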

RESULT

Without training data, our unsupervised approach achieves an F1 score of 0.765 in English and 0.557 in German for MCN. Evaluation at the semantic tag level reveals that "disorder" has the highest F1 scores: 0.871 on the English dataset and 0.648 on the German dataset. Furthermore, for the semantic tag "disorder", the MCA approach achieves F1 scores of 0.839 (NER) and 0.696 (MCN) in English, and 0.685 (NER) and 0.437 (MCN) in German.
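The scores above are F1 values, the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). The sketch below shows how the same predictions can be scored at two levels, as the method describes: by semantic tag (NER) and by concept ID (MCN), optionally restricted to one tag such as "disorder". Exact span matching, the helper names `ner_view` and `mcn_view`, and the toy annotations with placeholder concept IDs are assumptions for illustration; the abstract does not state the paper's matching criterion.

```python
from typing import Iterable, Set, Tuple

# Each annotation: (start_token, end_token, semantic_tag, concept_id).
# Exact span matching is an ASSUMPTION; the abstract does not state the criterion.
Annotation = Tuple[int, int, str, str]


def f1(gold: Set[tuple], pred: Set[tuple]) -> float:
    tp = len(gold & pred)  # true positives: items present in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def ner_view(anns: Iterable[Annotation], tag=None) -> Set[tuple]:
    # NER level: a prediction counts if span and semantic tag match.
    return {(s, e, t) for s, e, t, _ in anns if tag is None or t == tag}


def mcn_view(anns: Iterable[Annotation], tag=None) -> Set[tuple]:
    # MCN level: a prediction counts only if span and concept ID match.
    return {(s, e, c) for s, e, t, c in anns if tag is None or t == tag}


# Toy data with placeholder concept IDs (not real SNOMED CT codes).
gold = [(0, 2, "disorder", "C1"), (5, 6, "finding", "C2"), (8, 9, "disorder", "C3")]
pred = [(0, 2, "disorder", "C1"), (5, 6, "finding", "C4"), (10, 11, "disorder", "C3")]

print(round(f1(ner_view(gold), ner_view(pred)), 3))  # NER: 0.667
print(round(f1(mcn_view(gold), mcn_view(pred)), 3))  # MCN: 0.333
print(round(f1(ner_view(gold, "disorder"), ner_view(pred, "disorder")), 3))  # 0.5
```

The second prediction has the right span and tag but the wrong concept, so it counts for NER but not for MCN; this is why MCN scores sit below NER scores for the same tag.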

CONCLUSION

This unsupervised approach demonstrates potential for initial annotation (pre-labeling) in manual annotation tasks. While promising for certain semantic tags, challenges remain, including false positives, contextual errors, and variability of clinical language, requiring further fine-tuning.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5d2/11531008/17fafd3054de/10.1177_20552076241288681-fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5d2/11531008/acc7a4c9bd63/10.1177_20552076241288681-fig2.jpg
