Use of SNOMED CT in Large Language Models: Scoping Review.

Author Affiliations

Republic of Korea Air Force Aerospace Medical Center, Cheongju, Republic of Korea.

Department of Nursing Science, Research Institute of Nursing Science, Chungbuk National University, Cheongju, Republic of Korea.

Publication Information

JMIR Med Inform. 2024 Oct 7;12:e62924. doi: 10.2196/62924.

DOI: 10.2196/62924
PMID: 39374057
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11494256/
Abstract

BACKGROUND

Large language models (LLMs) have substantially advanced natural language processing (NLP) capabilities but often struggle with knowledge-driven tasks in specialized domains such as biomedicine. Integrating biomedical knowledge sources such as SNOMED CT into LLMs may enhance their performance on biomedical tasks. However, the methodologies and effectiveness of incorporating SNOMED CT into LLMs have not been systematically reviewed.

OBJECTIVE

This scoping review aims to examine how SNOMED CT is integrated into LLMs, focusing on (1) the types and components of LLMs being integrated with SNOMED CT, (2) which contents of SNOMED CT are being integrated, and (3) whether this integration improves LLM performance on NLP tasks.

METHODS

Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we searched ACM Digital Library, ACL Anthology, IEEE Xplore, PubMed, and Embase for relevant studies published from 2018 to 2023. Studies were included if they incorporated SNOMED CT into LLM pipelines for natural language understanding or generation tasks. Data on LLM types, SNOMED CT integration methods, end tasks, and performance metrics were extracted and synthesized.

RESULTS

The review included 37 studies. Bidirectional Encoder Representations from Transformers (BERT) and its biomedical variants were the most commonly used LLMs. Three main approaches for integrating SNOMED CT were identified: (1) incorporating SNOMED CT into LLM inputs (28/37, 76%), primarily using concept descriptions to expand training corpora; (2) integrating SNOMED CT into additional fusion modules (5/37, 14%); and (3) using SNOMED CT as an external knowledge retriever during inference (5/37, 14%). The most frequent end task was medical concept normalization (15/37, 41%), followed by entity extraction or typing and classification. While most studies that reported direct comparisons (17/19, 89%) found performance improvements after SNOMED CT integration, only about half of all included studies (19/37, 51%) provided such comparisons. The reported gains varied widely across metrics and tasks, ranging from 0.87% to 131.66%, and some studies showed no improvement, or even a decline, in certain performance metrics.
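The most frequent end task above, medical concept normalization, can be illustrated with a minimal, self-contained sketch: match a free-text mention against SNOMED CT concept descriptions (fully specified names plus synonyms) by lexical similarity. The concept IDs and synonym lists below are a tiny illustrative subset, not drawn from an actual SNOMED CT release, and real systems typically use learned embeddings rather than bag-of-words cosine similarity.

```python
from collections import Counter
import math

# Hypothetical mini-subset: SNOMED CT-style concept ID -> descriptions
CONCEPTS = {
    "22298006": ["myocardial infarction", "heart attack", "cardiac infarction"],
    "38341003": ["hypertensive disorder", "high blood pressure", "hypertension"],
    "73211009": ["diabetes mellitus", "diabetes"],
}

def _vec(text):
    # Bag-of-words term counts for a lowercase, whitespace-tokenized string
    return Counter(text.lower().split())

def _cosine(a, b):
    # Cosine similarity between two sparse count vectors
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def normalize(mention):
    """Return (concept_id, best_matching_description, score) for a mention."""
    best = ("", "", 0.0)
    m = _vec(mention)
    for cid, descriptions in CONCEPTS.items():
        for d in descriptions:
            score = _cosine(m, _vec(d))
            if score > best[2]:
                best = (cid, d, score)
    return best

cid, desc, score = normalize("acute heart attack")
```

Here the mention "acute heart attack" maps to the concept whose synonym "heart attack" overlaps it most; an embedding-based normalizer follows the same candidate-scoring shape with a stronger similarity function.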

CONCLUSIONS

This review demonstrates diverse approaches for integrating SNOMED CT into LLMs, with a focus on using concept descriptions to enhance biomedical language understanding and generation. While the results suggest potential benefits of SNOMED CT integration, the lack of standardized evaluation methods and comprehensive performance reporting hinders definitive conclusions about its effectiveness. Future research should prioritize consistent reporting of performance comparisons and explore more sophisticated methods for incorporating SNOMED CT's relational structure into LLMs. In addition, the biomedical NLP community should develop standardized evaluation frameworks to better assess the impact of ontology integration on LLM performance.
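One simple way to incorporate SNOMED CT's relational structure into an LLM corpus, in the spirit of the corpus-expansion approach above, is to verbalize relationship triples as plain-text sentences. The triples and templates below are illustrative only (real releases distribute relationships in RF2 tab-separated files, and the review does not prescribe this exact method).

```python
# Illustrative SNOMED CT-style (source, relation, target) triples
TRIPLES = [
    ("Myocardial infarction", "is a", "Ischemic heart disease"),
    ("Myocardial infarction", "finding site", "Myocardium"),
    ("Ischemic heart disease", "is a", "Heart disease"),
]

def verbalize(triples):
    """Render relationship triples as natural-language training sentences."""
    templates = {
        "is a": "{s} is a kind of {t}.",
        "finding site": "{s} is found in the {t}.",
    }
    # Fall back to a generic "source relation target." sentence for
    # relation types without a handcrafted template.
    return [
        templates.get(r, "{s} {r} {t}.").format(s=s, r=r, t=t)
        for s, r, t in triples
    ]

sentences = verbalize(TRIPLES)
```

Sentences produced this way can be appended to a pretraining or fine-tuning corpus, giving the model exposure to hierarchical (IS-A) and attribute relationships rather than concept descriptions alone.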

Figures 1-6 (PMC11494256):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a23/11494256/d49c33663756/medinform_v12i1e62924_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a23/11494256/f815d9bedb9a/medinform_v12i1e62924_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a23/11494256/32a8d30d11c0/medinform_v12i1e62924_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a23/11494256/68899f7ea3f4/medinform_v12i1e62924_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a23/11494256/305b5e0e3dbc/medinform_v12i1e62924_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a23/11494256/1fedc4b4d38f/medinform_v12i1e62924_fig6.jpg
