Use of SNOMED CT in Large Language Models: Scoping Review.
Author affiliations
Republic of Korea Air Force Aerospace Medical Center, Cheongju, Republic of Korea.
Department of Nursing Science, Research Institute of Nursing Science, Chungbuk National University, Cheongju, Republic of Korea.
Publication information
JMIR Med Inform. 2024 Oct 7;12:e62924. doi: 10.2196/62924.
BACKGROUND
Large language models (LLMs) have substantially advanced natural language processing (NLP) capabilities but often struggle with knowledge-driven tasks in specialized domains such as biomedicine. Integrating biomedical knowledge sources such as SNOMED CT into LLMs may enhance their performance on biomedical tasks. However, the methodologies and effectiveness of incorporating SNOMED CT into LLMs have not been systematically reviewed.
OBJECTIVE
This scoping review aims to examine how SNOMED CT is integrated into LLMs, focusing on (1) the types and components of LLMs being integrated with SNOMED CT, (2) which contents of SNOMED CT are being integrated, and (3) whether this integration improves LLM performance on NLP tasks.
METHODS
Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we searched ACM Digital Library, ACL Anthology, IEEE Xplore, PubMed, and Embase for relevant studies published from 2018 to 2023. Studies were included if they incorporated SNOMED CT into LLM pipelines for natural language understanding or generation tasks. Data on LLM types, SNOMED CT integration methods, end tasks, and performance metrics were extracted and synthesized.
RESULTS
The review included 37 studies. Bidirectional Encoder Representations from Transformers (BERT) and its biomedical variants were the most commonly used LLMs. Three main approaches for integrating SNOMED CT were identified: (1) incorporating SNOMED CT into LLM inputs (28/37, 76%), primarily using concept descriptions to expand training corpora; (2) integrating SNOMED CT into additional fusion modules (5/37, 14%); and (3) using SNOMED CT as an external knowledge retriever during inference (5/37, 14%). The most frequent end task was medical concept normalization (15/37, 41%), followed by entity extraction or typing, and classification. Only about half of the studies (19/37, 51%) provided direct performance comparisons, and most of these (17/19, 89%) reported improvements after SNOMED CT integration. The reported gains varied widely across metrics and tasks, ranging from 0.87% to 131.66%, and some studies showed no improvement or a decline in certain performance metrics.
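As a purely illustrative sketch (not drawn from any of the reviewed studies), the snippet below shows what approaches (1) and (3) can look like in practice: reading concept descriptions from a SNOMED CT RF2 description file, emitting synonym sentences that could expand a training corpus, and performing a naive concept lookup to augment an inference-time prompt. The file path and helper names are hypothetical, and real pipelines would rely on a licensed SNOMED CT release, the language reference set for preferred terms, and a proper terminology service rather than string matching.

```python
# Minimal sketch of two integration patterns described above.
# Assumptions: a locally available SNOMED CT RF2 description file
# (the path below is hypothetical; SNOMED CT releases require a license).

import csv
from collections import defaultdict

DESCRIPTION_FILE = "sct2_Description_Snapshot-en_INT_20240101.txt"  # hypothetical path

def load_concept_terms(path):
    """Map each active conceptId to the list of its description terms."""
    terms = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["active"] == "1":
                terms[row["conceptId"]].append(row["term"])
    return terms

# Approach 1: expand a pretraining or fine-tuning corpus with concept
# descriptions, here by emitting simple synonym sentences.
def corpus_from_descriptions(terms):
    for concept_id, names in terms.items():
        # Naively treat the first term as a canonical name; real pipelines
        # would consult the language refset to pick the preferred term.
        canonical, *synonyms = names
        for syn in synonyms:
            yield f"{syn} is also known as {canonical}."

# Approach 3: use SNOMED CT as an external knowledge retriever at inference,
# here as a naive substring lookup that augments an LLM prompt.
def augment_prompt(question, terms):
    hits = [
        f"- {names[0]} (SCTID {cid})"
        for cid, names in terms.items()
        if any(n.lower() in question.lower() for n in names)
    ]
    context = "\n".join(hits) or "- (no matching concepts found)"
    return f"Relevant SNOMED CT concepts:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    concept_terms = load_concept_terms(DESCRIPTION_FILE)
    print(list(corpus_from_descriptions(concept_terms))[:3])
    print(augment_prompt("Does the patient have myocardial infarction?", concept_terms))
```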
CONCLUSIONS
This review demonstrates diverse approaches for integrating SNOMED CT into LLMs, with a focus on using concept descriptions to enhance biomedical language understanding and generation. While the results suggest potential benefits of SNOMED CT integration, the lack of standardized evaluation methods and comprehensive performance reporting hinders definitive conclusions about its effectiveness. Future research should prioritize consistent reporting of performance comparisons and explore more sophisticated methods for incorporating SNOMED CT's relational structure into LLMs. In addition, the biomedical NLP community should develop standardized evaluation frameworks to better assess the impact of ontology integration on LLM performance.
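To make the recommendation about relational structure concrete, the following assumption-laden sketch (again not a method from the reviewed studies) extracts |is a| triples from an RF2 relationship file; such triples are a common starting point for graph-embedding or fusion-module approaches that go beyond concept descriptions. The file path is hypothetical.

```python
# Illustrative sketch: turn SNOMED CT |is a| relationships into
# (child, IS_A, parent) triples for graph-based knowledge fusion with LLMs.
# The file path is hypothetical; SNOMED CT RF2 releases require a license.

import csv

RELATIONSHIP_FILE = "sct2_Relationship_Snapshot_INT_20240101.txt"  # hypothetical path
IS_A = "116680003"  # SNOMED CT concept ID of the |is a| relationship type

def load_is_a_triples(path):
    """Yield (sourceId, 'IS_A', destinationId) for every active |is a| row."""
    with open(path, encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["active"] == "1" and row["typeId"] == IS_A:
                yield (row["sourceId"], "IS_A", row["destinationId"])

if __name__ == "__main__":
    triples = list(load_is_a_triples(RELATIONSHIP_FILE))
    print(f"Loaded {len(triples)} IS_A triples; e.g. {triples[:3]}")
    # Downstream, such triples could train graph embeddings or populate a
    # fusion module alongside the LLM's token representations.
```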