

Transfer Learning with Clinical Concept Embeddings from Large Language Models.

Author Information

Gao Yuhe, Bao Runxue, Ji Yuelyu, Sun Yiming, Song Chenxi, Ferraro Jeffrey P, Ye Ye

Affiliations

University of Pittsburgh.

GE Healthcare.

Publication Information

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:167-176. eCollection 2025.

PMID: 40502269
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12150738/
Abstract

Knowledge exchange is crucial in healthcare, particularly when leveraging data from multiple clinical sites to address data scarcity, reduce costs, and enable timely interventions. Transfer learning can facilitate cross-site knowledge transfer, yet a significant challenge is the heterogeneity in clinical concepts across different sites. Recently, Large Language Models (LLMs) have shown significant potential in capturing the semantic meanings of clinical concepts and mitigating heterogeneity in biomedicine. This study analyzed electronic health records from two large healthcare systems to assess the impact of semantic embeddings from LLMs on local models, shared models, and transfer learning models. The results indicate that domain-specific LLMs, such as Med-BERT, consistently outperform in local and direct transfer scenarios, whereas generic models like OpenAI embeddings may need fine-tuning for optimal performance. This study emphasizes the importance of domain-specific embeddings and meticulous model tuning for effective knowledge transfer in healthcare. It remains essential to investigate the balance between the complexity of downstream tasks, the size of training samples, and the extent of model tuning.
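The core idea in the abstract — that a shared semantic embedding space lets a model trained at one site transfer to another site with different local coding — can be illustrated with a minimal sketch. The embedding table, code names, and nearest-centroid classifier below are hypothetical toy stand-ins, not the paper's actual method or real Med-BERT outputs; they only show why embeddings bridge vocabulary heterogeneity.

```python
# Hypothetical sketch: a shared concept-embedding space lets a classifier
# trained on site A's codes generalize to site B's different local codes.
# All embedding values and code names here are invented for illustration;
# in practice the vectors would come from an LLM such as Med-BERT.
import math

EMBED = {
    "ICD10:I21":  [0.90, 0.10],  # myocardial infarction (site A coding)
    "LOCAL:MI":   [0.88, 0.12],  # same concept, site B's local code
    "ICD10:J45":  [0.10, 0.90],  # asthma (site A coding)
    "LOCAL:ASTH": [0.12, 0.92],  # asthma (site B's local code)
}

def visit_vector(codes):
    """Represent a visit as the mean of its concept embeddings."""
    vecs = [EMBED[c] for c in codes]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def nearest_centroid_fit(visits, labels):
    """Compute one centroid per label from embedded training visits."""
    sums, counts = {}, {}
    for codes, y in zip(visits, labels):
        v = visit_vector(codes)
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}

def predict(centroids, codes):
    """Assign the label whose centroid is closest in embedding space."""
    v = visit_vector(codes)
    return min(centroids, key=lambda y: math.dist(v, centroids[y]))

# Train at site A using only its ICD-10 codes.
centroids = nearest_centroid_fit(
    [["ICD10:I21"], ["ICD10:J45"]], ["cardiac", "respiratory"])

# Apply directly at site B: its local codes never appeared in training,
# but the shared embedding space bridges the vocabulary gap.
print(predict(centroids, ["LOCAL:MI"]))    # → cardiac
print(predict(centroids, ["LOCAL:ASTH"]))  # → respiratory
```

Direct transfer succeeds here only because semantically equivalent codes land near each other in the embedding space — the property the study evaluates for domain-specific versus generic LLM embeddings.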

Similar Articles

1. Transfer Learning with Clinical Concept Embeddings from Large Language Models.
   AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:167-176. eCollection 2025.
2. Psychometric Evaluation of Large Language Model Embeddings for Personality Trait Prediction.
   J Med Internet Res. 2025 Jul 8;27:e75347. doi: 10.2196/75347.
3. The first step is the hardest: pitfalls of representing and tokenizing temporal data for large language models.
   J Am Med Inform Assoc. 2024 Sep 1;31(9):2151-2158. doi: 10.1093/jamia/ocae090.
4. Fine-tuning medical language models for enhanced long-contextual understanding and domain expertise.
   Quant Imaging Med Surg. 2025 Jun 6;15(6):5450-5462. doi: 10.21037/qims-2024-2655. Epub 2025 Jun 3.
5. Stench of Errors or the Shine of Potential: The Challenge of (Ir)Responsible Use of ChatGPT in Speech-Language Pathology.
   Int J Lang Commun Disord. 2025 Jul-Aug;60(4):e70088. doi: 10.1111/1460-6984.70088.
6. A dataset and benchmark for hospital course summarization with adapted large language models.
   J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
7. Algorithmic Classification of Psychiatric Disorder-Related Spontaneous Communication Using Large Language Model Embeddings: Algorithm Development and Validation.
   JMIR AI. 2025 May 30;4:e67369. doi: 10.2196/67369.
8. Toward Cross-Hospital Deployment of Natural Language Processing Systems: Model Development and Validation of Fine-Tuned Large Language Models for Disease Name Recognition in Japanese.
   JMIR Med Inform. 2025 Jul 8;13:e76773. doi: 10.2196/76773.
9. Fusing Domain Knowledge with a Fine-Tuned Large Language Model for Enhanced Molecular Property Prediction.
   J Chem Theory Comput. 2025 Jul 22;21(14):6743-6758. doi: 10.1021/acs.jctc.5c00605. Epub 2025 Jul 9.
10. A systematic review of speech, language and communication interventions for children with Down syndrome from 0 to 6 years.
   Int J Lang Commun Disord. 2022 Mar;57(2):441-463. doi: 10.1111/1460-6984.12699. Epub 2022 Feb 22.

References Cited in This Article

1. Prediction of COVID-19 Patients' Emergency Room Revisit using Multi-Source Transfer Learning.
   Proc (IEEE Int Conf Healthc Inform). 2023 Jun;2023:138-144. doi: 10.1109/ICHI57859.2023.00028. Epub 2023 Dec 11.
2. Learning across diverse biomedical data modalities and cohorts: Challenges and opportunities for innovation.
   Patterns (N Y). 2024 Jan 17;5(2):100913. doi: 10.1016/j.patter.2023.100913. eCollection 2024 Feb 9.
3. A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records.
   Yearb Med Inform. 2021 Aug;30(1):239-244. doi: 10.1055/s-0041-1726522. Epub 2021 Sep 3.
4. Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.
   NPJ Digit Med. 2021 May 20;4(1):86. doi: 10.1038/s41746-021-00455-y.
5. BioBERT: a pre-trained biomedical language representation model for biomedical text mining.
   Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.
6. Clinical documentation variations and NLP system portability: a case study in asthma birth cohorts across institutions.
   J Am Med Inform Assoc. 2018 Mar 1;25(3):353-359. doi: 10.1093/jamia/ocx138.
7. The Unified Medical Language System (UMLS): integrating biomedical terminology.
   Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-70. doi: 10.1093/nar/gkh061.
8. Creating a text classifier to detect radiology reports describing mediastinal findings associated with inhalational anthrax and other disorders.
   J Am Med Inform Assoc. 2003 Sep-Oct;10(5):494-503. doi: 10.1197/jamia.M1330. Epub 2003 Jun 4.