Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook.

Affiliations

Weill Cornell Medicine-Qatar, Education City, Doha, Qatar.

Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar.

Publication information

J Med Internet Res. 2024 Sep 25;26:e59505. doi: 10.2196/59505.

DOI:10.2196/59505

PMID:39321458

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11464944/

Abstract

In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data-driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe6c/11464944/9d2ffc4a6dab/jmir_v26i1e59505_fig1.jpg
