The Clinicians' Guide to Large Language Models: A General Perspective With a Focus on Hallucinations.

Author information

Roustan Dimitri, Bastardot François

Affiliations

Emergency Medicine Department, Cliniques Universitaires Saint-Luc, Brussels, Belgium.

Medical Directorate, Lausanne University Hospital, Lausanne, Switzerland.

Publication information

Interact J Med Res. 2025 Jan 28;14:e59823. doi: 10.2196/59823.

DOI: 10.2196/59823

PMID: 39874574

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11815294/

Abstract

Large language models (LLMs) are artificial intelligence tools that have the prospect of profoundly changing how we practice all aspects of medicine. Given the incredible potential of LLMs in medicine and the interest of many health care stakeholders in implementing them in routine practice, it is essential that clinicians be aware of the basic risks associated with the use of these models. In particular, a significant risk associated with the use of LLMs is their potential to create hallucinations. Hallucinations (false information) generated by LLMs arise from a multitude of causes, including factors related to the training dataset as well as their auto-regressive nature. The implications for clinical practice range from the generation of inaccurate diagnostic and therapeutic information to the reinforcement of flawed diagnostic reasoning pathways, as well as a lack of reliability if the models are not used properly. To reduce this risk, we developed a general technical framework for approaching LLMs in general clinical practice, as well as for implementation on a larger institutional scale.
