Department of Computer Science, Vanderbilt University, Nashville, TN, United States.
Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States.
J Med Internet Res. 2024 Nov 7;26:e22769. doi: 10.2196/22769.
The launch of ChatGPT (OpenAI) in November 2022 attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including health care. Numerous studies have since been conducted regarding how to use state-of-the-art LLMs in health-related scenarios.
This review aims to summarize applications of and concerns regarding conversational LLMs in health care and provide an agenda for future research in this field.
We used PubMed, ACM, and the IEEE digital libraries as primary sources for this review. We followed the guidance of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to screen and select peer-reviewed research articles that (1) were related to health care applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection. We investigated these papers and classified them according to their applications and concerns.
Our search initially identified 820 papers according to targeted keywords, out of which 65 (7.9%) papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT (60/65, 92% of papers), followed by Bard (Google LLC; 1/65, 2% of papers), LLaMA (Meta; 1/65, 2% of papers), and other LLMs (6/65, 9% of papers). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction (eg, diagnosis, treatment recommendation, and drug synergy), and (4) administration (eg, documentation and information collection), and four categories of concerns: (1) reliability (eg, training data quality, accuracy, interpretability, and consistency in responses), (2) bias, (3) privacy, and (4) public acceptability. There were 49 (75%) papers using LLMs for summarization or medical knowledge inquiry, or both, and 58 (89%) papers expressing concerns about reliability or bias, or both. We found that conversational LLMs exhibited promising results in summarization and in providing general medical knowledge to patients with relatively high accuracy. However, conversational LLMs such as ChatGPT are not always able to provide reliable answers to complex health-related tasks (eg, diagnosis) that require specialized domain expertise. While bias and privacy issues were often noted as concerns, none of the reviewed papers experimentally examined how conversational LLMs give rise to these issues in health care research.
Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as on investigating the mechanisms by which LLM applications introduce bias and privacy issues. Given the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address these concerns and to promote, improve, and regulate the application of LLMs in health care.