GPT-4's performance in supporting physician decision-making in nephrology multiple-choice questions.

Author information

Noda Ryunosuke, Tanabe Kenichiro, Ichikawa Daisuke, Shibagaki Yugo

Affiliations

Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-ku, Kawasaki, Kanagawa, 216-8511, Japan.

Pathophysiology and Bioregulation, St. Marianna University School of Medicine, Kawasaki, Japan.

Publication information

Sci Rep. 2025 May 2;15(1):15439. doi: 10.1038/s41598-025-99774-3.

DOI: 10.1038/s41598-025-99774-3

PMID: 40316716

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12048615/

Abstract

Generative Pre-trained Transformer (GPT)-4, a versatile conversational artificial intelligence, has potential applications in medicine, but its ability to support physicians' decision-making remains unclear. We evaluated GPT-4's performance in assisting physicians with nephrology questions. Forty-five single-answer multiple-choice questions were extracted from the Core Curriculum in Nephrology articles published in the American Journal of Kidney Diseases from October 2021 to June 2023. Eight junior physicians without board certification and ten senior physicians with board certification answered these questions twice: first unaided, then with the opportunity to revise their answers based on GPT-4's outputs. GPT-4 correctly answered 77.8% of the questions. Before using GPT-4, junior physicians had a median (interquartile range) proportion of correct answers of 53.3% (48.3-53.3), and senior physicians 65.6% (60.6-66.7). After GPT-4 support, the median proportion of correct answers significantly increased to 72.2% (68.3-76.1) for juniors and 75.6% (73.3-80.0) for seniors (p = 0.008 and p = 0.004, respectively). The improvement was significantly greater for junior physicians (p = 0.017). However, senior physicians showed a decreased proportion of correct answers in one of the clinical categories. GPT-4 significantly improved physicians' accuracy in nephrology, especially among less experienced physicians, but may have negative impacts in specific subfields. Careful consideration is required when using GPT-4 to support physicians' decision-making.
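The percentages in the abstract are scores out of 45 questions, so they can be converted back into question counts. The Python sketch below is an illustration, not the authors' analysis; the helper names `pct` and `count_for` are hypothetical, and it assumes every reported percentage is a score out of 45 rounded to one decimal place.

```python
# Hypothetical helpers for checking the abstract's arithmetic; the names
# are illustrative and do not come from the paper.
N_QUESTIONS = 45  # single-answer questions in the set, per the abstract


def pct(correct: float, total: int = N_QUESTIONS) -> float:
    """Percentage correct, rounded to one decimal place as in the abstract."""
    return round(100 * correct / total, 1)


def count_for(percent: float, total: int = N_QUESTIONS) -> float:
    """Return the smallest score, in half-question steps, that rounds to `percent`.

    Half steps are searched because (assumption) the median of an even
    number of physicians can fall midway between two integer scores.
    """
    for halves in range(2 * total + 1):
        score = halves / 2
        if pct(score, total) == percent:
            return score
    raise ValueError(f"no score out of {total} rounds to {percent}%")


# Percentages as reported in the abstract.
reported = {
    "GPT-4": 77.8,
    "juniors, unaided (median)": 53.3,
    "seniors, unaided (median)": 65.6,
    "juniors, with GPT-4 (median)": 72.2,
    "seniors, with GPT-4 (median)": 75.6,
}

for label, percent in reported.items():
    print(f"{label}: {percent}% = {count_for(percent)} / {N_QUESTIONS}")
```

Running it shows that 77.8% is 35 of 45 questions, while two of the medians, 65.6% for the ten seniors before support and 72.2% for the eight juniors after support, correspond to 29.5 and 32.5 questions. No physician can score half a question, so these values are consistent with a median taken as the average of the two middle scores in an even-sized group; the individual scores themselves are not given in the abstract.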


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0ff6/12048615/cb86baec4fae/41598_2025_99774_Fig1_HTML.jpg

Similar articles

The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.

Performance of ChatGPT and Bard in self-assessment questions for nephrology board renewal.

Clin Exp Nephrol. 2024 May;28(5):465-469. doi: 10.1007/s10157-023-02451-w. Epub 2024 Feb 14.

What can GPT-4 do for Diagnosing Rare Eye Diseases? A Pilot Study.

Ophthalmol Ther. 2023 Dec;12(6):3395-3402. doi: 10.1007/s40123-023-00789-8. Epub 2023 Sep 1.

GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.

World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.

Evaluating ChatGPT-4's Accuracy in Identifying Final Diagnoses Within Differential Diagnoses Compared With Those of Physicians: Experimental Study for Diagnostic Cases.

JMIR Form Res. 2024 Jun 26;8:e59267. doi: 10.2196/59267.

Assessing ChatGPT's Mastery of Bloom's Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study.

J Med Internet Res. 2024 Jan 23;26:e52113. doi: 10.2196/52113.

Performance of Progressive Generations of GPT on an Exam Designed for Certifying Physicians as Certified Clinical Densitometrists.

J Clin Densitom. 2024 Apr-Jun;27(2):101480. doi: 10.1016/j.jocd.2024.101480. Epub 2024 Feb 17.

A comparative analysis of GPT-3.5 and GPT-4.0 on a multiple-choice ophthalmology question bank: A study on artificial intelligence developments.

Rom J Ophthalmol. 2024 Oct-Dec;68(4):367-371. doi: 10.22336/rjo.2024.67.

Performance of ChatGPT on Nephrology Test Questions.

Clin J Am Soc Nephrol. 2024 Jan 1;19(1):35-43. doi: 10.2215/CJN.0000000000000330. Epub 2023 Oct 18.
