Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, Minnesota.
Clin J Am Soc Nephrol. 2024 Jan 1;19(1):35-43. doi: 10.2215/CJN.0000000000000330. Epub 2023 Oct 18.
ChatGPT is a novel tool that allows people to engage in conversations with an advanced machine learning model. ChatGPT's performance on the US Medical Licensing Examination is comparable to that of a successful candidate. However, its performance in the field of nephrology remains undetermined. This study assessed ChatGPT's ability to answer nephrology test questions.
Questions were sourced from the Nephrology Self-Assessment Program and the Kidney Self-Assessment Program, both of which consist of single-answer multiple-choice questions. Questions containing visual elements were excluded. Each question bank was run twice using GPT-3.5 and GPT-4. Performance was assessed with two metrics: the total accuracy rate, defined as the percentage of questions ChatGPT answered correctly in either the first or second run, and the total concordance, defined as the percentage of questions for which ChatGPT gave identical answers in both runs, regardless of correctness.
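The two metrics above can be sketched in a few lines. This is a minimal illustration with hypothetical toy data (the paper's actual grading pipeline is not described); the function names and the four-question example are assumptions, not the authors' code.

```python
def total_accuracy(run1, run2, key):
    """Fraction of questions answered correctly in either run."""
    correct = sum(1 for a, b, k in zip(run1, run2, key) if a == k or b == k)
    return correct / len(key)

def total_concordance(run1, run2):
    """Fraction of questions with identical answers in both runs,
    regardless of whether those answers are correct."""
    same = sum(1 for a, b in zip(run1, run2) if a == b)
    return same / len(run1)

# Toy example: 4 questions with answer choices A-D (hypothetical data).
key  = ["A", "C", "B", "D"]   # answer key
run1 = ["A", "C", "D", "B"]   # first run's answers
run2 = ["B", "C", "D", "D"]   # second run's answers

print(total_accuracy(run1, run2, key))   # 0.75: Q1, Q2, Q4 correct in at least one run
print(total_concordance(run1, run2))     # 0.5: identical answers on Q2 and Q3 only
```

Note that a question counts toward total accuracy if either run gets it right, so total accuracy can exceed single-run accuracy, while concordance captures repeatability independently of correctness.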
A comprehensive assessment was conducted on 975 questions: 508 from the Nephrology Self-Assessment Program and 467 from the Kidney Self-Assessment Program. GPT-3.5 achieved a total accuracy rate of 51%. Notably, accuracy was higher on Nephrology Self-Assessment Program questions than on Kidney Self-Assessment Program questions (58% versus 44%; P < 0.001). The total concordance rate across all questions was 78%, with correct answers showing higher concordance (84%) than incorrect answers (73%) (P < 0.001). Across nephrology subfields, total accuracy rates were relatively lower in electrolyte and acid-base disorders, glomerular disease, and kidney-related bone and stone disorders. GPT-4's total accuracy rate was 74%, higher than that of GPT-3.5 (P < 0.001) but still below the passing threshold and the average score of nephrology examinees (77%).
ChatGPT exhibited limitations in both accuracy and repeatability when addressing nephrology-related questions, and its performance varied across subfields.