Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders.

Affiliation

Department of Neuroscience, Mayo Clinic, Jacksonville, Florida, USA.

Publication information

Brain Pathol. 2024 May;34(3):e13207. doi: 10.1111/bpa.13207. Epub 2023 Aug 8.

DOI:10.1111/bpa.13207

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11006994/

Abstract

This study explores the utility of large language models (LLMs), specifically ChatGPT and Google Bard, in predicting neuropathologic diagnoses from clinical summaries. A total of 25 cases of neurodegenerative disorders presented at Mayo Clinic brain bank Clinico-Pathological Conferences were analyzed. The LLMs provided multiple pathologic diagnoses and their rationales, which were compared with the final clinical diagnoses made by physicians. ChatGPT-3.5, ChatGPT-4, and Google Bard gave the correct diagnosis as their primary diagnosis in 32%, 52%, and 40% of cases, respectively, and included the correct diagnosis among their listed diagnoses in 76%, 84%, and 76% of cases, respectively. These findings highlight the potential of artificial intelligence tools like ChatGPT in neuropathology, suggesting they may facilitate more comprehensive discussions in clinicopathological conferences.
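Because the abstract reports percentages over a fixed set of 25 cases, each figure corresponds to a whole number of cases. The sketch below back-calculates those counts as a reading aid. The counts are derived from the reported percentages, not stated in the abstract, and the names `reported`, `percent_to_cases`, and `implied_counts` are illustrative only.

```python
# Percentages as reported in the abstract. The case counts computed below
# are back-calculated from them (an assumption-free arithmetic step, but
# the counts themselves are not quoted in the abstract).
TOTAL_CASES = 25

reported = {
    "ChatGPT-3.5": {"primary": 32, "included": 76},
    "ChatGPT-4": {"primary": 52, "included": 84},
    "Google Bard": {"primary": 40, "included": 76},
}


def percent_to_cases(percent: int, total: int = TOTAL_CASES) -> int:
    """Convert a reported percentage into a whole number of cases."""
    cases, remainder = divmod(percent * total, 100)
    if remainder:
        # A non-zero remainder would mean the percentage was rounded.
        raise ValueError(f"{percent}% of {total} is not a whole number of cases")
    return cases


implied_counts = {
    model: {metric: percent_to_cases(pct) for metric, pct in metrics.items()}
    for model, metrics in reported.items()
}

for model, counts in implied_counts.items():
    print(
        f"{model}: primary {counts['primary']}/{TOTAL_CASES}, "
        f"included {counts['included']}/{TOTAL_CASES}"
    )
```

With so few cases, one case is worth 4 percentage points, so small differences between models amount to only a handful of cases.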
