Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT.
Author information
College of Medicine, California Northstate University, 9700 W Taron Dr, Elk Grove, CA 95757, USA.
Department of Radiology, University of California, Davis Medical Center, 4860 Y St, Suite 3100, Sacramento, CA 95817, USA.
Publication information
Clin Imaging. 2024 Aug;112:110193. doi: 10.1016/j.clinimag.2024.110193. Epub 2024 May 23.
PURPOSE
To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.
METHODS
Thirty questions were posed to ChatGPT-3.5 and ChatGPT-4, three times each across three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading by three observers: two board-certified cardiologists and one board-certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. Final categorization required agreement of at least two of the three observers.
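The scoring protocol described above (two-of-three observer majority for correctness, plus cross-session consistency) can be made concrete with a short sketch. The following Python is a minimal illustration under assumed data structures; the function names, labels, and toy votes are hypothetical and are not the authors' actual code or data.

```python
from collections import Counter

# Hedged sketch of the scoring protocol: each question receives one label
# per observer (a two-of-three majority decides the final category) and one
# label per chat session (identical labels across sessions => consistent).
# All identifiers and example votes below are illustrative assumptions.

def majority_label(observer_labels):
    """Return the label chosen by at least two of three observers,
    or None when no majority exists (e.g. a three-way split)."""
    label, count = Counter(observer_labels).most_common(1)[0]
    return label if count >= 2 else None

def is_consistent(session_labels):
    """A question is consistent if all three sessions match."""
    return len(set(session_labels)) == 1

# Toy data for three questions (hypothetical labels).
observer_votes = {
    "Q1": ["correct", "correct", "misleading"],    # majority: correct
    "Q2": ["correct", "incorrect", "misleading"],  # no majority
    "Q3": ["incorrect", "incorrect", "incorrect"], # majority: incorrect
}
session_votes = {
    "Q1": ["correct", "correct", "correct"],       # consistent
    "Q2": ["correct", "incorrect", "correct"],     # inconsistent
    "Q3": ["incorrect", "incorrect", "incorrect"], # consistent
}

final = {q: majority_label(v) for q, v in observer_votes.items()}
scored = [q for q, label in final.items() if label is not None]
correct = sum(1 for q in scored if final[q] == "correct")
consistent = sum(1 for v in session_votes.values() if is_consistent(v))

print(f"correct by majority vote: {correct}/{len(scored)}")
print(f"consistent across sessions: {consistent}/{len(session_votes)}")
```

Note that questions lacking an observer majority drop out of the correctness denominator, which is why the results below report correctness out of 28 questions but consistency out of all 30.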
RESULTS
ChatGPT-3.5 answered 17 of 28 questions correctly (61 %) by majority vote; ChatGPT-4 answered 21 of 28 correctly (75 %). A majority vote on correctness was not reached for two questions, which were excluded from the correctness denominators. ChatGPT-3.5 answered 26 of 30 questions consistently (87 %); ChatGPT-4 answered 29 of 30 consistently (97 %). ChatGPT-3.5 gave both consistent and correct responses to 17 of 28 questions (61 %), versus 20 of 28 (71 %) for ChatGPT-4.
CONCLUSION
ChatGPT-4 performed better overall than ChatGPT-3.5 when answering cardiac imaging questions with regard to both correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answered over half of the cardiac imaging questions correctly, inaccurate, clinically misleading, and inconsistent responses indicate the need for further refinement before these models are used to educate patients about cardiac imaging.