Evaluation of responses to cardiac imaging questions by the artificial intelligence large language model ChatGPT.
Author information
College of Medicine, California Northstate University, 9700 W Taron Dr, Elk Grove, CA 95757, USA.
Department of Radiology, University of California, Davis Medical Center, 4860 Y St, Suite 3100, Sacramento, CA 95817, USA.
Publication information
Clin Imaging. 2024 Aug;112:110193. doi: 10.1016/j.clinimag.2024.110193. Epub 2024 May 23.
PURPOSE
To assess ChatGPT's ability as a resource for educating patients on various aspects of cardiac imaging, including diagnosis, imaging modalities, indications, interpretation of radiology reports, and management.
METHODS
Thirty questions were posed to ChatGPT-3.5 and ChatGPT-4, three times each across three separate chat sessions. Responses were scored as correct, incorrect, or clinically misleading by three observers: two board-certified cardiologists and one board-certified radiologist with cardiac imaging subspecialization. Consistency of responses across the three sessions was also evaluated. Final categorization required agreement of at least two of the three observers.
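The scoring protocol described above (two-of-three observer majority for correctness, plus cross-session consistency) can be made concrete with a short sketch. The following Python is a minimal illustration under assumed data structures; the function names, labels, and toy votes are hypothetical and are not the authors' actual code or data.

```python
from collections import Counter

# Hedged sketch of the scoring protocol: each question receives one label
# per observer (a two-of-three majority decides the final category) and one
# label per chat session (identical labels across sessions => consistent).
# All identifiers and example votes below are illustrative assumptions.

def majority_label(observer_labels):
    """Return the label chosen by at least two of three observers,
    or None when no majority exists (e.g. a three-way split)."""
    label, count = Counter(observer_labels).most_common(1)[0]
    return label if count >= 2 else None

def is_consistent(session_labels):
    """A question is consistent if all three sessions match."""
    return len(set(session_labels)) == 1

# Toy data for three questions (hypothetical labels).
observer_votes = {
    "Q1": ["correct", "correct", "misleading"],    # majority: correct
    "Q2": ["correct", "incorrect", "misleading"],  # no majority
    "Q3": ["incorrect", "incorrect", "incorrect"], # majority: incorrect
}
session_votes = {
    "Q1": ["correct", "correct", "correct"],       # consistent
    "Q2": ["correct", "incorrect", "correct"],     # inconsistent
    "Q3": ["incorrect", "incorrect", "incorrect"], # consistent
}

final = {q: majority_label(v) for q, v in observer_votes.items()}
scored = [q for q, label in final.items() if label is not None]
correct = sum(1 for q in scored if final[q] == "correct")
consistent = sum(1 for v in session_votes.values() if is_consistent(v))

print(f"correct by majority vote: {correct}/{len(scored)}")
print(f"consistent across sessions: {consistent}/{len(session_votes)}")
```

Note that questions lacking an observer majority drop out of the correctness denominator, which is why the results below report correctness out of 28 questions but consistency out of all 30.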
RESULTS
ChatGPT-3.5 answered 17 of 28 questions correctly (61 %) by majority vote; ChatGPT-4 answered 21 of 28 correctly (75 %). A majority vote on correctness was not reached for two questions, which were excluded from the correctness denominators. ChatGPT-3.5 answered 26 of 30 questions consistently (87 %); ChatGPT-4 answered 29 of 30 consistently (97 %). ChatGPT-3.5 gave both consistent and correct responses to 17 of 28 questions (61 %), versus 20 of 28 (71 %) for ChatGPT-4.
CONCLUSION
ChatGPT-4 performed better overall than ChatGPT-3.5 when answering cardiac imaging questions with regard to both correctness and consistency of responses. While both ChatGPT-3.5 and ChatGPT-4 answered over half of the cardiac imaging questions correctly, inaccurate, clinically misleading, and inconsistent responses indicate the need for further refinement before these models are used to educate patients about cardiac imaging.