Pay Levent, Yumurtaş Ahmet Çağdaş, Çetin Tuğba, Çınar Tufan, Hayıroğlu Mert İlker
Department of Cardiology, Istanbul Haseki Training and Research Hospital, Istanbul, Türkiye.
Department of Cardiology, Kars Harakani State Hospital, Kars, Türkiye.
Turk Kardiyol Dern Ars. 2025 Jan;53(1):35-43. doi: 10.5543/tkda.2024.78131.
Coronary artery disease (CAD) is the leading cause of morbidity and mortality globally. Growing interest in natural language processing chatbots (NLPCs) has made their widespread adoption in healthcare inevitable. The purpose of this study was to evaluate the accuracy and reproducibility of responses provided by NLPCs, such as ChatGPT, Gemini, and Bing, to frequently asked questions about CAD.
Fifty frequently asked questions about CAD were submitted twice, one week apart, to ChatGPT, Gemini, and Bing. Two cardiologists independently scored each answer on a four-point scale: comprehensive/correct (1), incomplete/partially correct (2), a mix of accurate and inaccurate/misleading (3), and completely inaccurate/irrelevant (4). The accuracy and reproducibility of each NLPC's responses were then assessed.
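As a rough illustration of how such scores could be tabulated, the following minimal Python sketch computes the category distribution and a simple reproducibility figure. The example data, and the assumption that a question counts as reproducible when both rounds receive the same category, are illustrative only and do not represent the authors' actual analysis.

from collections import Counter

# Four-point scale used by the reviewers (illustrative constants)
COMPREHENSIVE = 1   # comprehensive/correct
PARTIAL = 2         # incomplete/partially correct
MIXED = 3           # mix of accurate and inaccurate/misleading
INACCURATE = 4      # completely inaccurate/irrelevant

def score_distribution(scores):
    """Return the percentage of responses in each of the four categories."""
    counts = Counter(scores)
    n = len(scores)
    return {cat: 100 * counts.get(cat, 0) / n for cat in (1, 2, 3, 4)}

def reproducibility(round1, round2):
    """Percentage of questions scored in the same category in both rounds
    (assumed definition of reproducibility for this sketch)."""
    same = sum(a == b for a, b in zip(round1, round2))
    return 100 * same / len(round1)

# Hypothetical scores for 5 of the 50 questions
round1 = [1, 1, 2, 1, 2]
round2 = [1, 2, 2, 1, 2]
print(score_distribution(round1))       # {1: 60.0, 2: 40.0, 3: 0.0, 4: 0.0}
print(reproducibility(round1, round2))  # 80.0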
ChatGPT's responses were scored as 14% incomplete/partially correct and 86% comprehensive/correct. In contrast, Gemini provided 68% comprehensive/correct responses, 30% incomplete/partially correct responses, and 2% a mix of accurate and inaccurate/misleading information. Bing delivered 60% comprehensive/correct responses, 26% incomplete/partially correct responses, and 8% a mix of accurate and inaccurate/misleading information. Reproducibility scores were 88% for ChatGPT, 84% for Gemini, and 70% for Bing.
ChatGPT demonstrates significant potential to improve patient education about coronary artery disease by providing more sensitive and accurate answers than Bing and Gemini.