Appropriateness of ChatGPT in Answering Heart Failure Related Questions.

Affiliations

Division of Cardiology, Department of Medicine, Irvine Medical Center, University of California, Orange, CA, USA.

Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA.

Publication Information

Heart Lung Circ. 2024 Sep;33(9):1314-1318. doi: 10.1016/j.hlc.2024.03.005. Epub 2024 May 31.

Abstract

BACKGROUND

Heart failure requires complex management, and greater patient knowledge has been shown to improve outcomes. This study assessed the knowledge of Chat Generative Pre-trained Transformer (ChatGPT) and its appropriateness as a supplemental source of information for patients with heart failure.

METHOD

A total of 107 frequently asked heart failure-related questions were grouped into three categories: "basic knowledge" (49), "management" (41), and "other" (17). Two responses per question were generated with each of GPT-3.5 and GPT-4 (i.e., two responses per question per model). The accuracy and reproducibility of the responses were graded by two reviewers board-certified in cardiology, with disagreements resolved by a third reviewer board-certified in cardiology and advanced heart failure. Accuracy was graded on a four-point scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect.
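
The abstract does not include the authors' querying setup. As a rough illustration of the protocol described above (two independently generated responses per question per model, later graded for accuracy and reproducibility), the sketch below uses the OpenAI Python SDK; the model identifiers "gpt-3.5-turbo" and "gpt-4" and the example questions are assumptions for illustration, not details taken from the study.

```python
# Illustrative sketch only: the paper does not publish its querying code.
# Assumes the OpenAI Python SDK (>=1.0) and the model names "gpt-3.5-turbo"
# and "gpt-4"; the exact model snapshots and prompts used in the study are
# not specified in the abstract.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical example questions, one per category used in the study.
QUESTIONS = [
    "What is heart failure?",            # "basic knowledge"
    "How much salt can I eat per day?",  # "management"
]


def ask_twice(model: str, question: str) -> list[str]:
    """Query the model in two independent chat sessions so the pair of
    answers can later be compared for reproducibility."""
    answers = []
    for _ in range(2):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        answers.append(resp.choices[0].message.content)
    return answers


for model in ("gpt-3.5-turbo", "gpt-4"):
    for q in QUESTIONS:
        first, second = ask_twice(model, q)
        # In the study, each answer would then be graded by the reviewers on
        # the four-point accuracy scale, and the two answers to the same
        # question compared for reproducibility.
        print(model, q, len(first), len(second))
```

Issuing each question in a fresh chat session, rather than requesting multiple completions in a single call, keeps the two answers independent, which matches how reproducibility is assessed.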

RESULTS

GPT-4 provided correct information in 107/107 (100%) responses. It also displayed a greater proportion of comprehensive knowledge in the "basic knowledge" and "management" categories (89.8% and 82.9%, respectively). For GPT-3.5, two responses in total (1.9%) were graded as "some correct and some incorrect", and no responses were graded as "completely incorrect". With respect to comprehensive knowledge, GPT-3.5 performed best in the "management" and "other" (prognosis, procedures, and support) categories (78.1% and 94.1%, respectively). Both models also provided highly reproducible responses, with GPT-3.5 scoring above 94% in every category and GPT-4 scoring 100% across all answers.

CONCLUSIONS

GPT-3.5 and GPT-4 answered the majority of heart failure-related questions accurately and reliably. If validated in future studies, ChatGPT may serve as a useful tool for providing accessible health-related information and education to patients living with heart failure. In its current state, however, ChatGPT requires further rigorous testing and validation to ensure patient safety and equity across all patient demographics.

