Evaluating ChatGPT-4's correctness in patient-focused informing and awareness for atrial fibrillation.

Author information

Zeljkovic Ivan, Novak Matea, Jordan Ana, Lisicic Ante, Nemeth-Blažić Tatjana, Pavlovic Nikola, Manola Šime

Affiliations

Department of Cardiovascular Diseases, Dubrava University Hospital, Avenija Gojka Šuška, Zagreb, Croatia.

Catholic University of Croatia, Zagreb, Croatia.

Publication information

Heart Rhythm O2. 2024 Oct 19;6(1):58-63. doi: 10.1016/j.hroo.2024.10.005. eCollection 2025 Jan.

DOI:10.1016/j.hroo.2024.10.005

PMID:40224268

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11993680/

Abstract

BACKGROUND

As artificial intelligence and large language models continue to evolve, their application in health care is expanding. OpenAI's Chat Generative Pre-trained Transformer 4 (ChatGPT-4) represents the latest advancement in this technology, capable of engaging in complex dialogues and providing information.

OBJECTIVE

This study explores the correctness of ChatGPT-4 in informing patients about atrial fibrillation.

METHODS

This cross-sectional observational study involved ChatGPT-4 in responding to a structured set of 108 questions across 10 categories related to atrial fibrillation. These categories included basic information, treatment options, lifestyle adjustments, and more, reflecting common patient inquiries. The model's responses were evaluated by a panel of 3 cardiologists on the basis of accuracy, comprehensiveness, clarity, relevance to clinical practice, and patient safety. The total correctness of ChatGPT-4 was quantitatively assessed through scores assigned in each category, and statistical analysis was performed to identify significant differences in performance across categories.
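
The abstract does not name the statistical test used to compare categories. As a rough illustration only, the sketch below assumes a Kruskal-Wallis comparison of per-question scores, with a permutation p-value, written in plain Python. The category scores are invented placeholder values, not data from the study, and all function names are hypothetical.

```python
import random

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        total += sum(group_ranks) ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

def permutation_p(groups, n_perm=2000, seed=0):
    """Share of label shuffles whose H is at least the observed H."""
    rng = random.Random(seed)
    observed = kruskal_h(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if kruskal_h(shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# ASSUMPTION: invented 1-5 reviewer scores for three of the categories.
scores = {
    "Lifestyle Adjustments": [5, 5, 5, 5, 5, 5],
    "Treatment Options": [5, 4, 5, 4, 5, 4],
    "Miscellaneous Concerns": [3, 4, 3, 4, 2, 4],
}
groups = list(scores.values())
h_stat = kruskal_h(groups)
p_value = permutation_p(groups)
print(f"H = {h_stat:.2f}, permutation p = {p_value:.4f}")
```

A rank-based test is used here because reviewer scores are ordinal and heavily tied; the study itself may have used a different method.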

RESULTS

ChatGPT-4 provided correct and relevant answers with considerable variability across categories. It excelled in "Lifestyle Adjustments" and "Daily Life and Management," with perfect and near-perfect scores, but struggled with "Miscellaneous Concerns," where it scored lower. Statistical analysis confirmed significant differences in total scores across categories (P = .020).

CONCLUSION

Our results suggest that while ChatGPT-4 is reliable in categories with structured and direct queries, it shows limitations when handling complex medical queries that require in-depth explanations or clinical judgment. ChatGPT-4 demonstrates promising potential as a tool for patient-focused information on atrial fibrillation, particularly for straightforward informational content.
