Accuracy and consistency of online large language model-based artificial intelligence chat platforms in answering patients' questions about heart failure.

Authors

Kozaily Elie, Geagea Mabelissa, Akdogan Ecem R, Atkins Jessica, Elshazly Mohamed B, Guglin Maya, Tedford Ryan J, Wehbe Ramsey M

Affiliations

Division of Cardiology, Department of Medicine, Medical University of South Carolina, Charleston, SC, USA.

Division of Cardiology, Department of Medicine, Hotel-Dieu de France, Beirut, Lebanon.

Publication information

Int J Cardiol. 2024 Aug 1;408:132115. doi: 10.1016/j.ijcard.2024.132115. Epub 2024 Apr 30.

DOI:10.1016/j.ijcard.2024.132115

PMID:38697402

Abstract

BACKGROUND

Heart failure (HF) is a prevalent condition associated with significant morbidity. Patients may have questions that they feel embarrassed to ask, or may face delays while awaiting responses from their healthcare providers, either of which may affect their health behavior. We aimed to investigate the potential of large language model (LLM)-based artificial intelligence (AI) chat platforms to complement the delivery of patient-centered care.

METHODS

Using online patient forums and physician experience, we created 30 questions related to diagnosis, management and prognosis of HF. The questions were posed to two LLM-based AI chat platforms (OpenAI's ChatGPT-3.5 and Google's Bard). Each set of answers was evaluated by two HF experts, independently and blinded to each other, for accuracy (adequacy of content) and consistency of content.

RESULTS

ChatGPT provided mostly appropriate answers (27/30, 90%) and showed a high degree of consistency (93%). Bard provided similar content in its answers and was therefore evaluated only for adequacy (23/30, 77%). The two HF experts' grades were concordant for 83% and 67% of the questions for ChatGPT and Bard, respectively.
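
The reported percentages follow directly from the counts given above. The sketch below is only an illustration of that arithmetic, not the authors' analysis code: the helper names `percent` and `percent_agreement` are hypothetical, and the two grade lists are made-up placeholders (the abstract does not report per-question grades) chosen only to show how a simple percent-agreement figure could be computed, assuming that is what "concordant" means here.

```python
def percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)


def percent_agreement(grades_a: list[str], grades_b: list[str]) -> int:
    """Share of items on which two raters gave the same grade, in percent."""
    if len(grades_a) != len(grades_b):
        raise ValueError("both raters must grade the same number of items")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return percent(matches, len(grades_a))


TOTAL_QUESTIONS = 30  # number of questions stated in the Methods

# Counts of answers graded appropriate, as reported in the Results.
appropriate = {"ChatGPT": 27, "Bard": 23}
rates = {name: percent(n, TOTAL_QUESTIONS) for name, n in appropriate.items()}
print(rates)  # {'ChatGPT': 90, 'Bard': 77}

# Hypothetical grade vectors (NOT study data): the raters agree on 25 of 30.
expert_1 = ["appropriate"] * 30
expert_2 = ["appropriate"] * 25 + ["inappropriate"] * 5
print(percent_agreement(expert_1, expert_2))  # 83
```

Simple percent agreement does not correct for agreement expected by chance; whether the study used a chance-corrected statistic is not stated in the abstract.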

CONCLUSION

LLM-based AI chat platforms demonstrate potential for improving HF education and empowering patients; however, these platforms currently suffer from factual errors and difficulty with more contemporary recommendations. Such inaccurate information may have serious and life-threatening implications for patients, which should be considered and addressed in future research.
