Kozaily Elie, Geagea Mabelissa, Akdogan Ecem R, Atkins Jessica, Elshazly Mohamed B, Guglin Maya, Tedford Ryan J, Wehbe Ramsey M
Division of Cardiology, Department of Medicine, Medical University of South Carolina, Charleston, SC, USA.
Division of Cardiology, Department of Medicine, Hotel-Dieu de France, Beirut, Lebanon.
Int J Cardiol. 2024 Aug 1;408:132115. doi: 10.1016/j.ijcard.2024.132115. Epub 2024 Apr 30.
Heart failure (HF) is a prevalent condition associated with significant morbidity. Patients may have questions that they feel embarrassed to ask, or they may face delays while awaiting responses from their healthcare providers, which may affect their health behavior. We aimed to investigate the potential of large language model (LLM)-based artificial intelligence (AI) chat platforms to complement the delivery of patient-centered care.
Using online patient forums and physician experience, we created 30 questions related to diagnosis, management and prognosis of HF. The questions were posed to two LLM-based AI chat platforms (OpenAI's ChatGPT-3.5 and Google's Bard). Each set of answers was evaluated by two HF experts, independently and blinded to each other, for accuracy (adequacy of content) and consistency of content.
ChatGPT provided mostly appropriate answers (27/30, 90%) and showed a high degree of consistency (93%). Bard provided similar content in its answers and thus was evaluated only for adequacy (23/30, 77%). The two HF experts' grades were concordant for 83% and 67% of the questions for ChatGPT and Bard, respectively.
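The proportions reported above, and the notion of concordance between two graders, can be illustrated with a minimal sketch. The function names `percent` and `percent_agreement` are hypothetical, and the four-item grade lists are toy data for illustration only, not the study's ratings. Treating concordance as simple percent agreement is an assumption; the abstract does not state which statistic was used.

```python
from fractions import Fraction


def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * Fraction(numerator, denominator))


def percent_agreement(grades_a, grades_b) -> float:
    """Percentage of items on which two graders assigned the same grade.

    Assumption: concordance is simple percent agreement, which may differ
    from the statistic the authors actually used.
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return 100 * matches / len(grades_a)


# Counts taken from the reported results.
chatgpt_adequate = percent(27, 30)
bard_adequate = percent(23, 30)

# Toy grades (hypothetical, not study data): the graders agree on 3 of 4 items.
expert_1 = ["appropriate", "appropriate", "inappropriate", "appropriate"]
expert_2 = ["appropriate", "inappropriate", "inappropriate", "appropriate"]
toy_agreement = percent_agreement(expert_1, expert_2)

print(chatgpt_adequate, bard_adequate, toy_agreement)
```

Percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic such as Cohen's kappa may be preferable when grade categories are unbalanced.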
LLM-based AI chat platforms demonstrate potential in improving HF education and empowering patients; however, these platforms currently suffer from factual errors and have difficulty with more contemporary recommendations. Such inaccurate information may have serious, even life-threatening, implications for patients, which should be considered and addressed in future research.