Department of Anesthesiology and Reanimation, School of Medicine, Dokuz Eylul University, Izmir, Turkey.
Department of Artificial Intelligence Engineering, Faculty of Engineering, Ostim Technical University, Ankara, Turkey.
Medicine (Baltimore). 2024 May 31;103(22):e38352. doi: 10.1097/MD.0000000000038352.
This study aimed to evaluate the readability, reliability, and quality of responses by 4 selected artificial intelligence (AI)-based large language model (LLM) chatbots to questions related to cardiopulmonary resuscitation (CPR). This was a cross-sectional study. Responses to the 100 most frequently asked questions about CPR from 4 selected chatbots (ChatGPT-3.5 [OpenAI], Google Bard [Google AI], Google Gemini [Google AI], and Perplexity [Perplexity AI]) were analyzed for readability, reliability, and quality. The chatbots were first asked, in English: "What are the 100 most frequently asked questions about cardiopulmonary resuscitation?" Each of the 100 queries derived from the responses was then posed individually to the 4 chatbots. The resulting 400 responses, treated as patient education materials (PEMs), were assessed for quality and reliability using the modified DISCERN questionnaire, the Journal of the American Medical Association (JAMA) benchmark criteria, and the Global Quality Score. Readability was assessed with 2 different calculators, each of which independently computed scores using the Flesch Reading Ease Score, Flesch-Kincaid Grade Level, Simple Measure of Gobbledygook (SMOG), Gunning Fog Index, and Automated Readability Index. One hundred responses from each of the 4 chatbots were analyzed. When the median readability values obtained from Calculators 1 and 2 were compared with the 6th-grade reading level, there was a highly significant difference between the groups (P < .001). According to all formulas, the readability level of the responses was above the 6th grade. The order of readability, from easiest to most difficult, was Bard, Perplexity, Gemini, and ChatGPT-3.5. Thus, the text content provided by all 4 chatbots was above the 6th-grade reading level. We believe that enhancing the quality, reliability, and readability of PEMs will make them easier for readers to understand and support more accurate performance of CPR; consequently, patients who receive bystander CPR may have an increased likelihood of survival.
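For context, the readability formulas named above have standard closed forms, and the minimal Python sketch below illustrates how such scores are derived from sentence, word, and syllable counts. This is only an illustration of the published formulas, not the calculators used in the study; in particular, the vowel-group syllable counter is a rough assumption, and real readability tools use more careful syllable and sentence detection.

```python
import re
from math import sqrt

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups (an assumption; real tools use dictionaries)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    """Compute common readability indices from raw text using their standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    chars = sum(len(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    return {
        "Flesch Reading Ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "Flesch-Kincaid Grade Level": 0.39 * wps + 11.8 * spw - 15.59,
        "Gunning Fog Index": 0.4 * (wps + 100 * complex_words / len(words)),
        "SMOG": 1.0430 * sqrt(complex_words * 30 / len(sentences)) + 3.1291,
        "Automated Readability Index": 4.71 * chars / len(words) + 0.5 * wps - 21.43,
    }

if __name__ == "__main__":
    sample = ("Push hard and fast in the center of the chest. "
              "Give about one hundred to one hundred twenty compressions per minute.")
    for name, score in readability(sample).items():
        print(f"{name}: {score:.1f}")
```

On these scales, grade-level indices (Flesch-Kincaid, SMOG, Gunning Fog, ARI) above 6, or a Flesch Reading Ease score below roughly 80, indicate text harder than the recommended 6th-grade reading level for patient education materials.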