Department of Anesthesiology and Reanimation, School of Medicine, Dokuz Eylul University, Izmir, Turkey.
Department of Artificial Intelligence Engineering, Faculty of Engineering, Ostim Technical University, Ankara, Turkey.
Medicine (Baltimore). 2024 May 31;103(22):e38352. doi: 10.1097/MD.0000000000038352.
This study aimed to evaluate the readability, reliability, and quality of responses by 4 selected artificial intelligence (AI)-based large language model (LLM) chatbots to questions related to cardiopulmonary resuscitation (CPR). This was a cross-sectional study. Responses to the 100 most frequently asked questions about CPR from 4 selected chatbots (ChatGPT-3.5 [OpenAI], Google Bard [Google AI], Google Gemini [Google AI], and Perplexity [Perplexity AI]) were analyzed for readability, reliability, and quality. The chatbots were first asked, in English: "What are the 100 most frequently asked questions about cardiopulmonary resuscitation?" Each of the 100 queries derived from the responses was then posed individually to the 4 chatbots. The resulting 400 responses, treated as patient education materials (PEMs), were assessed for quality and reliability using the modified DISCERN questionnaire, the Journal of the American Medical Association (JAMA) benchmark criteria, and the Global Quality Score. Readability was assessed with 2 different calculators, each of which independently computed scores using the Flesch Reading Ease Score, Flesch-Kincaid Grade Level, Simple Measure of Gobbledygook (SMOG), Gunning Fog Index, and Automated Readability Index. One hundred responses from each of the 4 chatbots were analyzed. When the median readability values obtained from Calculators 1 and 2 were compared with the 6th-grade reading level, there was a highly significant difference between the groups (P < .001). According to all formulas, the readability level of the responses was above the 6th grade. The order of readability, from easiest to most difficult, was Bard, Perplexity, Gemini, and ChatGPT-3.5. Thus, the text content provided by all 4 chatbots was above the 6th-grade reading level. We believe that enhancing the quality, reliability, and readability of PEMs will make them easier for readers to understand and support more accurate performance of CPR; consequently, patients who receive bystander CPR may have an increased likelihood of survival.
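For context, the readability formulas named above have standard closed forms, and the minimal Python sketch below illustrates how such scores are derived from sentence, word, and syllable counts. This is only an illustration of the published formulas, not the calculators used in the study; in particular, the vowel-group syllable counter is a rough assumption, and real readability tools use more careful syllable and sentence detection.

```python
import re
from math import sqrt

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups (an assumption; real tools use dictionaries)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    """Compute common readability indices from raw text using their standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    chars = sum(len(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    return {
        "Flesch Reading Ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "Flesch-Kincaid Grade Level": 0.39 * wps + 11.8 * spw - 15.59,
        "Gunning Fog Index": 0.4 * (wps + 100 * complex_words / len(words)),
        "SMOG": 1.0430 * sqrt(complex_words * 30 / len(sentences)) + 3.1291,
        "Automated Readability Index": 4.71 * chars / len(words) + 0.5 * wps - 21.43,
    }

if __name__ == "__main__":
    sample = ("Push hard and fast in the center of the chest. "
              "Give about one hundred to one hundred twenty compressions per minute.")
    for name, score in readability(sample).items():
        print(f"{name}: {score:.1f}")
```

On these scales, grade-level indices (Flesch-Kincaid, SMOG, Gunning Fog, ARI) above 6, or a Flesch Reading Ease score below roughly 80, indicate text harder than the recommended 6th-grade reading level for patient education materials.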