Evaluation of the reliability and readability of answers given by chatbots to frequently asked questions about endophthalmitis: A cross-sectional study on chatbots.

Affiliation

Department of Ophthalmology, Adana 5 Ocak State Hospital, Adana, Turkey.

Publication information

Health Informatics J. 2024 Oct-Dec;30(4):14604582241304679. doi: 10.1177/14604582241304679.

Abstract

This study aimed to investigate the accuracy, reliability, and readability of four large language models (LLMs), A-Eye Consult, ChatGPT-4.0, Google Gemini, and Copilot AI, in responding to patient questions about endophthalmitis. The LLMs' responses to 25 questions about endophthalmitis that patients frequently ask were evaluated by two ophthalmologists using a five-point Likert scale, with scores ranging from 1 to 5. The DISCERN scale assessed the reliability of the LLMs' responses, whereas the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) indices assessed readability and text complexity, respectively. A-Eye Consult and ChatGPT-4.0 outperformed Google Gemini and Copilot in providing comprehensive and precise responses. The Likert score differed significantly across the four LLMs (p < .001), with A-Eye Consult scoring significantly higher than Google Gemini and Copilot (p < .001). The responses from A-Eye Consult and ChatGPT-4.0, while more complex than those of the other LLMs, provided more reliable and accurate information.
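The two readability indices named in the abstract are closed-form functions of word, sentence, and syllable counts. The sketch below uses the standard published FRE and FKGL formulas; it is not the tooling used in the study, which the abstract does not describe. The helper names (`count_syllables`, `text_counts`) and the vowel-group syllable heuristic are assumptions made for illustration, and dedicated readability tools count syllables more carefully.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption): count runs of vowels,
    then drop a trailing 'e' unless the word ends in 'le'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) using simple regex splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "The eye is red. See a doctor."
    w, s, syl = text_counts(sample)
    print(f"FRE:  {flesch_reading_ease(w, s, syl):.1f}")
    print(f"FKGL: {flesch_kincaid_grade(w, s, syl):.1f}")
```

Both indices depend on the same two ratios, words per sentence and syllables per word, but weight them in opposite directions, so a more complex response tends to score lower on FRE and higher on FKGL.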

Similar articles

Evaluation of the reliability and readability of answers given by chatbots to frequently asked questions about endophthalmitis: A cross-sectional study on chatbots.

Health Informatics J. 2024 Oct-Dec;30(4):14604582241304679. doi: 10.1177/14604582241304679.

Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.

Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.

Evaluation of Responses to Questions About Keratoconus Using ChatGPT-4.0, Google Gemini and Microsoft Copilot: A Comparative Study of Large Language Models on Keratoconus.

Eye Contact Lens. 2025 Mar 1;51(3):e107-e111. doi: 10.1097/ICL.0000000000001158. Epub 2024 Dec 4.

Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care.

Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.

Investigating the role of large language models on questions about refractive surgery.

Int J Med Inform. 2025 Mar;195:105787. doi: 10.1016/j.ijmedinf.2025.105787. Epub 2025 Jan 6.

Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.

Medicine (Baltimore). 2025 Apr 11;104(15):e42135. doi: 10.1097/MD.0000000000042135.

Assessing the Responses of Large Language Models (ChatGPT-4, Claude 3, Gemini, and Microsoft Copilot) to Frequently Asked Questions in Retinopathy of Prematurity: A Study on Readability and Appropriateness.

J Pediatr Ophthalmol Strabismus. 2025 Mar-Apr;62(2):84-95. doi: 10.3928/01913913-20240911-05. Epub 2024 Oct 28.

Proficiency, Clarity, and Objectivity of Large Language Models Versus Specialists' Knowledge on COVID-19's Impacts in Pregnancy: Cross-Sectional Pilot Study.

JMIR Form Res. 2025 Feb 5;9:e56126. doi: 10.2196/56126.

Evaluating the reliability of the responses of large language models to keratoconus-related questions.

Clin Exp Optom. 2024 Oct 24:1-8. doi: 10.1080/08164622.2024.2419524.

Readability, reliability and quality of responses generated by ChatGPT, gemini, and perplexity for the most frequently asked questions about pain.

Medicine (Baltimore). 2025 Mar 14;104(11):e41780. doi: 10.1097/MD.0000000000041780.

Cited by

Evaluating artificial intelligence chatbots' responses to gynecomastia inquiries: Comparative study of information quality, readability, and guideline consistency.

Digit Health. 2025 Aug 26;11:20552076251367645. doi: 10.1177/20552076251367645. eCollection 2025 Jan-Dec.
