Department of Ophthalmology, Adana 5 Ocak State Hospital, Adana, Turkey.
Health Informatics J. 2024 Oct-Dec;30(4):14604582241304679. doi: 10.1177/14604582241304679.
This study aimed to investigate the accuracy, reliability, and readability of the A-Eye Consult, ChatGPT-4.0, Google Gemini, and Copilot AI large language models (LLMs) in responding to patient questions about endophthalmitis. The LLMs' responses to 25 frequently asked patient questions about endophthalmitis were evaluated by two ophthalmologists using a five-point Likert scale, with scores ranging from 1 to 5. The DISCERN scale assessed the reliability of the LLMs' responses, whereas the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) indices assessed readability and text complexity, respectively. A-Eye Consult and ChatGPT-4.0 outperformed Google Gemini and Copilot in providing comprehensive and precise responses. Likert scores differed significantly across the four LLMs (p < .001), with A-Eye Consult scoring significantly higher than Google Gemini and Copilot (p < .001). Although the responses of A-Eye Consult and ChatGPT-4.0 were more complex than those of the other LLMs, they provided more reliable and accurate information.
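The abstract does not state which tool was used to compute the FRE and FKGL values, but both indices follow fixed published formulas over words per sentence (W/S) and syllables per word (Syl/W): FRE = 206.835 - 1.015(W/S) - 84.6(Syl/W), and FKGL = 0.39(W/S) + 11.8(Syl/W) - 15.59. The following is a minimal Python sketch of those formulas using a naive vowel-group syllable heuristic (production calculators use pronunciation dictionaries), so its scores are approximate and illustrative only.

    import re

    def count_syllables(word: str) -> int:
        # Rough heuristic: count contiguous vowel groups; real tools
        # use pronunciation dictionaries for accurate syllable counts.
        n = len(re.findall(r"[aeiouy]+", word.lower()))
        if word.lower().endswith("e") and n > 1:
            n -= 1  # discount a silent trailing 'e'
        return max(n, 1)

    def readability(text: str) -> tuple[float, float]:
        # Sentences approximated by terminal punctuation runs.
        sentences = max(len(re.findall(r"[.!?]+", text)), 1)
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        wps = len(words) / sentences   # words per sentence (W/S)
        spw = syllables / len(words)   # syllables per word (Syl/W)
        fre = 206.835 - 1.015 * wps - 84.6 * spw
        fkgl = 0.39 * wps + 11.8 * spw - 15.59
        return fre, fkgl

    fre, fkgl = readability(
        "Endophthalmitis is a severe inflammation inside the eye. "
        "It usually follows surgery or an injury."
    )
    print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}")

Higher FRE means easier text, whereas higher FKGL means a higher US school grade level; the study's finding that A-Eye Consult and ChatGPT-4.0 produced more complex text corresponds to lower FRE and higher FKGL scores for their responses.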