
Comparative analysis of large language models in providing patient information about keratoconus and contact lenses.

Author Information

Aribas Yavuz Kemal, Tefon Aribas Atike Burcin

Affiliations

Department of Ophthalmology, Hacettepe University Medical School, 06230, Ankara, Turkey.

Department of Ophthalmology, Ankara Bilkent City Hospital, Ankara, Turkey.

Publication Information

Int Ophthalmol. 2025 Aug 18;45(1):340. doi: 10.1007/s10792-025-03711-2.

Abstract

OBJECTIVE

To evaluate the accuracy, completeness, informational quality, and readability of responses generated by three large language models (LLMs), namely ChatGPT (OpenAI, USA), Gemini (Google, USA), and Copilot (Microsoft, USA), to patient questions concerning keratoconus and contact lens use.

METHODS

In this cross-sectional study, 32 questions across eight domains were posed to the free versions of each model. Two independent ophthalmologists rated accuracy (6-point Likert scale) and completeness (3-point Likert scale). Information quality was assessed using the DISCERN instrument, and readability was evaluated with the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Inter-rater agreement was measured with Cohen's Kappa.
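For context, the two readability metrics and the agreement statistic named above are closed-form formulas. The following sketch (Python; not from the paper, and the word, sentence, and syllable counts are assumed to be supplied by the caller) shows how each is computed:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Flesch Reading Ease Score: higher values mean easier text (0-100 scale).
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Flesch-Kincaid Grade Level: maps the same counts to a US school grade.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def cohens_kappa(p_observed: float, p_expected: float) -> float:
    # Cohen's kappa: chance-corrected agreement between two raters, where
    # p_observed is the raw agreement rate and p_expected is the agreement
    # expected by chance from the raters' marginal distributions.
    return (p_observed - p_expected) / (1 - p_expected)
```

Both Flesch formulas depend only on average sentence length (words per sentence) and average word length (syllables per word), which is why long sentences and polysyllabic medical vocabulary push scores toward the difficult end of the scale.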

RESULTS

Inter-rater reliability showed at least fair agreement for all LLMs (minimum κ = 0.365). ChatGPT achieved significantly higher accuracy than Gemini (p < 0.001) and Copilot (p = 0.010), and higher completeness than Gemini (p = 0.001) but similar completeness to Copilot (p = 0.101). DISCERN scores were highest for ChatGPT (64), followed by Copilot (61) and Gemini (55). All models produced difficult-to-read content (FRES: Gemini 49.7, Copilot 45.4, ChatGPT 40.7), with FKGL values at a late high school level.
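As a rough cross-check, the sketch below (Python; the thresholds are the widely used Landis-Koch kappa benchmarks and the standard Flesch score bands, not values taken from the paper) maps the reported statistics to their conventional interpretations:

```python
def kappa_band(k: float) -> str:
    # Landis & Koch (1977) benchmarks for interpreting Cohen's kappa.
    if k < 0:
        return "poor"
    for cutoff, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                          (0.80, "substantial"), (1.00, "almost perfect")]:
        if k <= cutoff:
            return label
    return "almost perfect"

def fres_band(score: float) -> str:
    # Standard Flesch Reading Ease bands with approximate US reading levels.
    if score >= 60:
        return "standard or easier (8th-9th grade and below)"
    if score >= 50:
        return "fairly difficult (10th-12th grade)"
    if score >= 30:
        return "difficult (college)"
    return "very difficult (college graduate)"

print(kappa_band(0.365))  # "fair" -- consistent with "at least fair agreement"
print(fres_band(49.7))    # Gemini: "difficult (college)"
print(fres_band(40.7))    # ChatGPT: "difficult (college)"; Copilot (45.4) likewise
```

All three reported FRES values fall in the 30-50 "difficult" band, well above the roughly sixth-grade reading level often recommended for patient education materials.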

CONCLUSION

All evaluated large language models were capable of providing generally accurate and thorough information regarding keratoconus and contact lens use. Nevertheless, limitations in readability across models highlight the importance of clinician oversight to ensure that patient education remains clear, accessible, and appropriately tailored to individual needs.

