
Comparative analysis of large language models in providing patient information about keratoconus and contact lenses.

Author information

Aribas Yavuz Kemal, Tefon Aribas Atike Burcin

Affiliations

Department of Ophthalmology, Hacettepe University Medical School, 06230, Ankara, Turkey.

Department of Ophthalmology, Ankara Bilkent City Hospital, Ankara, Turkey.

Publication information

Int Ophthalmol. 2025 Aug 18;45(1):340. doi: 10.1007/s10792-025-03711-2.

DOI: 10.1007/s10792-025-03711-2
PMID: 40824599
Abstract

OBJECTIVE

To evaluate the accuracy, completeness, informational quality, and readability of responses generated by three large language models (LLMs), ChatGPT (OpenAI, USA), Gemini (Google, USA), and Copilot (Microsoft, USA), to patient questions concerning keratoconus and contact lens use.

METHODS

In this cross-sectional study, 32 questions across eight domains were posed to the free versions of each model. Two independent ophthalmologists rated accuracy (6-point Likert scale) and completeness (3-point Likert scale). Information quality was assessed using the DISCERN instrument, and readability was evaluated with the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Inter-rater agreement was measured with Cohen's Kappa.
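Both readability metrics used in the study are closed-form functions of sentence, word, and syllable counts. The sketch below shows the standard FRES and FKGL formulas; the vowel-group syllable counter is a rough heuristic added for illustration only (the study does not describe its tooling), so scores from dedicated readability software will differ slightly.

```python
import re

def count_syllables(word: str) -> int:
    # Crude vowel-group heuristic; real readability tools use
    # pronunciation dictionaries. Illustration only.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Lower FRES means harder text: the 40.7-49.7 range reported in the results falls in the band conventionally labeled "difficult" (roughly college-level reading), consistent with the late-high-school FKGL values.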

RESULTS

Inter-rater reliability showed at least fair agreement for all LLMs (min κ = 0.365). ChatGPT achieved significantly higher accuracy than Gemini (p < 0.001) and Copilot (p = 0.010), and higher completeness than Gemini (p = 0.001) but was similar to Copilot (p = 0.101). DISCERN scores were highest for ChatGPT (64), followed by Copilot (61) and Gemini (55). All models produced difficult-to-read content (FRES: Gemini 49.7, Copilot 45.4, ChatGPT 40.7), with FKGL values at late high school level.
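The paper reports only summary agreement statistics. As an illustration (the function and toy ratings below are my own, not the study's data), unweighted Cohen's kappa for two raters is the observed agreement corrected for the agreement expected by chance from each rater's marginal label frequencies:

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items with identical ratings.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the two raters' marginal label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum((ca[label] / n) * (cb[label] / n) for label in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)
```

Under the widely used Landis and Koch convention, κ in 0.21-0.40 counts as "fair" agreement, which is where the reported minimum of 0.365 falls.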

CONCLUSION

All evaluated large language models were capable of providing generally accurate and thorough information regarding keratoconus and contact lens use. Nevertheless, limitations in readability across models highlight the importance of clinician oversight to ensure that patient education remains clear, accessible, and appropriately tailored to individual needs.


Similar articles

1. Comparative analysis of large language models in providing patient information about keratoconus and contact lenses. Int Ophthalmol. 2025 Aug 18;45(1):340. doi: 10.1007/s10792-025-03711-2.
2. Evaluating the reliability of the responses of large language models to keratoconus-related questions. Clin Exp Optom. 2024 Oct 24:1-8. doi: 10.1080/08164622.2024.2419524.
3. Parental education in pediatric dysphagia: A comparative analysis of three large language models. J Pediatr Gastroenterol Nutr. 2025 Jul;81(1):18-26. doi: 10.1002/jpn3.70069. Epub 2025 May 8.
4. How Accurate Is AI? A Critical Evaluation of Commonly Used Large Language Models in Responding to Patient Concerns About Incidental Kidney Tumors. J Clin Med. 2025 Aug 12;14(16):5697. doi: 10.3390/jcm14165697.
5. Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study. J Med Internet Res. 2025 Jun 4;27:e69955. doi: 10.2196/69955.
6. Readability, Reliability, and Quality Analysis of Internet-Based Patient Education Materials and Large Language Models on Meniere's Disease. J Otolaryngol Head Neck Surg. 2025 Jan-Dec;54:19160216251360651. doi: 10.1177/19160216251360651. Epub 2025 Aug 8.
7. Assessing ChatGPT's Educational Potential in Lung Cancer Radiotherapy From Clinician and Patient Perspectives: Content Quality and Readability Analysis. JMIR Cancer. 2025 Aug 13;11:e69783. doi: 10.2196/69783.
8. Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions. Aesthetic Plast Surg. 2025 Jul 21. doi: 10.1007/s00266-025-05071-9.
9. A structured evaluation of LLM-generated step-by-step instructions in cadaveric brachial plexus dissection. BMC Med Educ. 2025 Jul 1;25(1):903. doi: 10.1186/s12909-025-07493-0.
10. Currently Available Large Language Models Are Moderately Effective in Improving Readability of English and Spanish Patient Education Materials in Pediatric Orthopaedics. J Am Acad Orthop Surg. 2025 Jun 24. doi: 10.5435/JAAOS-D-25-00267.
