
Performance of popular large language models in glaucoma patient education: A randomized controlled study.

Author information

Cao Yuyu, Lu Wei, Shi Runhan, Liu Fuying, Liu Steven, Xu Xinwei, Yang Jin, Rong Guangyu, Xin Changchang, Zhou Xujiao, Sun Xinghuai, Hong Jiaxu

Affiliations

Department of Ophthalmology, Eye & ENT Hospital, State Key Laboratory of Medical Neurobiology, Fudan University, Shanghai, China.

NHC Key Laboratory of Myopia and Related Eye Diseases Shanghai, China.

Publication information

Adv Ophthalmol Pract Res. 2024 Dec 3;5(2):88-94. doi: 10.1016/j.aopr.2024.12.002. eCollection 2025 May-Jun.

DOI:10.1016/j.aopr.2024.12.002
PMID:40162329
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11951182/
Abstract

PURPOSE

The advent of chatbots based on large language models (LLMs), such as ChatGPT, has significantly transformed knowledge acquisition. However, the usefulness of LLMs in glaucoma patient education remains unclear. In this study, we comprehensively compared the performance of four common LLMs (Qwen, Baichuan 2, ChatGPT-4.0, and PaLM 2) in the context of glaucoma patient education.

METHODS

Initially, senior ophthalmologists were tasked with scoring the LLMs' responses to the glaucoma-related questions most frequently posed by patients. The Chinese Readability Platform was used to assess the recommended reading age and reading difficulty score of each LLM's responses. Subsequently, the better-performing models were selected, and 29 glaucoma patients posed questions to the chatbots and scored the answers in a real-world clinical setting. Attending ophthalmologists also scored the answers across five dimensions: correctness, completeness, readability, helpfulness, and safety. Patients scored the answers on three dimensions: satisfaction, readability, and helpfulness.

RESULTS

In the first stage, Baichuan 2 and ChatGPT-4.0 outperformed the other two models, though ChatGPT-4.0 had a higher recommended reading age and reading difficulty score. In the second stage, both Baichuan 2 and ChatGPT-4.0 were rated highly by patients and ophthalmologists, with no statistically significant differences observed.

CONCLUSIONS

Our research identifies Baichuan 2 and ChatGPT-4.0 as prominent LLMs, offering viable options for glaucoma education.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0720/11951182/7bb1e58903d1/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0720/11951182/c4fad734e387/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0720/11951182/e4698d1db0f1/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0720/11951182/ab29c3b451da/gr4.jpg

Similar articles

1
Performance of popular large language models in glaucoma patient education: A randomized controlled study.
Adv Ophthalmol Pract Res. 2024 Dec 3;5(2):88-94. doi: 10.1016/j.aopr.2024.12.002. eCollection 2025 May-Jun.
2
Benchmarking four large language models' performance of addressing Chinese patients' inquiries about dry eye disease: A two-phase study.
Heliyon. 2024 Jul 14;10(14):e34391. doi: 10.1016/j.heliyon.2024.e34391. eCollection 2024 Jul 30.
3
Evaluating the effectiveness of large language models in patient education for conjunctivitis.
Br J Ophthalmol. 2025 Jan 28;109(2):185-191. doi: 10.1136/bjo-2024-325599.
4
Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study.
J Med Internet Res. 2025 Apr 10;27:e67883. doi: 10.2196/67883.
5
Do large language model chatbots perform better than established patient information resources in answering patient questions? A comparative study on melanoma.
Br J Dermatol. 2025 Jan 24;192(2):306-315. doi: 10.1093/bjd/ljae377.
6
Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma.
Rom J Ophthalmol. 2024 Jul-Sep;68(3):243-248. doi: 10.22336/rjo.2024.45.
7
Dr. Google vs. Dr. ChatGPT: Exploring the Use of Artificial Intelligence in Ophthalmology by Comparing the Accuracy, Safety, and Readability of Responses to Frequently Asked Patient Questions Regarding Cataracts and Cataract Surgery.
Semin Ophthalmol. 2024 Aug;39(6):472-479. doi: 10.1080/08820538.2024.2326058. Epub 2024 Mar 22.
8
Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.
Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.
9
Evaluation of the reliability and readability of answers given by chatbots to frequently asked questions about endophthalmitis: A cross-sectional study on chatbots.
Health Informatics J. 2024 Oct-Dec;30(4):14604582241304679. doi: 10.1177/14604582241304679.
10
Assessing the performance of large language models (LLMs) in answering medical questions regarding breast cancer in the Chinese context.
Digit Health. 2024 Oct 7;10:20552076241284771. doi: 10.1177/20552076241284771. eCollection 2024 Jan-Dec.
