Evaluating the effectiveness of large language models in patient education for conjunctivitis.

Authors

Wang Jingyuan, Shi Runhan, Le Qihua, Shan Kun, Chen Zhi, Zhou Xujiao, He Yao, Hong Jiaxu

Affiliations

Department of Ophthalmology and Vision Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, Shanghai, People's Republic of China.

Macao Translational Medicine Center, Macau University of Science and Technology, Taipa, Macau SAR, People's Republic of China.

Publication information

Br J Ophthalmol. 2025 Jan 28;109(2):185-191. doi: 10.1136/bjo-2024-325599.

DOI: 10.1136/bjo-2024-325599
PMID: 39214677

Abstract

AIMS

To evaluate the quality of responses from large language models (LLMs) to patient-generated conjunctivitis questions.

METHODS

A two-phase, cross-sectional study was conducted at the Eye and ENT Hospital of Fudan University. In phase 1, four LLMs (GPT-4, Qwen, Baichuan 2 and PaLM 2) responded to 22 frequently asked conjunctivitis questions. Six expert ophthalmologists assessed these responses using a 5-point Likert scale for correctness, completeness, readability, helpfulness and safety, supplemented by objective readability analysis. Phase 2 involved 30 conjunctivitis patients who interacted with GPT-4 or Qwen, evaluating the LLM-generated responses based on satisfaction, humanisation, professionalism and the same dimensions except for correctness from phase 1. Three ophthalmologists assessed responses using phase 1 criteria, allowing for a comparative analysis between medical and patient evaluations, probing the study's practical significance.
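
The expert ratings described above are reported in the results as values of the form 4.39±0.76. A minimal sketch of how such a summary could be computed is shown here, assuming the ± figures are mean ± sample standard deviation (the abstract does not say so). The ratings, the `ratings` dictionary, and the `summarize` helper are hypothetical illustrations, not the study's data or code.

```python
from statistics import mean, stdev

# Hypothetical ratings (NOT the study's data): six raters score one model's
# answer on a 1-5 Likert scale, one list per evaluation dimension.
ratings = {
    "correctness": [5, 4, 4, 5, 3, 5],
    "readability": [5, 5, 4, 5, 4, 5],
}


def summarize(scores):
    """Return (mean, sample standard deviation), each rounded to 2 decimals."""
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("Likert scores must lie in the range 1..5")
    return round(mean(scores), 2), round(stdev(scores), 2)


# Summarize every dimension in the same "mean±SD" form used in the abstract.
summary = {dim: summarize(scores) for dim, scores in ratings.items()}

for dim, (m, sd) in summary.items():
    print(f"{dim}: {m:.2f}±{sd:.2f}")
```

In a real analysis the scores would be pooled across all questions and raters for each model before summarizing; the sketch keeps a single list per dimension for brevity.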

RESULTS

In phase 1, GPT-4 excelled across all metrics, particularly in correctness (4.39±0.76), completeness (4.31±0.96) and readability (4.65±0.59) while Qwen showed similarly strong performance in helpfulness (4.37±0.93) and safety (4.25±1.03). Baichuan 2 and PaLM 2 were effective but trailed behind GPT-4 and Qwen. The objective readability analysis revealed GPT-4's responses as the most detailed, with PaLM 2's being the most succinct. Phase 2 demonstrated GPT-4 and Qwen's robust performance, with high satisfaction levels and consistent evaluations from both patients and professionals.

CONCLUSIONS

Our study showed LLMs effectively improve patient education in conjunctivitis. These models showed considerable promise in real-world patient interactions. Despite encouraging results, further refinement, particularly in personalisation and handling complex inquiries, is essential prior to the clinical integration of these LLMs.

Similar articles

1. Evaluating the effectiveness of large language models in patient education for conjunctivitis.
   Br J Ophthalmol. 2025 Jan 28;109(2):185-191. doi: 10.1136/bjo-2024-325599.
2. Benchmarking four large language models' performance of addressing Chinese patients' inquiries about dry eye disease: A two-phase study.
   Heliyon. 2024 Jul 14;10(14):e34391. doi: 10.1016/j.heliyon.2024.e34391. eCollection 2024 Jul 30.
3. Evaluation of large language models for providing educational information in orthokeratology care.
   Cont Lens Anterior Eye. 2025 Jun;48(3):102384. doi: 10.1016/j.clae.2025.102384. Epub 2025 Feb 11.
4. Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study.
   J Med Internet Res. 2025 Apr 10;27:e67883. doi: 10.2196/67883.
5. Performance of popular large language models in glaucoma patient education: A randomized controlled study.
   Adv Ophthalmol Pract Res. 2024 Dec 3;5(2):88-94. doi: 10.1016/j.aopr.2024.12.002. eCollection 2025 May-Jun.
6. Large language models and bariatric surgery patient education: a comparative readability analysis of GPT-3.5, GPT-4, Bard, and online institutional resources.
   Surg Endosc. 2024 May;38(5):2522-2532. doi: 10.1007/s00464-024-10720-2. Epub 2024 Mar 12.
7. Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.
   J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.
8. Do large language model chatbots perform better than established patient information resources in answering patient questions? A comparative study on melanoma.
   Br J Dermatol. 2025 Jan 24;192(2):306-315. doi: 10.1093/bjd/ljae377.
9. Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study.
   ArXiv. 2024 Jan 23:arXiv:2402.01693v1.
10. Patient- and clinician-based evaluation of large language models for patient education in prostate cancer radiotherapy.
    Strahlenther Onkol. 2025 Mar;201(3):333-342. doi: 10.1007/s00066-024-02342-3. Epub 2025 Jan 10.

Cited by

1. Comparison of Multiple State-of-the-Art Large Language Models for Patient Education Prior to CT and MRI Examinations.
   J Pers Med. 2025 Jun 5;15(6):235. doi: 10.3390/jpm15060235.
2. Comparing Artificial Intelligence-Generated and Clinician-Created Personalized Self-Management Guidance for Patients With Knee Osteoarthritis: Blinded Observational Study.
   J Med Internet Res. 2025 May 7;27:e67830. doi: 10.2196/67830.