Comparative Analysis of LLMs in Dry Eye Syndrome Healthcare Information.

Authors

Wu Gloria, Paliath-Pathiyal Hrishi, Khan Obaid, Wang Margaret C

Affiliations

Department of Ophthalmology, School of Medicine, University of California, San Francisco, CA 94143, USA.

Department of Biological Sciences, Halmos College of Arts and Sciences, Nova Southeastern University, Fort Lauderdale, FL 33328, USA.

Publication information

Diagnostics (Basel). 2025 Jul 30;15(15):1913. doi: 10.3390/diagnostics15151913.

DOI: 10.3390/diagnostics15151913
PMID: 40804875
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12346532/
Abstract

Dry eye syndrome affects 16 million Americans with USD 52 billion in annual healthcare costs. With large language models (LLMs) increasingly used for healthcare information, understanding their performance in delivering equitable dry eye guidance across diverse populations is critical. This study aims to evaluate and compare five major LLMs (Grok, ChatGPT, Gemini, Claude.ai, and Meta AI) regarding dry eye syndrome information delivery across different demographic groups. LLMs were queried using standardized prompts simulating a 62-year-old patient with dry eye symptoms across four demographic categories (White, Black, East Asian, and Hispanic males and females). Responses were analyzed for word count, readability, cultural sensitivity scores (0-3 scale), keyword coverage, and response times. Significant variations existed across LLMs. Word counts ranged from 32 to 346 words, with Gemini being the most comprehensive (653.8 ± 96.2 words) and Claude.ai being the most concise (207.6 ± 10.8 words). Cultural sensitivity scores revealed that Grok demonstrated the highest awareness for minority populations (scoring 3 for Black and Hispanic demographics), while Meta AI showed minimal cultural tailoring (0.5 ± 0.5). All models recommended specialist consultation, but medical term coverage varied significantly. Response times ranged from 7.41 s (Meta AI) to 25.32 s (Gemini). While all LLMs provided appropriate referral recommendations, substantial disparities exist in cultural sensitivity, content depth, and information delivery across demographic groups. No LLM consistently addressed the full spectrum of dry eye causes across all demographics. These findings underscore the importance of physician oversight and standardization in AI-generated healthcare information to ensure equitable access and prevent care delays.
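
Three of the response metrics named in the abstract (word count, keyword coverage, and mean ± SD summaries) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the keyword list, and the sample responses are hypothetical, and the abstract does not state how words were tokenized or which standard deviation formula was used. The sketch assumes whitespace tokenization and the sample (n - 1) standard deviation.

```python
import statistics


def word_count(text: str) -> int:
    """Count whitespace-delimited words in a response (assumed tokenization)."""
    return len(text.split())


def keyword_coverage(text: str, keywords: list[str]) -> float:
    """Return the fraction of keywords found in the text, ignoring case."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords)


def mean_sd(values: list[float]) -> tuple[float, float]:
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), sd


if __name__ == "__main__":
    # Hypothetical keyword list and responses, for illustration only.
    keywords = ["artificial tears", "ophthalmologist", "meibomian"]
    responses = [
        "Try artificial tears and see an ophthalmologist if symptoms persist.",
        "Dry eye may involve meibomian gland dysfunction. Artificial tears can help.",
    ]
    counts = [float(word_count(r)) for r in responses]
    mean, sd = mean_sd(counts)
    print(f"word count: {mean:.1f} ± {sd:.1f}")
    for r in responses:
        print(f"keyword coverage: {keyword_coverage(r, keywords):.2f}")
```

Substring matching keeps the sketch short, but it would also count a keyword that appears inside a longer word, so a real analysis might prefer word-boundary matching.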


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f18/12346532/b54086850826/diagnostics-15-01913-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3f18/12346532/8a9af76b7f07/diagnostics-15-01913-g001.jpg

Similar articles

1
Comparative Analysis of LLMs in Dry Eye Syndrome Healthcare Information.
Diagnostics (Basel). 2025 Jul 30;15(15):1913. doi: 10.3390/diagnostics15151913.
2
How Well Do Different AI Language Models Inform Patients About Radiofrequency Ablation for Varicose Veins?
Cureus. 2025 Jun 22;17(6):e86537. doi: 10.7759/cureus.86537. eCollection 2025 Jun.
3
Prescription of Controlled Substances: Benefits and Risks
4
Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study.
J Med Internet Res. 2025 Jun 4;27:e69955. doi: 10.2196/69955.
5
A multi-dimensional performance evaluation of large language models in dental implantology: comparison of ChatGPT, DeepSeek, Grok, Gemini and Qwen across diverse clinical scenarios.
BMC Oral Health. 2025 Jul 28;25(1):1272. doi: 10.1186/s12903-025-06619-6.
6
Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.
Eye (Lond). 2025 Apr;39(6):1132-1137. doi: 10.1038/s41433-024-03545-9. Epub 2024 Dec 17.
7
Artificial intelligence-simplified information to advance reproductive genetic literacy and health equity.
Hum Reprod. 2025 Jul 22. doi: 10.1093/humrep/deaf135.
8
Synthetic Patient-Physician Conversations Simulated by Large Language Models: A Multi-Dimensional Evaluation.
Sensors (Basel). 2025 Jul 10;25(14):4305. doi: 10.3390/s25144305.
9
A structured evaluation of LLM-generated step-by-step instructions in cadaveric brachial plexus dissection.
BMC Med Educ. 2025 Jul 1;25(1):903. doi: 10.1186/s12909-025-07493-0.
10
Large Language Models Demonstrate Distinct Personality Profiles.
Cureus. 2025 May 23;17(5):e84706. doi: 10.7759/cureus.86537. eCollection 2025 May.
