Evaluating Large Language Models in Ptosis-Related Inquiries: A Cross-Lingual Study.

Authors

Niu Ling-Han, Wei Li, Qin Bixuan, Chen Tao, Dong Li, He Yueqing, Jiang Xue, Wang Mingyang, Ma Lan, Geng Jialu, Wang Lechen, Li Dongmei

Affiliations

Beijing Tongren Eye Center, and Beijing Ophthalmology Visual Science Key Lab, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China.

Mingsii Co., Ltd, Beijing, People's Republic of China.

Publication information

Transl Vis Sci Technol. 2025 Jul 1;14(7):9. doi: 10.1167/tvst.14.7.9.

DOI: 10.1167/tvst.14.7.9
PMID: 40668049
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12279073/

Abstract

PURPOSE

The purpose of this study was to evaluate the performance of four large language models (LLMs), namely GPT-4, GPT-4o, Qwen2, and Qwen2.5, in answering patient- and clinician-focused questions about ptosis, emphasizing cross-lingual applicability and patient-centric assessment.

METHODS

We collected 11 patient-centric and 50 doctor-centric questions covering ptosis symptoms, treatment, and postoperative care. Responses generated by GPT-4, GPT-4o, Qwen2, and Qwen2.5 were evaluated using predefined criteria: accuracy, sufficiency, clarity, and depth for doctor questions; and helpfulness, clarity, and empathy for patient questions. For the clinical assessment, 30 patients with ptosis and 8 oculoplastic surgeons rated the responses on a 5-point Likert scale.

RESULTS

For doctor questions, GPT-4o outperformed Qwen2.5 in overall performance (53.1% vs. 18.8%, P = 0.035) and completeness (P = 0.049). For patient questions, GPT-4o scored higher in helpfulness (mean rank = 175.28 vs. 155.72, P = 0.035), with no significant differences in clarity or empathy. Qwen2.5 exhibited superior Chinese-language clarity compared to English (P = 0.023).
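
The helpfulness result above is reported as a pair of mean ranks, which points to a rank-based comparison of the Likert ratings. The abstract does not name the test, so the following is only a minimal sketch of how such mean ranks could be computed, assuming the ratings of two models are pooled and tied ratings share the average of their rank positions (as in a Mann-Whitney-style analysis). The function names and the ratings are hypothetical and are not data from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mean_ranks(group_a, group_b):
    """Pool both groups, rank the pooled ratings, and average ranks per group."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n_a = len(group_a)
    return sum(ranks[:n_a]) / n_a, sum(ranks[n_a:]) / len(group_b)


# Hypothetical 5-point Likert ratings for two models (illustrative only).
model_a = [5, 4, 4, 5, 3, 4]
model_b = [3, 4, 2, 3, 4, 3]

mean_a, mean_b = mean_ranks(model_a, model_b)
print(f"mean rank A = {mean_a:.2f}, mean rank B = {mean_b:.2f}")
```

When both groups are the same size, the two mean ranks average to (N + 1) / 2 for N pooled ratings. The reported values 175.28 and 155.72 average to 165.5, which would correspond to N = 330 under that equal-size assumption; the abstract itself does not state the group sizes.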

CONCLUSIONS

LLMs, particularly GPT-4o, demonstrate robust performance in ptosis-related inquiries, excelling in English and offering clinically valuable insights. Qwen2.5 showed advantages in Chinese clarity. Although promising for patient education and clinician support, these models require rigorous validation, domain-specific training, and cultural adaptation before clinical deployment. Future efforts should focus on refining multilingual capabilities and integrating real-time expert oversight to ensure safety and relevance in diverse healthcare contexts.

TRANSLATIONAL RELEVANCE

This study bridges artificial intelligence (AI) advancements with clinical practice by demonstrating how optimized LLMs can enhance patient education and cross-linguistic clinician support tools in ptosis-related inquiries.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a71/12279073/22745c404397/tvst-14-7-9-f001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a71/12279073/36248ca99322/tvst-14-7-9-f002.jpg