Evaluating Large Language Models in Ptosis-Related Inquiries: A Cross-Lingual Study.

Author Information

Niu Ling-Han, Wei Li, Qin Bixuan, Chen Tao, Dong Li, He Yueqing, Jiang Xue, Wang Mingyang, Ma Lan, Geng Jialu, Wang Lechen, Li Dongmei

Affiliations

Beijing Tongren Eye Center, and Beijing Ophthalmology Visual Science Key Lab, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China.

Mingsii Co., Ltd, Beijing, People's Republic of China.

Publication Information

Transl Vis Sci Technol. 2025 Jul 1;14(7):9. doi: 10.1167/tvst.14.7.9.


DOI: 10.1167/tvst.14.7.9
PMID: 40668049
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12279073/
Abstract

PURPOSE: The purpose of this study was to evaluate the performance of large language models (LLMs) - GPT-4, GPT-4o, Qwen2, and Qwen2.5 - in addressing patient- and clinician-focused questions on ptosis-related inquiries, emphasizing cross-lingual applicability and patient-centric assessment.

METHODS: We collected 11 patient-centric and 50 doctor-centric questions covering ptosis symptoms, treatment, and postoperative care. Responses generated by GPT-4, GPT-4o, Qwen2, and Qwen2.5 were evaluated using predefined criteria: accuracy, sufficiency, clarity, and depth (doctor questions); and helpfulness, clarity, and empathy (patient questions). Clinical assessments involved 30 patients with ptosis and 8 oculoplastic surgeons rating responses on a 5-point Likert scale.

RESULTS: For doctor questions, GPT-4o outperformed Qwen2.5 in overall performance (53.1% vs. 18.8%, P = 0.035) and completeness (P = 0.049). For patient questions, GPT-4o scored higher in helpfulness (mean rank = 175.28 vs. 155.72, P = 0.035), with no significant differences in clarity or empathy. Qwen2.5 exhibited superior Chinese-language clarity compared to English (P = 0.023).

CONCLUSIONS: LLMs, particularly GPT-4o, demonstrate robust performance in ptosis-related inquiries, excelling in English and offering clinically valuable insights. Qwen2.5 showed advantages in Chinese clarity. Although promising for patient education and clinician support, these models require rigorous validation, domain-specific training, and cultural adaptation before clinical deployment. Future efforts should focus on refining multilingual capabilities and integrating real-time expert oversight to ensure safety and relevance in diverse healthcare contexts.

TRANSLATIONAL RELEVANCE: This study bridges artificial intelligence (AI) advancements with clinical practice by demonstrating how optimized LLMs can enhance patient education and cross-linguistic clinician support tools in ptosis-related inquiries.
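
The RESULTS section reports rank-based comparisons of 5-point Likert ratings (e.g., mean rank 175.28 vs. 155.72, P = 0.035). The abstract does not name the exact statistical test, so the following is only a minimal sketch assuming a two-sided Mann-Whitney U test on pooled Likert ratings; the model labels, sample sizes, and ratings are illustrative placeholders, not the study's data.

```python
# Minimal sketch (assumption): rank-based comparison of 5-point Likert ratings
# for two models. All values below are simulated placeholders, not study data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)

# Simulated helpfulness ratings (1-5 Likert scale) for responses from two models
gpt4o_ratings = rng.integers(low=3, high=6, size=150)   # hypothetical ratings
qwen25_ratings = rng.integers(low=2, high=6, size=150)  # hypothetical ratings

# A two-sided Mann-Whitney U test compares the rank distributions of the two
# groups, which is consistent with reporting a "mean rank" for each model.
u_stat, p_value = mannwhitneyu(gpt4o_ratings, qwen25_ratings, alternative="two-sided")
print(f"U = {u_stat:.1f}, P = {p_value:.4f}")
```

A rank-based test is a common choice for ordinal Likert data because it does not assume the ratings are interval-scaled or normally distributed.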


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a71/12279073/22745c404397/tvst-14-7-9-f001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a71/12279073/36248ca99322/tvst-14-7-9-f002.jpg

Similar Articles

[1] Evaluating Large Language Models in Ptosis-Related Inquiries: A Cross-Lingual Study. Transl Vis Sci Technol. 2025 Jul 1.
[2] Performance of ChatGPT-4o and Four Open-Source Large Language Models in Generating Diagnoses Based on China's Rare Disease Catalog: Comparative Study. J Med Internet Res. 2025 Jun 18.
[3] Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study. J Med Internet Res. 2025 May 20.
[4] Thyroid Eye Disease and Artificial Intelligence: A Comparative Study of ChatGPT-3.5, ChatGPT-4o, and Gemini in Patient Information Delivery. Ophthalmic Plast Reconstr Surg. 2024 Dec 24.
[5] Potential of ChatGPT in youth mental health emergency triage: Comparative analysis with clinicians. PCN Rep. 2025 Jul 15.
[6] Evaluating a Large Language Model in Translating Patient Instructions to Spanish Using a Standardized Framework. JAMA Pediatr. 2025 Jul 7.
[7] Optimizing patient education for radioactive iodine therapy and the role of ChatGPT incorporating chain-of-thought technique: ChatGPT questionnaire. Digit Health. 2025 Jul 7.
[8] Development and Validation of a Large Language Model-Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education. J Med Internet Res. 2025 Jul 15.
[9] Large Language Models and Empathy: Systematic Review. J Med Internet Res. 2024 Dec 11.
[10] Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini. JMIR Perioper Med. 2025 Jun 12.
