
Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions.

Author information

Köksaldı Seher, Kayabaşı Mustafa, Durmaz Engin Ceren, Grzybowski Andrzej

Affiliations

Department of Ophthalmology, Agri Ibrahim Cecen University, 04200, Agri, Turkey.

Department of Ophthalmology, Mus State Hospital, 49200, Mus, Turkey.

Publication information

Aesthetic Plast Surg. 2025 Jul 21. doi: 10.1007/s00266-025-05071-9.

PMID:40691658
Abstract

BACKGROUND

This study aimed to evaluate the performance of four large language models (LLMs), namely ChatGPT, Gemini, Copilot, and Claude, in responding to questions about upper eyelid blepharoplasty, focusing on medical accuracy, clinical relevance, response length, and readability.

METHODS

A set of queries about upper eyelid blepharoplasty, covering six categories (anatomy, surgical procedure, additional intraoperative procedures, postoperative monitoring, follow-up, and postoperative complications), was posed to each LLM. An identical prompt establishing the clinical context was provided before each question. Three ophthalmologists rated the responses on a 5-point Likert scale for medical accuracy and a 3-point Likert scale for clinical relevance. Response length was also measured. Readability was evaluated using the Flesch Reading Ease Score, Flesch-Kincaid Grade Level, Coleman-Liau Index, Gunning Fog Index, and Simple Measure of Gobbledygook (SMOG) grade.
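The five readability indices named in the methods are standard published formulas built from counts of sentences, words, syllables, letters, and polysyllabic words. The Python sketch below computes all five from raw text. It is not the tool used in the study (the abstract does not name one). The syllable counter and sentence splitter are simplified heuristics assumed for illustration, and Gunning Fog "complex words" are approximated as any word of three or more syllables. Scores will therefore differ somewhat from dedicated calculators.

```python
import math
import re


# Assumption: a crude vowel-group heuristic for syllables. Dedicated readability
# tools use dictionaries or richer rules, so counts here are approximate.
def count_syllables(word: str) -> int:
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" (as in "make") as non-syllabic, but keep "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    # Assumption: sentences end at ".", "!" or "?" (abbreviations like "e.g." will over-split).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    syllables = [count_syllables(w) for w in words]
    n_syll = sum(syllables)
    n_poly = sum(1 for s in syllables if s >= 3)  # words with 3+ syllables
    n_letters = sum(c.isalpha() for c in text)

    wps = n_words / n_sent            # average words per sentence
    spw = n_syll / n_words            # average syllables per word
    L = n_letters / n_words * 100     # letters per 100 words (Coleman-Liau)
    S = n_sent / n_words * 100        # sentences per 100 words (Coleman-Liau)

    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "coleman_liau": 0.0588 * L - 0.296 * S - 15.8,
        # Gunning Fog: polysyllabic words stand in for "complex words" here.
        "gunning_fog": 0.4 * (wps + 100 * n_poly / n_words),
        # SMOG is normalized to a 30-sentence sample.
        "smog": 1.0430 * math.sqrt(n_poly * 30 / n_sent) + 3.1291,
    }
```

For example, `readability(response_text)` returns the five scores for a single model response. The grade-level outputs (Flesch-Kincaid, Coleman-Liau, Gunning Fog, SMOG) approximate US school years, so values around 16 or above correspond roughly to the college-graduate level reported in the results. Flesch Reading Ease runs the other way: lower scores mean harder text.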

RESULTS

A total of 30 standardized questions were presented to each LLM. None of the responses from any LLM received a score of 1 regarding medical accuracy for any question. ChatGPT achieved an 80% 'highly accurate' response rate, followed by Claude (60%), Gemini (40%), and Copilot (20%). None of the responses from ChatGPT and Claude received a score of 1 regarding clinical relevance, whereas 10% of Gemini's responses and 26.7% of Copilot's responses received a score of 1. ChatGPT also provided the most clinically 'relevant' responses (86.7%), outperforming the other LLMs. Copilot generated the shortest responses, while ChatGPT generated the longest. Readability analyses revealed that all responses required advanced reading skills at a 'college graduate' level or higher, with Copilot's responses being the most complex.
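The percentages above can be converted back into numbers of responses. The following is a minimal sketch. It assumes, although the abstract does not say so explicitly, that every percentage is computed over the 30 questions posed to each model. The variable and function names are hypothetical.

```python
# Assumption: every percentage is over the 30 questions posed to each model.
N_QUESTIONS = 30

# Percentages as reported in the Results paragraph.
highly_accurate_pct = {"ChatGPT": 80.0, "Claude": 60.0, "Gemini": 40.0, "Copilot": 20.0}
relevance_score_1_pct = {"ChatGPT": 0.0, "Claude": 0.0, "Gemini": 10.0, "Copilot": 26.7}


def pct_to_count(pct: float, n: int = N_QUESTIONS) -> int:
    """Convert a reported percentage back into a whole number of responses out of n."""
    count = round(pct / 100 * n)
    # Reported figures carry one decimal place, so allow up to 0.05 points of rounding.
    if abs(count / n * 100 - pct) > 0.05:
        raise ValueError(f"{pct}% is not a whole number of responses out of {n}")
    return count


for model in highly_accurate_pct:
    accurate = pct_to_count(highly_accurate_pct[model])
    low_relevance = pct_to_count(relevance_score_1_pct[model])
    print(f"{model}: {accurate}/{N_QUESTIONS} highly accurate, "
          f"{low_relevance}/{N_QUESTIONS} scored 1 for relevance")
```

Under this assumption, the 80%, 60%, 40%, and 20% "highly accurate" rates correspond to 24, 18, 12, and 6 of 30 responses. Gemini's 10% and Copilot's 26.7% correspond to 3 and 8 responses scored 1 for clinical relevance, and ChatGPT's 86.7% "relevant" rate corresponds to 26 responses. The tolerance check confirms that each reported figure is consistent with a whole number of responses out of 30.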

CONCLUSION

Among the evaluated LLMs, ChatGPT demonstrated superior performance in both medical accuracy and clinical relevance for upper eyelid blepharoplasty questions, excelling particularly in the postoperative monitoring and follow-up categories. Although all models generated complex texts requiring advanced literacy, ChatGPT's detailed responses offer valuable guidance for ophthalmologists managing upper eyelid blepharoplasty cases.

LEVEL OF EVIDENCE V

This journal requires that authors assign a level of evidence to each article. For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors at www.springer.com/00266.

Similar articles

1. Accuracy of ChatGPT, Gemini, Copilot, and Claude to Blepharoplasty-Related Questions.
   Aesthetic Plast Surg. 2025 Jul 21. doi: 10.1007/s00266-025-05071-9.
2. Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study.
   J Med Internet Res. 2025 Jun 4;27:e69955. doi: 10.2196/69955.
3. Benchmarking the performance of large language models in uveitis: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Anthropic Claude3.
   Eye (Lond). 2025 Apr;39(6):1132-1137. doi: 10.1038/s41433-024-03545-9. Epub 2024 Dec 17.
4. A structured evaluation of LLM-generated step-by-step instructions in cadaveric brachial plexus dissection.
   BMC Med Educ. 2025 Jul 1;25(1):903. doi: 10.1186/s12909-025-07493-0.
5. Artificial Intelligence in Peripheral Artery Disease Education: A Battle Between ChatGPT and Google Gemini.
   Cureus. 2025 Jun 1;17(6):e85174. doi: 10.7759/cureus.85174. eCollection 2025 Jun.
6. Performance of 3 Conversational Generative Artificial Intelligence Models for Computing Maximum Safe Doses of Local Anesthetics: Comparative Analysis.
   JMIR AI. 2025 May 13;4:e66796. doi: 10.2196/66796.
7. Evaluation of ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash as Patient Education Resources for Upper Blepharoplasty Patients.
   J Craniofac Surg. 2025 Jul 7. doi: 10.1097/SCS.0000000000011608.
8. Subthalamic nucleus or globus pallidus internus deep brain stimulation for the treatment of parkinson's disease: An artificial intelligence approach.
   J Clin Neurosci. 2025 Jun 18;138:111393. doi: 10.1016/j.jocn.2025.111393.
9. Is Information About Musculoskeletal Malignancies From Large Language Models or Web Resources at a Suitable Reading Level for Patients?
   Clin Orthop Relat Res. 2025 Feb 1;483(2):306-315. doi: 10.1097/CORR.0000000000003263. Epub 2024 Sep 25.
10. American Academy of Orthopaedic Surgeons OrthoInfo provides more readable information regarding rotator cuff injury than ChatGPT.
   J ISAKOS. 2025 Feb 12;12:100841. doi: 10.1016/j.jisako.2025.100841.
