Suppr 超能文献




Performance of Chatgpt in ophthalmology exam; human versus AI.

Affiliations

Department of Ophthalmology, Beyoglu Eye Training and Research Hospital, University of Health Sciences, 34420, Istanbul, Turkey.

Department of Ophthalmology, Sancaktepe Prof. Dr. Ilhan Varank Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.

Publication Information

Int Ophthalmol. 2024 Nov 6;44(1):413. doi: 10.1007/s10792-024-03353-w.

PMID: 39503920
Abstract

PURPOSE

This cross-sectional study evaluates ChatGPT's success rate in answering questions from the 'Resident Training Development Exam' and compares the results with the performance of ophthalmology residents.

METHODS

The 75 exam questions, spanning nine sections and three difficulty levels, were presented to ChatGPT, and its responses and explanations were recorded. The readability and complexity of the explanations were analyzed, and the Flesch Reading Ease (FRE) score (0-100) was computed with the program Readable. Residents were categorized into four groups by seniority, and their overall and seniority-specific success rates were compared with ChatGPT's.
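The study scored readability with the commercial tool Readable, but the underlying Flesch Reading Ease formula is standard: FRE = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words). A minimal sketch is below; the syllable counter is a crude vowel-group heuristic for illustration, not what Readable actually uses.

```python
import re

def count_syllables(word):
    """Crude heuristic: count vowel groups, discount a silent final 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # e.g. "evaluate" -> e-val-u-ate, final 'e' is silent
    return max(n, 1)

def flesch_reading_ease(text):
    """FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    Lower scores mean harder text; the paper reports ~27.56 for ChatGPT's
    explanations, i.e. very difficult reading."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short sentences of one-syllable words score near the top of the 0-100 band, while dense clinical prose can even fall below zero, which is why a mean score of 27.56 indicates text that is hard for lay readers.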

RESULTS

Out of 69 questions, ChatGPT answered 37 correctly (53.62%). The highest success rate was in Lens and Cataract (77.77%) and the lowest in Pediatric Ophthalmology and Strabismus (0.00%). Among the 789 residents, overall accuracy was 50.37%; seniority-specific accuracy was 43.49%, 51.30%, 54.91%, and 60.05% for 1st- through 4th-year residents, respectively. ChatGPT ranked 292nd among the residents. By difficulty, 11 questions were easy, 44 moderate, and 14 difficult; ChatGPT's accuracy at each level was 63.63%, 54.54%, and 42.85%, respectively. The average FRE score of ChatGPT's responses was 27.56 ± 12.40.
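The per-difficulty percentages are internally consistent with the overall figure. A quick sketch of the arithmetic (the raw counts here are back-calculated from the reported percentages, not taken from the paper, which appears to truncate rather than round its decimals):

```python
# (correct, asked) per difficulty level, implied by 63.63%, 54.54%, 42.85%
by_difficulty = {"easy": (7, 11), "moderate": (24, 44), "difficult": (6, 14)}

total_correct = sum(c for c, _ in by_difficulty.values())    # 7 + 24 + 6 = 37
total_questions = sum(n for _, n in by_difficulty.values())  # 11 + 44 + 14 = 69
overall_pct = round(100 * total_correct / total_questions, 2)  # 53.62

for level, (c, n) in by_difficulty.items():
    print(f"{level}: {c}/{n} = {100 * c / n:.2f}%")
print(f"overall: {total_correct}/{total_questions} = {overall_pct}%")
```

The implied counts sum exactly to the reported 37/69 (53.62%), confirming the breakdown.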

CONCLUSION

ChatGPT correctly answered 53.6% of the questions in an exam designed for residents, a success rate below that of the average 3rd-year resident. The readability of its responses is low, making them difficult to understand, and its success rate decreases as question difficulty increases. These results are expected to change as more information is incorporated into ChatGPT.


Similar Articles

1. Performance of Chatgpt in ophthalmology exam; human versus AI. Int Ophthalmol. 2024 Nov 6;44(1):413. doi: 10.1007/s10792-024-03353-w.
2. Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge. Graefes Arch Clin Exp Ophthalmol. 2025 Feb;263(2):527-536. doi: 10.1007/s00417-024-06625-4. Epub 2024 Sep 15.
3. Assessing question characteristic influences on ChatGPT's performance and response-explanation consistency: Insights from Taiwan's Nursing Licensing Exam. Int J Nurs Stud. 2024 May;153:104717. doi: 10.1016/j.ijnurstu.2024.104717. Epub 2024 Feb 8.
4. Evaluating the application of ChatGPT in China's residency training education: An exploratory study. Med Teach. 2025 May;47(5):858-864. doi: 10.1080/0142159X.2024.2377808. Epub 2024 Jul 12.
5. ChatGPT, Bard, and Bing Chat Are Large Language Processing Models That Answered Orthopaedic In-Training Examination Questions With Similar Accuracy to First-Year Orthopaedic Surgery Residents. Arthroscopy. 2025 Mar;41(3):557-562. doi: 10.1016/j.arthro.2024.08.023. Epub 2024 Aug 28.
6. Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE). J Surg Educ. 2024 Nov;81(11):1645-1649. doi: 10.1016/j.jsurg.2024.08.002. Epub 2024 Sep 14.
7. Artificial Intelligence vs. Human Cognition: A Comparative Analysis of ChatGPT and Candidates Sitting the European Board of Ophthalmology Diploma Examination. Vision (Basel). 2025 Apr 9;9(2):31. doi: 10.3390/vision9020031.
8. Performance of an Artificial Intelligence Chatbot in Ophthalmic Knowledge Assessment. JAMA Ophthalmol. 2023 Jun 1;141(6):589-597. doi: 10.1001/jamaophthalmol.2023.1144.
9. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study. JMIR Med Educ. 2024 Apr 29;10:e55048. doi: 10.2196/55048.
10. Performance of three artificial intelligence (AI)-based large language models in standardized testing; implications for AI-assisted dental education. J Periodontal Res. 2025 Feb;60(2):121-133. doi: 10.1111/jre.13323. Epub 2024 Jul 18.

Cited By

1. Evaluating ChatGPT-4 Plus in Ophthalmology: Effect of Image Recognition and Domain-Specific Pretraining on Diagnostic Performance. Diagnostics (Basel). 2025 Jul 19;15(14):1820. doi: 10.3390/diagnostics15141820.
2. Large language models in the management of chronic ocular diseases: a scoping review. Front Cell Dev Biol. 2025 Jun 18;13:1608988. doi: 10.3389/fcell.2025.1608988. eCollection 2025.
3. Evaluating the Accuracy of Gemini 2.0 Advanced and ChatGPT 4o in Cataract Knowledge: A Performance Analysis Using Brazilian Council of Ophthalmology Board Exam Questions. Cureus. 2025 Feb 24;17(2):e79565. doi: 10.7759/cureus.79565. eCollection 2025 Feb.
4. Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology. Pediatr Nephrol. 2025 Mar 5. doi: 10.1007/s00467-025-06723-3.

References

1. Information Quality and Readability: ChatGPT's Responses to the Most Common Questions About Spinal Cord Injury. World Neurosurg. 2024 Jan;181:e1138-e1144. doi: 10.1016/j.wneu.2023.11.062. Epub 2023 Nov 22.
2. Acute dacryocystitis: changing practice pattern over the last three decades at a tertiary care setup. Graefes Arch Clin Exp Ophthalmol. 2024 Apr;262(4):1289-1293. doi: 10.1007/s00417-023-06300-0. Epub 2023 Nov 4.
3. Auxiliary use of ChatGPT in surgical diagnosis and treatment. Int J Surg. 2023 Dec 1;109(12):3940-3943. doi: 10.1097/JS9.0000000000000686.
4. Artificial Intelligence in Ophthalmology: A Comparative Analysis of GPT-3.5, GPT-4, and Human Expertise in Answering StatPearls Questions. Cureus. 2023 Jun 22;15(6):e40822. doi: 10.7759/cureus.40822. eCollection 2023 Jun.
5. The Pros and Cons of Using ChatGPT in Medical Education: A Scoping Review. Stud Health Technol Inform. 2023 Jun 29;305:644-647. doi: 10.3233/SHTI230580.
6. Performance of Generative Large Language Models on Ophthalmology Board-Style Questions. Am J Ophthalmol. 2023 Oct;254:141-149. doi: 10.1016/j.ajo.2023.05.024. Epub 2023 Jun 18.
7. Evaluating the Performance of ChatGPT in Ophthalmology: An Analysis of Its Successes and Shortcomings. Ophthalmol Sci. 2023 May 5;3(4):100324. doi: 10.1016/j.xops.2023.100324. eCollection 2023 Dec.
8. Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases. Ophthalmol Retina. 2023 Oct;7(10):862-868. doi: 10.1016/j.oret.2023.05.022. Epub 2023 Jun 3.
9. Large language model (ChatGPT) as a support tool for breast tumor board. NPJ Breast Cancer. 2023 May 30;9(1):44. doi: 10.1038/s41523-023-00557-8.
10. Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT. Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.