Performance of ChatGPT on optometry and vision science exam questions.

Authors

Yoshioka Nayuta, Honson Vanessa, Mani Revathy, Oberstein Sharon, Watt Kathleen, Maseedupally Vinod

Affiliation

School of Optometry and Vision Science, UNSW Australia, Sydney, New South Wales, Australia.

Publication information

Ophthalmic Physiol Opt. 2025 Sep;45(6):1376-1388. doi: 10.1111/opo.13544. Epub 2025 Jul 9.

DOI: 10.1111/opo.13544
PMID: 40631633
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12357226/
Abstract

The rapid proliferation of Large Language Model (LLM) tools, such as ChatGPT developed by OpenAI, presents both a challenge and an opportunity for educators. While LLMs can generate convincing written responses across a wide range of academic fields, their capabilities vary noticeably across different models, fields and even sub-fields. This paper aims to evaluate the capabilities of LLMs in the field of optometry and vision science by analysing the quality of the responses generated by ChatGPT to sample long-answer questions covering different sub-fields of optometry, namely binocular vision, clinical communication, dispensing and ocular pathology. It also seeks to explore the possibility of LLMs being used as virtual graders. The capabilities of ChatGPT were explored using various GPT models (GPT-3.5, GPT-4 and o1, from oldest to newest) by investigating the concordance between ChatGPT and a human grader. This was followed by benchmarking the performance of these GPT models on various sample questions in optometry and vision science. Statistical analyses included mixed-effects analysis, the Friedman test, the Wilcoxon signed-rank test and thematic analysis. ChatGPT graders awarded higher marks than human graders, but the difference was significant only for GPT-3.5 (p < 0.05). Benchmarking on sample questions demonstrated that all GPT models can generate satisfactory responses above the 50% 'pass' score in many cases (p < 0.05), albeit with performance varying significantly across sub-fields (p < 0.0001) and models (p = 0.0003). Newer models significantly outperformed older models in most cases. The frequency of thematic response errors was more mixed between GPT-3.5 and GPT-4 models (p < 0.05 to p > 0.99), while o1 made no thematic errors. These findings indicate ChatGPT may impact learning and teaching practices in this field. The inconsistent performances across sub-fields and additional implementation considerations, such as ethics and transparency, support a judicious adaptation of assessment practice and adoption of the technology in optometry and vision science education.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7777/12357226/03bd223431a3/OPO-45-1376-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7777/12357226/5b3922cbe4fd/OPO-45-1376-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7777/12357226/544db0e038b9/OPO-45-1376-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7777/12357226/8a9e9ea9b618/OPO-45-1376-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7777/12357226/a63fb0d641c8/OPO-45-1376-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7777/12357226/13326f30581f/OPO-45-1376-g001.jpg

Similar articles

1
Performance of ChatGPT on optometry and vision science exam questions.
Ophthalmic Physiol Opt. 2025 Sep;45(6):1376-1388. doi: 10.1111/opo.13544. Epub 2025 Jul 9.
2
Prescription of Controlled Substances: Benefits and Risks
3
The performance of ChatGPT on medical image-based assessments and implications for medical education.
BMC Med Educ. 2025 Aug 23;25(1):1192. doi: 10.1186/s12909-025-07752-0.
4
Large language models (LLMs) in radiology exams for medical students: Performance and consequences.
Rofo. 2024 Nov 4. doi: 10.1055/a-2437-2067.
5
Stench of Errors or the Shine of Potential: The Challenge of (Ir)Responsible Use of ChatGPT in Speech-Language Pathology.
Int J Lang Commun Disord. 2025 Jul-Aug;60(4):e70088. doi: 10.1111/1460-6984.70088.
6
Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.
JMIR Form Res. 2024 Dec 17;8:e57592. doi: 10.2196/57592.
7
Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.
J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
8
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
9
Triage Performance Across Large Language Models, ChatGPT, and Untrained Doctors in Emergency Medicine: Comparative Study.
J Med Internet Res. 2024 Jun 14;26:e53297. doi: 10.2196/53297.
10
Comparative Analysis of LLMs' Performance On a Practice Radiography Certification Exam.
Radiol Technol. 2025 May-Jun;96(5):334-342.
