


Evaluating the Use of ChatGPT 3.5 and Bard as Self-Assessment Tools for Short Answer Questions in Undergraduate Ophthalmology.

Authors

Khake Abhijeet M, Gokhale Suvarna, Dindore Pradeep, Khake Sonali, Desai Manjiri

Affiliations

Department of Ophthalmology, Pacific Medical College and Hospital, Udaipur, IND.

Department of Ophthalmology, Smt. Kashibai Navale Medical College and General Hospital, Pune, IND.

Publication

Cureus. 2025 Jun 18;17(6):e86288. doi: 10.7759/cureus.86288. eCollection 2025 Jun.

DOI: 10.7759/cureus.86288
PMID: 40688974
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12273740/
Abstract

OBJECTIVE

This study aimed to evaluate the efficacy of ChatGPT 3.5 and Google Bard as tools for self-assessment of short answer questions (SAQs) in ophthalmology for undergraduate medical students.

METHODOLOGY

A total of 261 SAQs were randomly selected from previous university examination papers and publicly available ophthalmology question banks. The questions were classified according to the competency-based medical education (CBME) curriculum of the National Medical Commission (NMC) of India into three categories: short note task-oriented questions (SNTO, n = 169), short note reasoning questions (SNRQ, n = 15), and applied aspect SAQs (SN Applied, n = 77). Image-based questions were excluded. Three ophthalmologists collaboratively developed model answers for each question. The same questions were then submitted to ChatGPT 3.5 and Google Bard. The AI-generated responses were independently evaluated by three ophthalmologists using a 3-point scale based on correct diagnosis, accuracy of content, and relevance. The scores were compiled, and the data were analyzed to compare the overall and category-wise performance of the two AI tools.

RESULTS

Out of a total possible score of 783 (261 questions × 3 points), ChatGPT 3.5 scored 696 (88.8%), while Bard scored 685 (87.5%). Although the overall performance difference was not statistically significant, ChatGPT 3.5 performed significantly better in the SNTO category. However, both AI tools produced poor-quality or inadequate answers for a subset of questions: 50 (19%) by ChatGPT 3.5 and 44 (16.8%) by Bard. Some responses lacked essential information, even for high-yield topics.

CONCLUSION

ChatGPT 3.5 and Bard can generate accurate and relevant responses to ophthalmology SAQs in most cases. ChatGPT 3.5 demonstrated slightly better performance, particularly for task-oriented questions, suggesting it may be a more effective tool for undergraduate students' self-assessment. However, due to a notable error rate (~20%), AI-generated responses should not be used in isolation and must be cross-referenced with standard textbooks. These tools best suit rapid information retrieval during the early study phases.


