

Evaluating the Use of ChatGPT 3.5 and Bard as Self-Assessment Tools for Short Answer Questions in Undergraduate Ophthalmology.

Author Information

Khake Abhijeet M, Gokhale Suvarna, Dindore Pradeep, Khake Sonali, Desai Manjiri

Affiliations

Department of Ophthalmology, Pacific Medical College and Hospital, Udaipur, IND.

Department of Ophthalmology, Smt. Kashibai Navale Medical College and General Hospital, Pune, IND.

Publication Information

Cureus. 2025 Jun 18;17(6):e86288. doi: 10.7759/cureus.86288. eCollection 2025 Jun.

Abstract

OBJECTIVE

This study aimed to evaluate the efficacy of ChatGPT 3.5 and Google Bard as tools for self-assessment of short answer questions (SAQs) in ophthalmology for undergraduate medical students.

METHODOLOGY

A total of 261 SAQs were randomly selected from previous university examination papers and publicly available ophthalmology question banks. The questions were classified according to the competency-based medical education (CBME) curriculum of the National Medical Commission (NMC) of India into three categories: short note task-oriented questions (SNTO, n = 169), short note reasoning questions (SNRQ, n = 15), and applied aspect SAQs (SN Applied, n = 77). Image-based questions were excluded. Three ophthalmologists collaboratively developed model answers for each question. The same questions were then submitted to ChatGPT 3.5 and Google Bard. The AI-generated responses were independently evaluated by three ophthalmologists using a 3-point scale based on correct diagnosis, accuracy of content, and relevance. The scores were compiled, and the data were analyzed to compare the overall and category-wise performance of the two AI tools.
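
To make the scoring procedure concrete, the following is a minimal sketch, not taken from the paper, of how per-question ratings on the 3-point scale could be compiled into overall and category-wise totals for the two models; the sample records, variable names, and output format are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-question records: (category, ChatGPT 3.5 score, Bard score),
# where each score is the 0-3 rating assigned by the evaluators.
# The actual study data are not reproduced here.
records = [
    ("SNTO", 3, 2),
    ("SNRQ", 2, 3),
    ("SN Applied", 3, 3),
    # ... one entry per question (261 in the study)
]

totals = {"ChatGPT 3.5": 0, "Bard": 0}
by_category = defaultdict(lambda: {"ChatGPT 3.5": 0, "Bard": 0, "max": 0})

for category, gpt_score, bard_score in records:
    totals["ChatGPT 3.5"] += gpt_score
    totals["Bard"] += bard_score
    by_category[category]["ChatGPT 3.5"] += gpt_score
    by_category[category]["Bard"] += bard_score
    by_category[category]["max"] += 3  # maximum attainable per question

max_total = 3 * len(records)
for model, score in totals.items():
    print(f"{model}: {score}/{max_total} ({score / max_total:.1%})")

for category, scores in by_category.items():
    print(category, scores)
```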

RESULTS

Out of a total possible score of 783 (261 questions × 3 points), ChatGPT 3.5 scored 696 (88.8%), while Bard scored 685 (87.5%). Although the overall difference in performance was not statistically significant, ChatGPT 3.5 performed significantly better in the SNTO category. However, both AI tools produced poor-quality or inadequate answers for a subset of questions: 50 (19%) for ChatGPT 3.5 and 44 (16.8%) for Bard. Some responses omitted essential information, even on high-yield topics.
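
For readers who want to check the reported proportions, a quick arithmetic sketch (assuming each percentage is simply the raw score divided by the maximum attainable score, or the affected question count divided by 261) is shown below.

```python
total_possible = 261 * 3              # 783 marks available in total
print(f"{696 / total_possible:.3f}")  # 0.889 -> ChatGPT 3.5 overall score share
print(f"{685 / total_possible:.3f}")  # 0.875 -> Bard overall score share
print(f"{50 / 261:.3f}")              # 0.192 -> ChatGPT 3.5 inadequate-answer rate
print(f"{44 / 261:.3f}")              # 0.169 -> Bard inadequate-answer rate
```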

CONCLUSION

ChatGPT 3.5 and Bard can generate accurate and relevant responses to ophthalmology SAQs in most cases. ChatGPT 3.5 demonstrated slightly better performance, particularly on task-oriented questions, suggesting it may be the more effective tool for undergraduate self-assessment. However, given the notable error rate (~20%), AI-generated responses should not be relied on in isolation and must be cross-referenced with standard textbooks. These tools are best suited to rapid information retrieval during the early phases of study.


