Answering Patterns in SBA Items: Students, GPT3.5, and Gemini.

Author information

Ng Olivia, Phua Dong Haur, Chu Jowe, Wilding Lucy V E, Mogali Sreenivasulu Reddy, Cleland Jennifer

Affiliations

Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.

Emergency Department, Tan Tock Seng Hospital, Singapore, Singapore.

Publication information

Med Sci Educ. 2024 Nov 26;35(2):629-632. doi: 10.1007/s40670-024-02232-4. eCollection 2025 Apr.

Abstract

While large language models (LLMs) are often used to generate and answer exam questions, few studies have used item statistics to compare their performance across multiple iterations. This study aims to fill that gap by investigating how LLMs respond to single-best answer (SBA) questions and comparing their answering patterns with those of students. Forty-one SBA questions for first-year medical students were each answered over 100 iterations by GPT3.5 and Gemini, chosen as the most easily accessible free-to-use LLMs. Both LLMs exhibited more repetitive and clustered answering patterns than students did. This can be problematic because repeatedly selecting the same wrong option may compound mistakes. Distractor analysis revealed that students handled the multiple options of the SBA format better. We found that these free-to-use LLMs are inferior to well-trained students or specialists in handling technical questions. We also highlight concerns about LLMs' contextual interpretation of these items and the need for human oversight in the medical education assessment process.
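The abstract does not say which item statistics were computed. The Python below is a minimal sketch that assumes classical test theory measures. It computes item difficulty (the proportion of responses matching the key) and the share of responses that went to each distractor. It also computes the Shannon entropy of the option distribution, as one possible way to quantify how "clustered" the answers to a single item are. The entropy measure, the five-option format, the function names, and the response data are all hypothetical illustrations, not taken from the study.

```python
from collections import Counter
from math import log2

# Hypothetical five-option SBA format (assumption, not stated in the abstract).
OPTIONS = "ABCDE"


def item_difficulty(responses, key):
    """Classical difficulty index: proportion of responses matching the key."""
    return sum(r == key for r in responses) / len(responses)


def distractor_shares(responses, key, options=OPTIONS):
    """Share of all responses that selected each incorrect option."""
    counts = Counter(responses)
    n = len(responses)
    return {opt: counts[opt] / n for opt in options if opt != key}


def answer_entropy(responses):
    """Shannon entropy (bits) of the chosen options; lower means more clustered.

    This is an illustrative clustering measure, not necessarily the study's.
    """
    n = len(responses)
    return -sum((c / n) * log2(c / n) for c in Counter(responses).values())


if __name__ == "__main__":
    key = "A"
    # Hypothetical data: 100 LLM iterations and 100 student answers to one item.
    llm_runs = ["A"] * 70 + ["B"] * 30
    students = ["A"] * 60 + ["B"] * 15 + ["C"] * 10 + ["D"] * 10 + ["E"] * 5

    for label, responses in (("LLM", llm_runs), ("Students", students)):
        print(
            label,
            "difficulty:", round(item_difficulty(responses, key), 2),
            "entropy:", round(answer_entropy(responses), 2),
            "distractors:", distractor_shares(responses, key),
        )
```

In this made-up example, the LLM answers the item correctly slightly more often than the students, but its responses spread over only two options, so its entropy is lower. Every one of its errors lands on the same distractor. That is the kind of repeated error selection the abstract warns about.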

Similar articles

Answering Patterns in SBA Items: Students, GPT3.5, and Gemini.
Med Sci Educ. 2024 Nov 26;35(2):629-632. doi: 10.1007/s40670-024-02232-4. eCollection 2025 Apr.

Audit and feedback: effects on professional practice.
Cochrane Database Syst Rev. 2025 Mar 25;3(3):CD000259. doi: 10.1002/14651858.CD000259.pub4.
