

Can ChatGPT generate surgical multiple-choice questions comparable to those written by a surgeon?

Author Information

Kıyak Yavuz Selim, Coşkun Ali Kağan, Kaymak Şahin, Coşkun Özlem, Budakoğlu Işıl İrem

Affiliations

Department of Medical Education and Informatics, Gazi University Faculty of Medicine, Ankara, Turkey.

Department of General Surgery, UHS Gulhane School of Medicine, Ankara, Turkey.

Publication Information

Proc (Bayl Univ Med Cent). 2024 Oct 22;38(1):48-52. doi: 10.1080/08998280.2024.2418752. eCollection 2025.

Abstract

BACKGROUND

This study aimed to determine whether surgical multiple-choice questions generated by ChatGPT are comparable to those written by human experts (surgeons).

METHODS

The study was conducted at a medical school and involved 112 fourth-year medical students. Based on five learning objectives in general surgery (colorectal, gastric, trauma, breast, thyroid), ChatGPT and surgeons each generated five multiple-choice questions. No changes were made to the ChatGPT-generated questions. The statistical properties of these questions were reported, including correlations between the two groups of questions and correlations with total scores (item discrimination) on a general surgery clerkship exam.
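The abstract does not spell out how item discrimination was calculated; the short sketch below only illustrates the underlying statistical concept, using a corrected point-biserial correlation (an item's 0/1 score correlated with the rest-of-test total). The example data, the 0.20 rule-of-thumb cutoff, and the function name are illustrative assumptions, not the study's actual procedure.

# Illustrative sketch (not the authors' analysis code): item discrimination
# computed as a corrected point-biserial correlation, i.e. the correlation
# between an item's dichotomous score and the total score excluding that item.
import numpy as np

def item_discrimination(item_scores: np.ndarray, total_scores: np.ndarray) -> float:
    """Correlate a 0/1 item with the rest-of-test total for the same examinees."""
    # Subtracting the item avoids inflating the correlation by counting the item
    # in its own total (assumes the item contributes 1 point when correct).
    rest_scores = total_scores - item_scores
    return float(np.corrcoef(item_scores, rest_scores)[0, 1])

# Made-up responses from a handful of examinees (1 = correct, 0 = incorrect)
item = np.array([1, 0, 1, 1, 0, 1, 0, 1])
total = np.array([42, 25, 38, 45, 22, 40, 28, 36])  # exam totals for the same examinees

r = item_discrimination(item, total)
print(f"item discrimination (corrected point-biserial): {r:.2f}")
# A common rule of thumb treats r >= 0.20 as an acceptable discrimination level,
# though the exact cutoff used in the study is not stated in this abstract.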

RESULTS

There was a significant positive correlation between the ChatGPT-generated and human-written questions for one learning objective (colorectal). More importantly, only one ChatGPT-generated question (colorectal) achieved an acceptable discrimination level, while the other four failed to achieve it. In contrast, the human-written questions showed acceptable discrimination levels.

CONCLUSION

While ChatGPT has the potential to generate multiple-choice questions comparable to human-written ones in specific contexts, the variability across surgical topics points to the need for human oversight and review before their use in exams. It is important to integrate artificial intelligence tools like ChatGPT with human expertise to enhance efficiency and quality.

