

Can ChatGPT generate surgical multiple-choice questions comparable to those written by a surgeon?

Author Information

Kıyak Yavuz Selim, Coşkun Ali Kağan, Kaymak Şahin, Coşkun Özlem, Budakoğlu Işıl İrem

Affiliations

Department of Medical Education and Informatics, Gazi University Faculty of Medicine, Ankara, Turkey.

Department of General Surgery, UHS Gulhane School of Medicine, Ankara, Turkey.

Publication Information

Proc (Bayl Univ Med Cent). 2024 Oct 22;38(1):48-52. doi: 10.1080/08998280.2024.2418752. eCollection 2025.

Abstract

BACKGROUND

This study aimed to determine whether surgical multiple-choice questions generated by ChatGPT are comparable to those written by human experts (surgeons).

METHODS

The study was conducted at a medical school and involved 112 fourth-year medical students. Based on five learning objectives in general surgery (colorectal, gastric, trauma, breast, thyroid), ChatGPT and surgeons each generated five multiple-choice questions. No changes were made to the ChatGPT-generated questions. The statistical properties of these questions were reported, including correlations between the two groups of questions and correlations with total scores (item discrimination) on a general surgery clerkship exam.
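The abstract does not spell out how item discrimination was calculated; the short sketch below only illustrates the underlying statistical concept, using a corrected point-biserial correlation (an item's 0/1 score correlated with the rest-of-test total). The example data, the 0.20 rule-of-thumb cutoff, and the function name are illustrative assumptions, not the study's actual procedure.

# Illustrative sketch (not the authors' analysis code): item discrimination
# computed as a corrected point-biserial correlation, i.e. the correlation
# between an item's dichotomous score and the total score excluding that item.
import numpy as np

def item_discrimination(item_scores: np.ndarray, total_scores: np.ndarray) -> float:
    """Correlate a 0/1 item with the rest-of-test total for the same examinees."""
    # Subtracting the item avoids inflating the correlation by counting the item
    # in its own total (assumes the item contributes 1 point when correct).
    rest_scores = total_scores - item_scores
    return float(np.corrcoef(item_scores, rest_scores)[0, 1])

# Made-up responses from a handful of examinees (1 = correct, 0 = incorrect)
item = np.array([1, 0, 1, 1, 0, 1, 0, 1])
total = np.array([42, 25, 38, 45, 22, 40, 28, 36])  # exam totals for the same examinees

r = item_discrimination(item, total)
print(f"item discrimination (corrected point-biserial): {r:.2f}")
# A common rule of thumb treats r >= 0.20 as an acceptable discrimination level,
# though the exact cutoff used in the study is not stated in this abstract.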

RESULTS

There was a significant positive correlation between the ChatGPT-generated and human-written questions for one learning objective (colorectal). More importantly, only one ChatGPT-generated question (colorectal) achieved an acceptable discrimination level, while the other four failed to achieve it. In contrast, the human-written questions showed acceptable discrimination levels.

CONCLUSION

While ChatGPT has the potential to generate multiple-choice questions comparable to human-written ones in specific contexts, the variability across surgical topics points to the need for human oversight and review before their use in exams. It is important to integrate artificial intelligence tools like ChatGPT with human expertise to enhance efficiency and quality.

