
ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam.

Affiliations

Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey.

Gazi Üniversitesi Hastanesi E Blok 9, Kat 06500 Beşevler, Ankara, Turkey.

Publication information

Eur J Clin Pharmacol. 2024 May;80(5):729-735. doi: 10.1007/s00228-024-03649-x. Epub 2024 Feb 14.

Abstract

PURPOSE

Artificial intelligence, specifically large language models such as ChatGPT, offers potentially valuable benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels.

METHODS

This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship conducted according to the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel review, two of these multiple-choice questions were incorporated into a medical school exam without any changes to the questions. Based on the administration of the test, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and the functionality of the options.
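The psychometric properties named above are standard classical test theory statistics. As an illustrative sketch (not the authors' code), item difficulty is the proportion of examinees answering correctly, and the point-biserial correlation relates a 0/1 item score to the total exam score:

```python
import math

def item_difficulty(item_scores):
    """Item difficulty (p-value): proportion of examinees who answered correctly."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a dichotomous item score (0/1)
    and the total test score: r_pb = (M1 - M) / s * sqrt(p / (1 - p)),
    where M1 is the mean total score of examinees who got the item right,
    M and s are the mean and (population) SD of all total scores, and
    p is the item difficulty."""
    n = len(item_scores)
    mean_t = sum(total_scores) / n
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)
    correct = [t for s, t in zip(item_scores, total_scores) if s == 1]
    p = len(correct) / n
    if p in (0.0, 1.0) or sd_t == 0:
        return 0.0  # undefined when everyone (or no one) answers correctly
    mean_correct = sum(correct) / len(correct)
    return (mean_correct - mean_t) / sd_t * math.sqrt(p / (1 - p))
```

The data here are hypothetical; in the study, the statistics were computed from the real exam administration with 99 students.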

RESULTS

Both questions exhibited acceptable levels of point-biserial correlation, exceeding the threshold of 0.30 (0.41 and 0.39, respectively). However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants), while the other had none.
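The non-functional-option criterion above is easy to operationalize. A minimal sketch (the option labels and 5% threshold follow the abstract; the response data are hypothetical):

```python
from collections import Counter

def non_functional_options(chosen, options="ABCDE", threshold=0.05):
    """Return the options selected by fewer than `threshold` (default 5%)
    of examinees, i.e. the non-functional distractors."""
    counts = Counter(chosen)
    n = len(chosen)
    return [opt for opt in options if counts[opt] / n < threshold]
```

For example, if 90 of 100 examinees choose A, 5 choose B, and 5 choose C, then D and E are flagged as non-functional.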

CONCLUSIONS

The findings showed that the questions can effectively differentiate between high- and low-performing students, which points to the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items, enhancing the external validity of the results by gathering data from diverse institutions and settings.

