
Quality of Human Expert versus Large Language Model Generated Multiple Choice Questions in the Field of Mechanical Ventilation.

Authors

Safadi Sami, Amirahmadi Roxana, Tlimat Abdulhakim, Rovinski Randal, Sun Junfeng, Lee Burton W, Seam Nitin

Affiliations

Division of Nephrology and Hypertension, University of Minnesota, Minneapolis, Minnesota; Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, University of Minnesota, Minneapolis, Minnesota.

Department of Critical Care Medicine, National Institutes of Health, Bethesda, Maryland.

Publication Information

Chest. 2025 Jul 18. doi: 10.1016/j.chest.2025.07.005.

DOI: 10.1016/j.chest.2025.07.005
PMID: 40684906
Abstract

BACKGROUND

Mechanical ventilation (MV) is a critical competency in critical care training, yet standardized methods for assessing MV-related knowledge are lacking. Traditional multiple-choice question (MCQ) development is resource-intensive, and prior studies have suggested that generative AI tools could streamline question creation. However, the quality of AI-generated MCQs remains unclear.

RESEARCH QUESTION

Are MCQs generated by ChatGPT non-inferior to human-expert (HE) created questions in terms of quality and relevance for MV education?

STUDY DESIGN AND METHODS

Three key MV topics were selected: Equation of Motion & Ohm's Law, Tau & Auto-PEEP, and Oxygenation. Fifteen learning objectives were used to generate 15 AI-written MCQs via a standardized prompt with ChatGPT (model o1-preview-2024-09-12). A group of 31 faculty experts, all of whom regularly teach MV, evaluated both the AI-generated and the HE-generated MCQs. Each MCQ was assessed on its alignment with the learning objectives, accuracy of the chosen answer, clarity of the stem, plausibility of the distractors, and difficulty level. The faculty members were blinded to the provenance of the MCQs. The non-inferiority margin was predefined as 15% of the total possible score (-3.45).

RESULTS

AI-generated MCQs were statistically non-inferior to expert-written MCQs (one-sided 95% CI: [-1.15, ∞)). Additionally, respondents were unable to reliably differentiate AI-generated from HE-written MCQs (p = 0.32).

INTERPRETATION

AI-generated MCQs using ChatGPT o1 are comparable in quality to those written by human experts. Given the time- and resource-intensive nature of human MCQ development, AI-assisted question generation may serve as an efficient and scalable alternative for medical education assessment, even in highly specialized domains such as mechanical ventilation.

CLINICAL TRIAL REGISTRATION

None.
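The non-inferiority analysis above can be illustrated with a minimal sketch. The score differences below are hypothetical placeholders (the study's raw data are not reproduced here); the sketch computes a one-sided 95% lower confidence bound for the mean quality-score difference (AI minus human expert) under a normal approximation and checks it against the study's predefined margin of -3.45.

```python
import statistics as stats
from math import sqrt

# Hypothetical paired quality-score differences (AI minus human expert)
# for 15 MCQ pairs -- illustrative numbers only, not the study's data.
diffs = [0.5, -1.0, 0.2, 1.1, -0.4, 0.0, 0.8, -0.6, 0.3, 0.9,
         -0.2, 0.4, -0.8, 0.6, 0.1]

MARGIN = -3.45   # predefined non-inferiority margin (15% of total score)
Z_95 = 1.645     # one-sided 95% normal critical value

mean_diff = stats.mean(diffs)
se = stats.stdev(diffs) / sqrt(len(diffs))       # standard error of the mean
lower_bound = mean_diff - Z_95 * se              # one-sided 95% lower bound

# Non-inferiority is declared when the entire one-sided CI [lower_bound, inf)
# lies above the margin, i.e. the lower bound exceeds -3.45.
non_inferior = lower_bound > MARGIN
print(f"mean diff = {mean_diff:.3f}, "
      f"lower bound = {lower_bound:.3f}, non-inferior: {non_inferior}")
```

With the reported lower bound of -1.15 sitting above the -3.45 margin, the same decision rule yields the study's conclusion of non-inferiority.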


Similar Articles

1. Quality of Human Expert versus Large Language Model Generated Multiple Choice Questions in the Field of Mechanical Ventilation.
Chest. 2025 Jul 18. doi: 10.1016/j.chest.2025.07.005.
2. AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination.
J Med Imaging Radiat Sci. 2025 Mar 28;56(4):101896. doi: 10.1016/j.jmir.2025.101896.
3. Artificial intelligence in radiology examinations: a psychometric comparison of question generation methods.
Diagn Interv Radiol. 2025 Jul 21. doi: 10.4274/dir.2025.253407.
4. Large Language Model Clinical Vignettes and Multiple-Choice Questions for Postgraduate Medical Education.
Acad Med. 2025 Oct 1;100(10):1163-1166. doi: 10.1097/ACM.0000000000006137. Epub 2025 Jun 23.
5. Comparison of applicability, difficulty, and discrimination indices of multiple-choice questions on medical imaging generated by different AI-based chatbots.
Radiography (Lond). 2025 Jul 16;31(5):103087. doi: 10.1016/j.radi.2025.103087.
6. Comparative performance of ChatGPT, Gemini, and final-year emergency medicine clerkship students in answering multiple-choice questions: implications for the use of AI in medical education.
Int J Emerg Med. 2025 Aug 7;18(1):146. doi: 10.1186/s12245-025-00949-6.
7. ChatGPT versus human in generating medical graduate exam multiple choice questions-A multinational prospective study (Hong Kong S.A.R., Singapore, Ireland, and the United Kingdom).
PLoS One. 2023 Aug 29;18(8):e0290691. doi: 10.1371/journal.pone.0290691. eCollection 2023.
8. Leveraging ChatGPT for Enhancing Learning in Radiology Resident Education.
Acad Radiol. 2025 Sep;32(9):5635-5642. doi: 10.1016/j.acra.2025.06.019. Epub 2025 Jul 7.
9. Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study.
J Med Internet Res. 2025 May 20;27:e69910. doi: 10.2196/69910.
10. A Comprehensive and Modality Diverse Cervical Spine and Back Musculoskeletal Physical Exam Curriculum for Medical Students.
J Educ Teach Emerg Med. 2025 Jul 31;10(3):SG1-SG8. doi: 10.21980/J8RQ0N. eCollection 2025 Jul.