


Large Language Model Clinical Vignettes and Multiple-Choice Questions for Postgraduate Medical Education.

Author Information

Jackson Frank I, Keller Nathan A, Kouba Insaf, Kouba Wassil, Bracero Luis A, Blitz Matthew J

Publication Information

Acad Med. 2025 Oct 1;100(10):1163-1166. doi: 10.1097/ACM.0000000000006137. Epub 2025 Jun 23.

DOI: 10.1097/ACM.0000000000006137
PMID: 40550116
Abstract

PROBLEM

Clinical vignette-based multiple-choice questions (MCQs) have been used to assess postgraduate medical trainees but require substantial time and effort to develop. Large language models, a type of artificial intelligence (AI), can potentially expedite this task. This report describes prompt engineering techniques used with ChatGPT-4 to generate clinical vignettes and MCQs for obstetrics-gynecology residents and evaluates whether residents and attending physicians can differentiate between human- and AI-generated content.

APPROACH

The authors generated MCQs using a structured prompt engineering approach, incorporating authoritative source documents and an iterative prompt chaining technique, to refine output quality. Fifty human-generated and 50 AI-generated MCQs were randomly arranged into 10 quizzes (10 questions each). The AI-generated MCQs were developed in August 2024 and surveys conducted in September 2024. Obstetrics-gynecology residents and attending physician faculty members at Northwell Health or Donald and Barbara Zucker School of Medicine at Hofstra/Northwell completed an online survey, answering each MCQ and indicating whether they believed it was human or AI written or if they were uncertain.

OUTCOMES

Thirty-three participants (16 residents, 17 attendings) completed the survey (80.5% response rate). Respondents correctly identified MCQ authorship a median (interquartile range [IQR]) of 39.1% (30.0%-50.0%) of the time, indicating difficulty in distinguishing human- and AI-generated questions. The median (IQR) correct answer selection rate was 62.3% (50.0%-75.0%) for human-generated MCQs and 64.4% (50.0%-83.3%) for AI-generated MCQs ( P = .74). The difficulty (0.69 vs 0.66, P = .83) and discriminatory (0.42 vs 0.38, P = .90) indexes showed no significant differences, supporting the feasibility of large language model-generated MCQs in medical education.
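The difficulty and discrimination indexes compared above come from classical test theory: difficulty is the proportion of examinees answering an item correctly, and a common discrimination index is the difference in that proportion between high- and low-scoring groups (often the top and bottom 27%). A minimal sketch, assuming 0/1 item scores and this upper-lower-group definition (the paper does not state which variant it used):

```python
# Classical test theory item statistics, a minimal sketch (not the
# authors' code). difficulty = proportion answering the item correctly;
# discrimination = difficulty in the top 27% of examinees (by total
# score) minus difficulty in the bottom 27%.

def item_statistics(responses):
    """responses: list of per-examinee lists of 0/1 item scores.
    Returns a (difficulty, discrimination) tuple per item."""
    n = len(responses)
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]
    # Rank examinees by total score; take top/bottom 27% groups.
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, round(0.27 * n))
    upper, lower = order[:k], order[-k:]
    stats = []
    for j in range(n_items):
        difficulty = sum(r[j] for r in responses) / n
        p_upper = sum(responses[i][j] for i in upper) / k
        p_lower = sum(responses[i][j] for i in lower) / k
        stats.append((difficulty, p_upper - p_lower))
    return stats
```

On this scale, the reported values (difficulty 0.69 vs 0.66, discrimination 0.42 vs 0.38) indicate items of moderate difficulty with good discrimination for both authorship groups.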

NEXT STEPS

Future studies should explore the optimal balance between AI-generated content and expert review, identifying strategies to maximize efficiency without compromising accuracy. The authors will develop practice exams and assess their predictive validity by comparing scores with standardized exam results.


Similar Literature

1. Large Language Model Clinical Vignettes and Multiple-Choice Questions for Postgraduate Medical Education.
   Acad Med. 2025 Oct 1;100(10):1163-1166. doi: 10.1097/ACM.0000000000006137. Epub 2025 Jun 23.
2. AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination.
   J Med Imaging Radiat Sci. 2025 Mar 28;56(4):101896. doi: 10.1016/j.jmir.2025.101896.
3. Leveraging ChatGPT for Enhancing Learning in Radiology Resident Education.
   Acad Radiol. 2025 Sep;32(9):5635-5642. doi: 10.1016/j.acra.2025.06.019. Epub 2025 Jul 7.
4. Quality of Human Expert versus Large Language Model Generated Multiple Choice Questions in the Field of Mechanical Ventilation.
   Chest. 2025 Jul 18. doi: 10.1016/j.chest.2025.07.005.
5. Comparison of applicability, difficulty, and discrimination indices of multiple-choice questions on medical imaging generated by different AI-based chatbots.
   Radiography (Lond). 2025 Jul 16;31(5):103087. doi: 10.1016/j.radi.2025.103087.
6. Performance of ChatGPT-4 Omni and Gemini 1.5 Pro on Ophthalmology-Related Questions in the Turkish Medical Specialty Exam.
   Turk J Ophthalmol. 2025 Aug 21;55(4):177-185. doi: 10.4274/tjo.galenos.2025.27895.
7. Vesicoureteral Reflux
8. Prescription of Controlled Substances: Benefits and Risks
9. Examining the Role of Artificial Intelligence in Assessment: A Comparative Study of ChatGPT and Educator-Generated Multiple-Choice Questions in a Dental Exam.
   Eur J Dent Educ. 2025 Aug 10. doi: 10.1111/eje.70034.
10. Artificial intelligence in radiology examinations: a psychometric comparison of question generation methods.
    Diagn Interv Radiol. 2025 Jul 21. doi: 10.4274/dir.2025.253407.