Evaluating the quality of medical multiple-choice items created with automated processes.

Affiliation

Centre for Research in Applied Measurement and Evaluation, Faculty of Education, University of Alberta, Edmonton, Alberta, Canada.

Publication information

Med Educ. 2013 Jul;47(7):726-33. doi: 10.1111/medu.12202.

DOI:10.1111/medu.12202
PMID:23746162
Abstract

OBJECTIVES

Computerised assessment raises formidable challenges because it requires large numbers of test items. Automatic item generation (AIG) can help address this test development problem because it yields large numbers of new items both quickly and efficiently. To date, however, the quality of the items produced using a generative approach has not been evaluated. The purpose of this study was to determine whether automatic processes yield items that meet standards of quality that are appropriate for medical testing. Quality was evaluated firstly by subjecting items created using both AIG and traditional processes to rating by a four-member expert medical panel using indicators of multiple-choice item quality, and secondly by asking the panellists to identify which items were developed using AIG in a blind review.

METHODS

Fifteen items from the domain of therapeutics were created in three different experimental test development conditions. The first 15 items were created by content specialists using traditional test development methods (Group 1 Traditional). The second 15 items were created by the same content specialists using AIG methods (Group 1 AIG). The third 15 items were created by a new group of content specialists using traditional methods (Group 2 Traditional). These 45 items were then evaluated for quality by a four-member panel of medical experts and were subsequently categorised as either Traditional or AIG items.

RESULTS

Three outcomes were reported: (i) the items produced using traditional and AIG processes were comparable on seven of eight indicators of multiple-choice item quality; (ii) AIG items can be differentiated from Traditional items by the quality of their distractors, and (iii) the overall predictive accuracy of the four expert medical panellists was 42%.

CONCLUSIONS

Items generated by AIG methods are, for the most part, equivalent to traditionally developed items from the perspective of expert medical reviewers. While the AIG method produced comparatively fewer plausible distractors than the traditional method, medical experts cannot consistently distinguish AIG items from traditionally developed items in a blind review.
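
The overall predictive accuracy reported in the results can be read as the share of blind-review judgments in which a panellist's label (Traditional or AIG) matched the item's true origin. The abstract does not spell out the exact computation, so the following is only a minimal sketch under that assumption. The function name `predictive_accuracy`, the panellist keys, and the toy data are all hypothetical and are not taken from the study.

```python
from typing import Dict, List


def predictive_accuracy(true_labels: List[str],
                        judgments: Dict[str, List[str]]) -> float:
    """Return the share of all panellist judgments that match the true item origin.

    Assumption: accuracy is pooled over every (panellist, item) pair.
    """
    correct = 0
    total = 0
    for guesses in judgments.values():
        if len(guesses) != len(true_labels):
            raise ValueError("each panellist must judge every item")
        correct += sum(g == t for g, t in zip(guesses, true_labels))
        total += len(guesses)
    return correct / total


# Hypothetical toy data (NOT from the study): 5 items judged by 2 panellists.
true_labels = ["AIG", "Traditional", "Traditional", "AIG", "Traditional"]
judgments = {
    "panellist_a": ["Traditional", "Traditional", "AIG", "AIG", "Traditional"],
    "panellist_b": ["AIG", "AIG", "Traditional", "Traditional", "AIG"],
}

accuracy = predictive_accuracy(true_labels, judgments)
print(f"{accuracy:.0%}")  # prints "50%" for this toy data
```

In the study's design, the same pooling would run over 45 items and four panellists, that is, 180 judgments in total.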

Similar articles

1
Evaluating the quality of medical multiple-choice items created with automated processes.
Med Educ. 2013 Jul;47(7):726-33. doi: 10.1111/medu.12202.
2
Using automatic item generation to create multiple-choice test items.
Med Educ. 2012 Aug;46(8):757-65. doi: 10.1111/j.1365-2923.2012.04289.x.
3
Using Automatic Item Generation to Improve the Quality of MCQ Distractors.
Teach Learn Med. 2016;28(2):166-73. doi: 10.1080/10401334.2016.1146608.
4
Three Modeling Applications to Promote Automatic Item Generation for Examinations in Dentistry.
J Dent Educ. 2016 Mar;80(3):339-47.
5
Using cognitive models to develop quality multiple-choice questions.
Med Teach. 2016 Aug;38(8):838-43. doi: 10.3109/0142159X.2016.1150989. Epub 2016 Mar 21.
6
A suggestive approach for assessing item quality, usability and validity of Automatic Item Generation.
Adv Health Sci Educ Theory Pract. 2023 Dec;28(5):1441-1465. doi: 10.1007/s10459-023-10225-y. Epub 2023 Apr 25.
7
Automated Test-Item Generation System for Retrieval Practice in Radiology Education.
Acad Radiol. 2019 Jun;26(6):851-859. doi: 10.1016/j.acra.2018.09.017. Epub 2018 Oct 10.
8
Using Automatic Item Generation to Create Multiple-Choice Questions for Pharmacy Assessment.
Am J Pharm Educ. 2023 Oct;87(10):100081. doi: 10.1016/j.ajpe.2023.100081. Epub 2023 May 10.
9
Feasibility assurance: a review of automatic item generation in medical assessment.
Adv Health Sci Educ Theory Pract. 2022 May;27(2):405-425. doi: 10.1007/s10459-022-10092-z. Epub 2022 Mar 1.
10
Improved student learning in ophthalmology with computer-aided instruction.
Eye (Lond). 2001 Oct;15(Pt 5):635-9. doi: 10.1038/eye.2001.199.

Cited by

1
Ten tips to harnessing generative AI for high-quality MCQS in medical education assessment.
Med Educ Online. 2025 Dec;30(1):2532682. doi: 10.1080/10872981.2025.2532682. Epub 2025 Jul 17.
2
Using a Hybrid of AI and Template-Based Method in Automatic Item Generation to Create Multiple-Choice Questions in Medical Education: Hybrid AIG.
JMIR Form Res. 2025 Apr 4;9:e65726. doi: 10.2196/65726.
3
Automated Item Generation: impact of item variants on performance and standard setting.
BMC Med Educ. 2023 Sep 11;23(1):659. doi: 10.1186/s12909-023-04457-0.
4
A suggestive approach for assessing item quality, usability and validity of Automatic Item Generation.
Adv Health Sci Educ Theory Pract. 2023 Dec;28(5):1441-1465. doi: 10.1007/s10459-023-10225-y. Epub 2023 Apr 25.
5
Essential steps in the development, implementation, evaluation and quality assurance of the written part of the Swiss federal licensing examination for human medicine.
GMS J Med Educ. 2022 Sep 15;39(4):Doc43. doi: 10.3205/zma001564. eCollection 2022.
6
Feasibility assurance: a review of automatic item generation in medical assessment.
Adv Health Sci Educ Theory Pract. 2022 May;27(2):405-425. doi: 10.1007/s10459-022-10092-z. Epub 2022 Mar 1.