
Ensuring the quality of multiple-choice exams administered to small cohorts: A cautionary tale.

Author information

Young Meredith, Cummings Beth-Ann, St-Onge Christina

Affiliations

Department of Medicine, McGill University, Montreal, Quebec, Canada.

Centre for Medical Education, McGill University, Montreal, Quebec, Canada.

Publication information

Perspect Med Educ. 2017 Feb;6(1):21-28. doi: 10.1007/s40037-016-0322-0.

Abstract

INTRODUCTION

Multiple-choice questions (MCQs) are a cornerstone of assessment in medical education. Monitoring item properties (difficulty and discrimination) is an important means of investigating examination quality. However, most item property guidelines were developed for use with large cohorts of examinees; little empirical work has investigated the suitability of applying guidelines to item difficulty and discrimination coefficients estimated for small cohorts, such as those in medical education. We investigated the extent to which item properties vary across multiple clerkship cohorts to better understand the appropriateness of using such guidelines with small cohorts.

METHODS

Exam results for 32 items from an MCQ exam were used. Item discrimination and difficulty coefficients were calculated for 22 cohorts (n = 10-15 students). Discrimination coefficients were categorized according to Ebel and Frisbie (1991). Difficulty coefficients were categorized according to three guidelines by Laveault and Grégoire (2014). Descriptive analyses examined variance in item properties across cohorts.

RESULTS

A large amount of variance in item properties was found across cohorts. Discrimination coefficients for items varied greatly across cohorts, with 29/32 (91%) of items occurring in both Ebel and Frisbie's 'poor' and 'excellent' categories and 19/32 (59%) of items occurring in all five categories. For item difficulty coefficients, the application of different guidelines resulted in large variations in examination length (number of items removed ranged from 0 to 22).

DISCUSSION

While the psychometric properties of items can provide information on item and exam quality, they vary greatly in small cohorts. The application of guidelines with small exam cohorts should be approached with caution.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9687/5285282/3ac532bc37df/40037_2016_322_Fig1_HTML.jpg
