
Ensuring the quality of multiple-choice exams administered to small cohorts: A cautionary tale.

Author information

Young Meredith, Cummings Beth-Ann, St-Onge Christina

Affiliations

Department of Medicine, McGill University, Montreal, Quebec, Canada.

Centre for Medical Education, McGill University, Montreal, Quebec, Canada.

Publication information

Perspect Med Educ. 2017 Feb;6(1):21-28. doi: 10.1007/s40037-016-0322-0.

Abstract

INTRODUCTION

Multiple-choice questions (MCQs) are a cornerstone of assessment in medical education. Monitoring item properties (difficulty and discrimination) is an important means of investigating examination quality. However, most item property guidelines were developed for use with large cohorts of examinees; little empirical work has investigated the suitability of applying guidelines to item difficulty and discrimination coefficients estimated for small cohorts, such as those in medical education. We investigated the extent to which item properties vary across multiple clerkship cohorts to better understand the appropriateness of using such guidelines with small cohorts.

METHODS

Exam results for 32 items from an MCQ exam were used. Item discrimination and difficulty coefficients were calculated for 22 cohorts (n = 10-15 students). Discrimination coefficients were categorized according to Ebel and Frisbie (1991). Difficulty coefficients were categorized according to three guidelines by Laveault and Grégoire (2014). Descriptive analyses examined variance in item properties across cohorts.

RESULTS

A large amount of variance in item properties was found across cohorts. Discrimination coefficients for items varied greatly across cohorts, with 29/32 (91%) of items occurring in both Ebel and Frisbie's 'poor' and 'excellent' categories and 19/32 (59%) of items occurring in all five categories. For item difficulty coefficients, the application of different guidelines resulted in large variations in examination length (number of items removed ranged from 0 to 22).

DISCUSSION

While the psychometric properties of items can provide information on item and exam quality, they vary greatly in small cohorts. The application of guidelines with small exam cohorts should be approached with caution.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9687/5285282/3ac532bc37df/40037_2016_322_Fig1_HTML.jpg
