
Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment.

Author information

Centre for Medical Education, The Chancellor's Building, College of Medicine and Veterinary Medicine, The University of Edinburgh, 49 Little France Crescent, Edinburgh, Scotland, EH16 4SB, UK.

Medical Unit, St John's Hospital, Livingston, Scotland, EH54 6PP, UK.

Publication information

BMC Med Educ. 2018 Apr 3;18(1):64. doi: 10.1186/s12909-018-1143-0.

Abstract

BACKGROUND

Fairness is a critical component of defensible assessment. Candidates should perform according to ability without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment environments. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers for any candidate group. By analysing the individual questions of an examination through techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed.

METHODS

We used DIF to investigate fairness for 13,694 candidates sitting a major international summative postgraduate examination in internal medicine. We compared (a) ethnically white UK graduates against ethnically non-white UK graduates and (b) male UK graduates against female UK graduates. DIF was used to test 2773 questions across 14 sittings.
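The abstract does not state which DIF statistic was used, but the Mantel-Haenszel procedure is the most common choice for dichotomous items in large-scale testing. A minimal sketch, with illustrative function and variable names (not taken from the paper): candidates are stratified by total score as the ability-matching variable, a 2x2 (group x correct/incorrect) table is formed per stratum, and the common odds ratio is converted to the ETS delta scale.

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(item_correct, group, total_score):
    """Mantel-Haenszel DIF for one dichotomous item.

    item_correct[i]: 1 if candidate i answered the item correctly, else 0
    group[i]:        'ref' or 'focal' (e.g. the two comparison groups)
    total_score[i]:  matching variable used to form ability strata

    Returns (common odds ratio, ETS delta); |delta| >= 1.5 is the
    conventional threshold for a large DIF effect.
    """
    # Per-stratum 2x2 counts: [ref correct, ref wrong, focal correct, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for correct, grp, score in zip(item_correct, group, total_score):
        idx = (0 if correct else 1) + (0 if grp == 'ref' else 2)
        strata[score][idx] += 1

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        if n:
            num += a * d / n   # ref-correct * focal-wrong, weighted by stratum size
            den += b * c / n   # ref-wrong  * focal-correct
    alpha_mh = num / den                 # common odds ratio across strata
    delta = -2.35 * math.log(alpha_mh)   # ETS delta scale
    return alpha_mh, delta
```

With identical correct-response odds in both groups the odds ratio is 1 and delta is 0, i.e. no DIF is flagged.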

RESULTS

Across 2773 questions, eight (0.29%) showed notable DIF after correcting for multiple comparisons: seven medium effects and one large effect. Blinded analysis of these questions by a panel of clinician assessors identified no plausible explanations for the differences. These questions were removed from the question bank, and we present them here to share examples of questions exhibiting DIF. These questions did not significantly impact the overall performance of the cohort. Group-level differences in performance between the groups we studied in this examination cannot be explained by a subset of unfair questions.
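The abstract reports "correcting for multiple comparisons" over 2773 simultaneous tests without naming the method. One common choice is the Holm step-down procedure, sketched below for illustration only (the paper may have used a different correction):

```python
def holm_reject(pvals, alpha=0.05):
    """Holm step-down correction: returns a boolean per p-value,
    True where the null hypothesis (no DIF) is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    reject = [False] * m
    for rank, i in enumerate(order):
        # Threshold shrinks from alpha/m for the smallest p upward
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

With thousands of items, the per-item threshold becomes very small (roughly 0.05/2773, about 1.8e-5, for the smallest p-value), which is why only strongly aberrant items survive the correction.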

CONCLUSIONS

DIF helps explore fairness in assessment at the question level. This is especially important in high-stakes assessment, where a small number of unfair questions may adversely impact the passing rates of some groups. However, very few questions exhibited notable DIF, so differences in passing rates for the groups we studied cannot be explained by unfairness at the question level.

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7fb/5883583/81469588b06d/12909_2018_1143_Fig1_HTML.jpg
