Dale Esther Dasari, Abulela Mohammed A A, Jia Hao, Violato Claudio
University of Minnesota Medical School, 420 Delaware Street SE, Mayo Building, Minneapolis, MN, 55455, USA.
Department of Educational Psychology, University of Minnesota, Minneapolis, MN, 55455, USA.
BMC Med Educ. 2025 Jan 29;25(1):146. doi: 10.1186/s12909-024-06540-6.
A common practice in assessment development, fundamental for fairness and consequently the validity of test score interpretations and uses, is to ascertain whether test items function equally across test-taker groups. Accordingly, we conducted differential item functioning (DIF) analysis, a psychometric procedure for detecting potential item bias, for three preclinical medical school foundational courses based on students' sex and race.
The sample comprised 520, 519, and 344 medical students for anatomy, histology, and physiology, respectively, with data collected from 2018 to 2020. To conduct DIF analysis, we used the Wald test based on the two-parameter logistic model, as implemented in the IRTPRO software.
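The study itself ran an IRT-based Wald test in the commercial IRTPRO package, whose internals are not reproduced here. As a rough illustration of the underlying idea, the sketch below implements a simpler, widely used alternative: logistic-regression screening for uniform DIF, where an item is flagged when the group coefficient (sex or race indicator) remains significant after conditioning on an ability proxy such as the total test score. The function name and interface are hypothetical, not from the paper.

```python
# Hypothetical sketch: logistic-regression uniform-DIF screening.
# NOT the paper's method (the paper used an IRT 2PL Wald test in IRTPRO);
# this is a common, simpler stand-in shown for illustration only.
import numpy as np

def logistic_dif_wald(responses, ability, group):
    """Wald z statistic for a group effect on one item.

    responses : 0/1 array of item scores
    ability   : ability proxy (e.g., total or rest score on the test)
    group     : 0/1 array (0 = reference group, 1 = focal group)

    Fits logit P(correct) = b0 + b1*ability + b2*group by Newton-Raphson
    and returns z = b2 / SE(b2); |z| > 1.96 flags uniform DIF at alpha = .05.
    """
    X = np.column_stack([np.ones(len(ability)),
                         np.asarray(ability, dtype=float),
                         np.asarray(group, dtype=float)])
    y = np.asarray(responses, dtype=float)
    beta = np.zeros(3)
    for _ in range(50):                        # Newton-Raphson iterations
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                      # Bernoulli variances
        H = X.T @ (X * W[:, None])             # Fisher information matrix
        step = np.linalg.solve(H, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    se = np.sqrt(np.linalg.inv(H)[2, 2])       # SE of the group coefficient
    return beta[2] / se
```

In practice the ability proxy would be the rest score (total score excluding the studied item) to avoid contaminating the matching variable, and a correction for multiple comparisons would be applied across the test's items.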
In the three assessments, as many as one-fifth of the items functioned differentially across sex, race, or both: 10 of 49 items (20%), 6 of 40 items (15%), and 5 of 45 items (11%) showed statistically significant DIF for the Anatomy, Histology, and Physiology courses, respectively. Measurement specialists and subject matter experts independently reviewed the flagged items to identify construct-irrelevant factors as potential sources of DIF, as demonstrated in Appendix A. Most identified items were poorly written or contained unclear images.
The validity of score-based inferences, particularly for group comparisons, requires test items to function equally across test-taker groups. In the present study, we found DIF for sex and race in some items across three content areas. The present approach should be replicated in other medical schools to assess the generalizability of these findings. Item-level DIF analysis should also be conducted routinely as part of the psychometric evaluation of basic science courses and other assessments.
Trial registration: Not applicable.