

Differential Item Functioning Analysis of United States Medical Licensing Examination Step 1 Items.

Author Information

Rubright Jonathan D, Jodoin Michael, Woodward Stephanie, Barone Michael A

Affiliations

J.D. Rubright is vice president, Office of Research Strategy, National Board of Medical Examiners, Philadelphia, Pennsylvania.

M. Jodoin is vice president, United States Medical Licensing Examination, National Board of Medical Examiners, Philadelphia, Pennsylvania.

Publication Information

Acad Med. 2022 May 1;97(5):718-722. doi: 10.1097/ACM.0000000000004567. Epub 2022 Apr 27.

Abstract

PURPOSE

Previous studies have examined and identified demographic group score differences on United States Medical Licensing Examination (USMLE) Step examinations. It is necessary to explore potential etiologies of such differences to ensure fairness of examination use. Although score differences are largely explained by preceding academic variables, one potential concern is that item-level bias may be associated with remaining group score differences. The purpose of this 2019-2020 study was to statistically identify and qualitatively review USMLE Step 1 exam questions (items) using differential item functioning (DIF) methodology.

METHOD

Logistic regression DIF was used to identify and classify the effect size of DIF on Step 1 items meeting minimum sample size criteria. After using DIF to flag items statistically, subject matter expert (SME) review was used to identify potential reasons why items may have performed differently between racial and gender groups, including characteristics such as content, format, wording, context, or stimulus materials. USMLE SMEs reviewed items to identify the group difference they believed was present, if any; articulate a rationale behind the group difference; and determine whether that rationale would be considered construct relevant or construct irrelevant.
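For readers less familiar with the technique, the sketch below illustrates the general form of a logistic regression DIF screen of the kind described above. It is a minimal illustration under stated assumptions, not the study's actual implementation: the column names (correct, total_score, group), the use of a total score as the matching criterion, and the choice of statsmodels for model fitting are all hypothetical choices made here for clarity.

```python
# Minimal sketch of logistic regression DIF for a single item.
# Assumptions (not from the article): examinees are matched on total
# score, group membership is coded 0/1, and statsmodels is available.
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

def logistic_dif(df: pd.DataFrame) -> dict:
    """df needs three columns:
    correct     - 0/1 response to the studied item
    total_score - matching criterion (proxy for ability)
    group       - 0/1 demographic group membership
    """
    # Baseline model: matching criterion (ability) only.
    base = smf.logit("correct ~ total_score", data=df).fit(disp=0)

    # Augmented model: adds a group main effect (uniform DIF) and an
    # ability-by-group interaction (non-uniform DIF).
    full = smf.logit(
        "correct ~ total_score + group + total_score:group", data=df
    ).fit(disp=0)

    # Likelihood-ratio test of the two added terms (2 df).
    lr_stat = 2 * (full.llf - base.llf)
    p_value = chi2.sf(lr_stat, df=2)

    # Effect size: change in pseudo R-squared between the models.
    # (statsmodels reports McFadden's R-squared; published DIF cutoffs
    # such as Jodoin and Gierl's 0.035/0.07 were defined for
    # Nagelkerke's R-squared, so treat the comparison as approximate.)
    delta_r2 = full.prsquared - base.prsquared
    return {"lr_chi2": lr_stat, "p": p_value, "delta_r2": delta_r2}
```

In a screen of this kind, items with a significant likelihood-ratio test and an effect size above a chosen threshold would be the ones flagged for the subject matter expert review the abstract describes.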

RESULTS

All identified DIF rationales were relevant to the constructs being assessed and therefore did not reflect item bias. Where SME-generated rationales aligned with statistical differences (flags), they favored self-identified women on items tagged to women's health content categories and were judged to be construct relevant.

CONCLUSIONS

This study did not find evidence to support the hypothesis that group-level performance differences beyond those explained by prior academic performance variables are driven by item-level bias. Health professions examination programs have an obligation to assess for group differences, and when present, investigate to what extent, if any, measurement bias plays a role.

