Differential Item Functioning Analysis of United States Medical Licensing Examination Step 1 Items.

Author information

Rubright Jonathan D, Jodoin Michael, Woodward Stephanie, Barone Michael A

Affiliations

J.D. Rubright is vice president, Office of Research Strategy, National Board of Medical Examiners, Philadelphia, Pennsylvania.

M. Jodoin is vice president, United States Medical Licensing Examination, National Board of Medical Examiners, Philadelphia, Pennsylvania.

Publication information

Acad Med. 2022 May 1;97(5):718-722. doi: 10.1097/ACM.0000000000004567. Epub 2022 Apr 27.

DOI: 10.1097/ACM.0000000000004567
PMID: 34907964
Abstract

PURPOSE

Previous studies have examined and identified demographic group score differences on United States Medical Licensing Examination (USMLE) Step examinations. It is necessary to explore potential etiologies of such differences to ensure fairness of examination use. Although score differences are largely explained by preceding academic variables, one potential concern is that item-level bias may be associated with remaining group score differences. The purpose of this 2019-2020 study was to statistically identify and qualitatively review USMLE Step 1 exam questions (items) using differential item functioning (DIF) methodology.

METHOD

Logistic regression DIF was used to identify and classify the effect size of DIF on Step 1 items meeting minimum sample size criteria. After using DIF to flag items statistically, subject matter expert (SME) review was used to identify potential reasons why items may have performed differently between racial and gender groups, including characteristics such as content, format, wording, context, or stimulus materials. USMLE SMEs reviewed items to identify the group difference they believed was present, if any; articulate a rationale behind the group difference; and determine whether that rationale would be considered construct relevant or construct irrelevant.

RESULTS

All identified DIF rationales were relevant to the constructs being assessed and therefore did not reflect item bias. Where SME-generated rationales aligned with the statistical differences (flags), the flagged items favored self-identified women, were tagged to women's health content categories, and were judged to be construct relevant.

CONCLUSIONS

This study did not find evidence to support the hypothesis that group-level performance differences beyond those explained by prior academic performance variables are driven by item-level bias. Health professions examination programs have an obligation to assess for group differences, and when present, investigate to what extent, if any, measurement bias plays a role.
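The METHOD section names logistic regression DIF but does not give the model specification or the effect-size cutoffs. The sketch below shows one common formulation, offered as an assumption rather than as the study's actual procedure: for each item, compare a model containing only the matching variable (total score) with one that adds group membership and a score × group interaction, test the 2-df likelihood-ratio statistic, and use the change in Nagelkerke R² as the effect size. The A/B/C cutoffs of 0.035 and 0.070 follow one widely used convention (Jodoin and Gierl, 2001); the abstract does not state which criteria the study applied. The function names, the 0/1 group coding, and `simulate_item` with the data it generates are all hypothetical, and the fitting uses plain Newton-Raphson from the standard library so the example runs without third-party packages.

```python
import math
import random
from dataclasses import dataclass


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _solve(a, b):
    # Solve a @ x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, y, iters=50, tol=1e-9):
    # Maximum-likelihood logistic regression via Newton-Raphson;
    # returns the maximized log-likelihood of the fitted model.
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = _sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for x, yi in zip(rows, y):
        p = min(max(_sigmoid(sum(b * xi for b, xi in zip(beta, x))), 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def nagelkerke(ll_model, ll_null, n):
    # Cox-Snell R^2 rescaled by its maximum attainable value.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    return cox_snell / (1.0 - math.exp(2.0 * ll_null / n))


@dataclass
class DifResult:
    chi2: float
    p_value: float
    delta_r2: float
    category: str


def classify(delta_r2, p_value, alpha=0.05):
    # Assumed Jodoin-Gierl cutoffs: A negligible, B moderate, C large.
    if p_value >= alpha or delta_r2 < 0.035:
        return "A"
    return "B" if delta_r2 <= 0.070 else "C"


def lr_dif(score, group, correct, alpha=0.05):
    # group: 0 = reference, 1 = focal (hypothetical coding).
    n = len(correct)
    ll0 = fit_logistic([[1.0] for _ in range(n)], correct)
    ll1 = fit_logistic([[1.0, s] for s in score], correct)
    ll3 = fit_logistic([[1.0, s, g, s * g] for s, g in zip(score, group)], correct)
    chi2 = max(0.0, 2.0 * (ll3 - ll1))
    p_value = math.exp(-chi2 / 2.0)  # chi-square survival function, 2 df
    delta = nagelkerke(ll3, ll0, n) - nagelkerke(ll1, ll0, n)
    return DifResult(chi2, p_value, delta, classify(delta, p_value, alpha))


def simulate_item(n, dif, seed=0):
    # Hypothetical data: standardized score, alternating groups,
    # and a uniform logit shift of size `dif` for the focal group.
    rng = random.Random(seed)
    score, group, correct = [], [], []
    for i in range(n):
        s, g = rng.gauss(0.0, 1.0), i % 2
        p = _sigmoid(0.5 + 1.2 * s + dif * g)
        score.append(s)
        group.append(g)
        correct.append(1 if rng.random() < p else 0)
    return score, group, correct
```

For example, `lr_dif(*simulate_item(2000, dif=1.0, seed=3))` returns a `DifResult` for one simulated item. In practice a statistics package would replace the hand-written fitter; it is written out here only to keep the sketch dependency-free. A statistical flag of this kind is only the first step: as the abstract describes, flagged items then go to subject matter experts who judge whether the difference is construct relevant.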

Similar articles

1. Differential Item Functioning Analysis of United States Medical Licensing Examination Step 1 Items.
   Acad Med. 2022 May 1;97(5):718-722. doi: 10.1097/ACM.0000000000004567. Epub 2022 Apr 27.
2. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3. Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment.
   BMC Med Educ. 2018 Apr 3;18(1):64. doi: 10.1186/s12909-018-1143-0.
4. Examining Demographics, Prior Academic Performance, and United States Medical Licensing Examination Scores.
   Acad Med. 2019 Mar;94(3):364-370. doi: 10.1097/ACM.0000000000002366.
5. Test bias in a cognitive test: differential item functioning in the CASI.
   Stat Med. 2004 Jan 30;23(2):241-56. doi: 10.1002/sim.1713.
6. The National Council Licensure Examinations/differential item functioning process.
   J Nurs Educ. 2000 Apr;39(4):185-7. doi: 10.3928/0148-4834-20000401-10.
7. Exploring differential item functioning (DIF) with the Rasch model: a comparison of gender differences on eighth grade science items in the United States and Spain.
   J Appl Meas. 2011;12(2):144-64.
8. Measurement Equivalence of the Patient Reported Outcomes Measurement Information System (PROMIS) Anxiety Short Forms in Ethnically Diverse Groups.
   Psychol Test Assess Model. 2016;58(1):183-219.
9. Examination of the Measurement Equivalence of the Functional Assessment in Acute Care MCAT (FAMCAT) Mobility Item Bank Using Differential Item Functioning Analyses.
   Arch Phys Med Rehabil. 2022 May;103(5S):S84-S107.e38. doi: 10.1016/j.apmr.2021.03.044. Epub 2021 Jun 16.
10. Modern psychometric methods for detection of differential item functioning: application to cognitive assessment measures.
   Stat Med. 2000;19(11-12):1651-83. doi: 10.1002/(sici)1097-0258(20000615/30)19:11/12<1651::aid-sim453>3.0.co;2-h.

Cited by

1. Are medical school preclinical tests biased for sex and race? A differential item functioning analysis.
   BMC Med Educ. 2025 Jan 29;25(1):146. doi: 10.1186/s12909-024-06540-6.
2. HIGH-STAKES KNOWLEDGE ASSESSMENT AT ABFM: WHAT WE HAVE LEARNED AND HOW IT IS USEFUL.
   Ann Fam Med. 2022 Mar-Apr;20(2):186-188. doi: 10.1370/afm.2811.