Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment.

Affiliations

Centre for Medical Education, The Chancellor's Building, College of Medicine and Veterinary Medicine, The University of Edinburgh, 49 Little France Crescent, Edinburgh, Scotland, EH16 4SB, UK.

Medical Unit, St John's Hospital, Livingston, Scotland, EH54 6PP, UK.

Publication details

BMC Med Educ. 2018 Apr 3;18(1):64. doi: 10.1186/s12909-018-1143-0.

DOI: 10.1186/s12909-018-1143-0
PMID: 29615016
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5883583/
Abstract

BACKGROUND

Fairness is a critical component of defensible assessment. Candidates should perform according to ability without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment environments. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers for any candidate group. By analysing the individual questions of an examination through techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed.
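The item-level idea above can be illustrated with the Mantel-Haenszel procedure, which is one widely used DIF statistic. The abstract does not say which DIF method the authors applied, so what follows is a minimal sketch of that one technique and not a reproduction of their analysis. The sketch assumes that candidates have already been grouped into matched score levels (strata). For each stratum it needs the counts of correct and incorrect answers in a reference group and in a focal group. The function names `mantel_haenszel_ddif` and `ets_category` are hypothetical. The A/B/C labels apply the conventional ETS size cut-offs on |D-DIF| (1.0 and 1.5). They leave out the significance tests that the full ETS rule also requires.

```python
import math


def mantel_haenszel_ddif(strata):
    """Mantel-Haenszel D-DIF for one item.

    strata: iterable of (ref_correct, ref_wrong, foc_correct, foc_wrong)
    counts, one tuple per matched score level.
    """
    num = 0.0  # sum of A*D/N over strata
    den = 0.0  # sum of B*C/N over strata
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these counts")
    alpha_mh = num / den  # common odds ratio, reference vs focal
    # ETS delta scale: negative values mean the item favours the reference group.
    return -2.35 * math.log(alpha_mh)


def ets_category(ddif):
    """Size-only A/B/C label (the significance tests of the full ETS rule are omitted)."""
    size = abs(ddif)
    if size < 1.0:
        return "A"  # negligible
    if size < 1.5:
        return "B"  # moderate
    return "C"  # large


# Hypothetical counts: at one score level the focal group answers correctly
# less often than the reference group does.
example = [(40, 10, 25, 25)]
ddif = mantel_haenszel_ddif(example)
print(round(ddif, 2), ets_category(ddif))  # prints -3.26 C
```

Matching on score level is the key design choice. It separates DIF, where candidates of equal ability answer an item differently, from a plain difference in group means, which DIF by itself does not treat as evidence of bias.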

METHODS

We used DIF to investigate fairness for 13,694 candidates sitting a major international summative postgraduate examination in internal medicine. We compared (a) ethnically white UK graduates against ethnically non-white UK graduates and (b) male UK graduates against female UK graduates. DIF was used to test 2773 questions across 14 sittings.
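Each pairwise comparison described above (white vs non-white UK graduates, male vs female UK graduates) needs certain counts for every question. Specifically, it needs the number of correct and incorrect answers in each group at each ability level. The sketch below shows that bookkeeping. It assumes a hypothetical record layout in which each candidate has a `group` label and a `responses` dict of scored 0/1 answers. It also uses the candidate's total score as the matching variable. The abstract gives neither the real data format nor the matching criterion, and `stratified_tables` is an illustrative name.

```python
from collections import defaultdict


def stratified_tables(candidates, item, reference, focal):
    """Build per-score-level 2x2 counts for one item.

    candidates: list of dicts with a 'group' label and a 'responses' dict
    mapping item id -> 1 (correct) or 0 (incorrect); field names are
    hypothetical. Returns {total_score: [ref_correct, ref_wrong,
    foc_correct, foc_wrong]}.
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for cand in candidates:
        group = cand["group"]
        if group not in (reference, focal):
            continue  # candidate is outside this pairwise comparison
        responses = cand["responses"]
        if item not in responses:
            continue  # item was not on this candidate's sitting
        total = sum(responses.values())  # matching variable: total score
        # Columns 0/1 hold the reference group, columns 2/3 the focal group.
        offset = 0 if group == reference else 2
        tables[total][offset + (0 if responses[item] == 1 else 1)] += 1
    return dict(tables)
```

Skipping candidates who never saw the item lets a single pool of records span several sittings. Each question is then compared only among the candidates who actually answered it.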

RESULTS

Across 2773 questions, eight (0.29%) showed notable DIF after correcting for multiple comparisons: seven with medium effects and one with a large effect. Blinded analysis of these questions by a panel of clinician assessors identified no plausible explanations for the differences. These questions were removed from the question bank and we present them here to share knowledge of questions with DIF. These questions did not significantly impact the overall performance of the cohort. Group-level differences in performance between the groups we studied in this examination cannot be explained by a subset of unfair questions.
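The abstract reports correcting for multiple comparisons but does not name the procedure. A correction is needed because testing 2773 questions against an uncorrected 5% threshold would be expected to flag about 139 questions by chance, even if no question had DIF. As an illustration only, the sketch below implements two common corrections, Bonferroni and Benjamini-Hochberg, over a list of per-question p-values. Neither is claimed to be the one the authors used, and the function names are hypothetical. The final line checks the reported proportion of flagged questions.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of tests that remain significant after Bonferroni correction."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank k whose sorted p-value satisfies p_(k) <= k * q / m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Share of the 2773 questions flagged in the study.
print(f"{8 / 2773:.2%}")  # prints 0.29%
```

Bonferroni controls the chance of any false flag across the whole bank and is therefore strict. Benjamini-Hochberg controls the expected share of false flags among the flagged questions, so it usually flags more.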

CONCLUSIONS

DIF helps explore fairness in assessment at the question level. This is especially important in high-stakes assessment where a small number of unfair questions may adversely impact the passing rates of some groups. However, very few questions exhibited notable DIF so differences in passing rates for the groups we studied cannot be explained by unfairness at the question level.
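The concern that a few unfair questions could shift pass rates can be checked directly. One way is to rescore candidates with the flagged questions removed and then compare pass rates by group. The sketch below makes simplifying assumptions. The pass mark is a fixed proportion of the retained questions, whereas a real high-stakes examination would normally set and equate its pass mark by other methods. The record layout and the function name `pass_rates` are hypothetical.

```python
def pass_rates(candidates, pass_mark, excluded_items=frozenset()):
    """Proportion of each group reaching pass_mark, scoring only retained items.

    candidates: list of dicts with a 'group' label and a 'responses' dict
    mapping item id -> 1/0 (hypothetical layout). pass_mark is a fraction
    of the retained items, e.g. 0.6 for 60%.
    """
    passed, totals = {}, {}
    for cand in candidates:
        kept = [v for k, v in cand["responses"].items() if k not in excluded_items]
        if not kept:
            continue  # nothing left to score for this candidate
        score = sum(kept) / len(kept)
        group = cand["group"]
        totals[group] = totals.get(group, 0) + 1
        if score >= pass_mark:
            passed[group] = passed.get(group, 0) + 1
    return {g: passed.get(g, 0) / n for g, n in totals.items()}
```

Compare `pass_rates(records, mark)` with `pass_rates(records, mark, flagged)`. If the gap between groups barely changes, the flagged questions do not explain it. This is the kind of result the abstract reports.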

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7fb/5883583/81469588b06d/12909_2018_1143_Fig1_HTML.jpg