
Implementing statistical equating for MRCP(UK) Parts 1 and 2.

Authors

McManus I C, Chis Liliana, Fox Ray, Waller Derek, Tang Peter

Affiliation

UCL Medical School, University College London, Gower Street, London WC1E 6BT, UK.

Publication

BMC Med Educ. 2014 Sep 26;14:204. doi: 10.1186/1472-6920-14-204.

DOI: 10.1186/1472-6920-14-204
PMID: 25257070
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4182791/
Abstract

BACKGROUND

The MRCP(UK) exam, in 2008 and 2010, changed the standard-setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory, the reference group being UK graduates. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in examination predictive validity after the change.
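As an illustration of what IRT-based statistical equating involves, the following is a minimal sketch and not the procedure used by the authors. It assumes a one-parameter (Rasch) model and mean-mean linking through anchor items shared between a reference diet and a new diet; the abstract does not specify either choice, and all item difficulties and the pass mark below are made-up values.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability that a candidate of ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def linking_constant(anchor_ref, anchor_new):
    """Mean-mean linking: the shift that places difficulties calibrated
    on the new diet onto the reference scale, using shared anchor items."""
    return sum(anchor_ref) / len(anchor_ref) - sum(anchor_new) / len(anchor_new)

def expected_score(theta, difficulties):
    """Test characteristic curve: expected number of items correct."""
    return sum(p_correct(theta, b) for b in difficulties)

# Hypothetical values, for illustration only.
anchor_ref = [-0.5, 0.0, 0.7]             # anchors on the reference scale
anchor_new = [-0.8, -0.3, 0.4]            # same anchors, new-diet calibration
new_form = [-1.1, -0.8, -0.3, 0.4, 0.9]   # all new-diet items, new-diet scale

shift = linking_constant(anchor_ref, anchor_new)   # about 0.3 here
new_on_ref = [b + shift for b in new_form]

theta_pass = 0.0  # assumed pass mark on the reference ability scale
raw_pass = expected_score(theta_pass, new_on_ref)
print(f"shift={shift:.3f}, equated raw pass mark={raw_pass:.2f} of {len(new_form)}")
```

The point of the sketch is that the pass standard is fixed on the ability scale rather than as a raw mark, so a harder or easier paper changes the raw pass mark but not the ability required to pass.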

METHODS

Analysis of data of MRCP(UK) Part 1 exam from 2003 to 2013 and Part 2 exam from 2005 to 2013.

RESULTS

Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation, probably due to stronger candidates taking the examination earlier. Pass rates seemed to have increased in non-UK graduates after equating was introduced, but this was not associated with any changes in DIF after statistical equating. Statistical modelling of the pass rates for non-UK graduates found that pass rates, in both Part 1 and Part 2, were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating.
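To make the DIF check concrete, here is a hedged sketch of one common way to screen a single item for Differential Item Functioning: the Mantel-Haenszel common odds ratio, computed across strata of candidates matched on total score. The abstract does not say which DIF statistic the authors used, so the method, the function names, and the counts are all assumptions for illustration.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across ability strata.

    Each stratum is (ref_correct, ref_wrong, focal_correct, focal_wrong),
    e.g. reference = UK graduates, focal = non-UK graduates.
    A value near 1 suggests no DIF for this item.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return num / den

def mh_delta(odds_ratio):
    """ETS delta scale: -2.35 * ln(odds ratio); 0 means no DIF."""
    return -2.35 * math.log(odds_ratio)

# Hypothetical counts: within each score band both groups have the same
# odds of answering correctly, so the item shows no DIF.
no_dif = [(30, 20, 15, 10), (40, 10, 24, 6), (45, 5, 18, 2)]
# Hypothetical counts where the reference group is favoured.
with_dif = [(30, 20, 10, 15)]

print(mantel_haenszel_odds_ratio(no_dif), mh_delta(mantel_haenszel_odds_ratio(no_dif)))
print(mantel_haenszel_odds_ratio(with_dif), mh_delta(mantel_haenszel_odds_ratio(with_dif)))
```

Matching on total score is what separates DIF from a plain difference in pass rates: two groups can differ in overall ability without any individual item behaving differently for candidates of equal ability.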

CONCLUSIONS

Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well-founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step-change immediately after equating began. Statistical equating provides a robust standard-setting method, with a better theoretical foundation than judgemental techniques such as Angoff, and is more straightforward and requires far less examiner time to provide a more valid result. The present study provides a detailed case study of introducing statistical equating, and issues which may need to be considered with its introduction.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/f71055ac2ca4/12909_2014_1029_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/9f0f03cc1d3b/12909_2014_1029_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/4c1b2fde66d6/12909_2014_1029_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/3fd11a0c4835/12909_2014_1029_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/a5f8a4b01da9/12909_2014_1029_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/551eb7477536/12909_2014_1029_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/e788ff4cde3f/12909_2014_1029_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/914b38b1c563/12909_2014_1029_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e3b/4182791/aaebad6287b7/12909_2014_1029_Fig9_HTML.jpg

Similar articles

1
Implementing statistical equating for MRCP(UK) Parts 1 and 2.
BMC Med Educ. 2014 Sep 26;14:204. doi: 10.1186/1472-6920-14-204.
2
PLAB and UK graduates' performance on MRCP(UK) and MRCGP examinations: data linkage study.
BMJ. 2014 Apr 17;348:g2621. doi: 10.1136/bmj.g2621.
3
Investigating possible ethnicity and sex bias in clinical examiners: an analysis of data from the MRCP(UK) PACES and nPACES examinations.
BMC Med Educ. 2013 Jul 30;13:103. doi: 10.1186/1472-6920-13-103.
4
Performance at MRCP(UK): when should trainees sit examinations?
Clin Med (Lond). 2013 Apr;13(2):166-9. doi: 10.7861/clinmedicine.13-2-166.
5
Changes in standard of candidates taking the MRCP(UK) Part 1 examination, 1985 to 2002: analysis of marker questions.
BMC Med. 2005 Jul 18;3:13. doi: 10.1186/1741-7015-3-13.
6
Resitting a high-stakes postgraduate medical examination on multiple occasions: nonlinear multilevel modelling of performance in the MRCP(UK) examinations.
BMC Med. 2012 Jun 14;10:60. doi: 10.1186/1741-7015-10-60.
7
Analysis of predictors of success in the MRCP (UK) PACES examination in candidates attending a revision course.
Postgrad Med J. 2006 Feb;82(964):145-9. doi: 10.1136/pmj.2005.035998.
8
The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations.
BMC Med Educ. 2010 Jun 2;10:40. doi: 10.1186/1472-6920-10-40.
9
Performance in the MRCP(UK) Examination 2003-4: analysis of pass rates of UK graduates in relation to self-declared ethnicity and gender.
BMC Med. 2007 May 3;5:8. doi: 10.1186/1741-7015-5-8.
10
Annual Review of Competence Progression (ARCP) performance of doctors who passed Professional and Linguistic Assessments Board (PLAB) tests compared with UK medical graduates: national data linkage study.
BMJ. 2014 Apr 17;348:g2622. doi: 10.1136/bmj.g2622.

Cited by

1
Exploring the use of Rasch modelling in "common content" items for multi-site and multi-year assessment.
Adv Health Sci Educ Theory Pract. 2025 Apr;30(2):427-438. doi: 10.1007/s10459-024-10354-y. Epub 2024 Jul 8.
2
Predictive validity of A-level grades and teacher-predicted grades in UK medical school applicants: a retrospective analysis of administrative data in a time of COVID-19.
BMJ Open. 2021 Dec 16;11(12):e047354. doi: 10.1136/bmjopen-2020-047354.
3
Fitness to practise sanctions in UK doctors are predicted by poor performance at MRCGP and MRCP(UK) assessments: data linkage study.
BMC Med. 2018 Dec 7;16(1):230. doi: 10.1186/s12916-018-1214-4.
4
Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment.
BMC Med Educ. 2018 Apr 3;18(1):64. doi: 10.1186/s12909-018-1143-0.
5
Are Exam Questions Known in Advance? Using Local Dependence to Detect Cheating.
PLoS One. 2016 Dec 1;11(12):e0167545. doi: 10.1371/journal.pone.0167545. eCollection 2016.
6
Cross-comparison of MRCGP & MRCP(UK) in a database linkage study of 2,284 candidates taking both examinations: assessment of validity and differential performance by ethnicity.
BMC Med Educ. 2015 Jan 16;15:1. doi: 10.1186/s12909-014-0281-2.

References

1
The relationship between licensing examination performance and the outcomes of care by international medical school graduates.
Acad Med. 2014 Aug;89(8):1157-62. doi: 10.1097/ACM.0000000000000310.
2
Resitting a high-stakes postgraduate medical examination on multiple occasions: nonlinear multilevel modelling of performance in the MRCP(UK) examinations.
BMC Med. 2012 Jun 14;10:60. doi: 10.1186/1741-7015-10-60.
3
Test equating of the Medical Licensing Examination in 2003 and 2004 based on the item response theory.
J Educ Eval Health Prof. 2006;3:2. doi: 10.3352/jeehp.2006.3.2. Epub 2006 Jul 8.
4
Is an Angoff standard an indication of minimal competence of examinees or of judges?
Adv Health Sci Educ Theory Pract. 2008 May;13(2):203-11. doi: 10.1007/s10459-006-9035-1. Epub 2006 Oct 17.
5
Procedures for establishing defensible absolute passing scores on performance examinations in health professions education.
Teach Learn Med. 2006 Winter;18(1):50-7. doi: 10.1207/s15328015tlm1801_11.
6
Changes in standard of candidates taking the MRCP(UK) Part 1 examination, 1985 to 2002: analysis of marker questions.
BMC Med. 2005 Jul 18;3:13. doi: 10.1186/1741-7015-3-13.
7
Test equating in the presence of DIF items.
J Appl Meas. 2005;6(3):342-54.
8
Item response theory: applications of modern test theory in medical education.
Med Educ. 2003 Aug;37(8):739-45. doi: 10.1046/j.1365-2923.2003.01587.x.
9
Standard setting in medical education.
Acad Med. 1996 Oct;71(10 Suppl):S112-20. doi: 10.1097/00001888-199610000-00062.
10
Evolution of an examination: M.R.C.P. (U.K.).
Br Med J. 1974 Apr 13;2(5910):99-107. doi: 10.1136/bmj.2.5910.99.