McManus IC, Chis L, Fox R, Waller D, Tang P
UCL Medical School, University College London, Gower Street, London WC1E 6BT, UK.
BMC Med Educ. 2014 Sep 26;14:204. doi: 10.1186/1472-6920-14-204.
The MRCP(UK) exam, in 2008 and 2010, changed the standard-setting of its Part 1 and Part 2 examinations from a hybrid Angoff/Hofstee method to statistical equating using Item Response Theory, the reference group being UK graduates. The present paper considers the implementation of the change, the question of whether the pass rate increased amongst non-UK candidates, any possible role of Differential Item Functioning (DIF), and changes in examination predictive validity after the change.
Analysis of data from the MRCP(UK) Part 1 examination from 2003 to 2013 and the Part 2 examination from 2005 to 2013.
Inspection suggested that Part 1 pass rates were stable after the introduction of statistical equating, but showed greater annual variation, probably because stronger candidates were taking the examination earlier. Pass rates appeared to increase among non-UK graduates after equating was introduced, but the increase was not associated with any change in DIF after statistical equating. Statistical modelling of the pass rates for non-UK graduates found that pass rates in both Part 1 and Part 2 were increasing year on year, with the changes probably beginning before the introduction of equating. The predictive validity of Part 1 for Part 2 was higher with statistical equating than with the previous hybrid Angoff/Hofstee method, confirming the utility of IRT-based statistical equating.
Statistical equating was successfully introduced into the MRCP(UK) Part 1 and Part 2 written examinations, resulting in higher predictive validity than the previous Angoff/Hofstee standard setting. Concerns about an artefactual increase in pass rates for non-UK candidates after equating were shown not to be well founded. Most likely the changes resulted from a genuine increase in candidate ability, albeit for reasons which remain unclear, coupled with a cognitive illusion giving the impression of a step-change immediately after equating began. Statistical equating provides a robust standard-setting method with a better theoretical foundation than judgemental techniques such as Angoff; it is also more straightforward and requires far less examiner time, while providing a more valid result. The present study provides a detailed case study of introducing statistical equating, and of the issues that may need to be considered when introducing it.
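To illustrate the general idea behind IRT-based statistical equating discussed above, the sketch below shows a minimal mean/mean linking of Rasch item difficulties via anchor items shared between two examination forms, carrying a pass standard from a reference form onto a new form. This is a hypothetical illustration only: the item names, difficulty values, and pass mark are invented, and the MRCP(UK)'s actual equating procedure is more elaborate.

```python
# Illustrative sketch (NOT the MRCP(UK) procedure): mean/mean linking of
# Rasch item difficulties through anchor items, used to place a pass
# standard set on a reference form onto the scale of a new form.
# All item names and numbers below are invented for illustration.

# Difficulty estimates (in logits) for anchor items appearing on both forms.
anchor_ref = {"q1": -0.50, "q2": 0.10, "q3": 0.80}   # reference form
anchor_new = {"q1": -0.30, "q2": 0.35, "q3": 1.05}   # new form

# Mean/mean linking constant: the average shift of the shared anchors
# between the two calibrations.
shift = sum(anchor_new[q] - anchor_ref[q] for q in anchor_ref) / len(anchor_ref)

# A pass standard fixed on the reference scale is applied to the new form
# by adding the linking constant, so the standard stays constant in
# ability terms even if the new form is harder or easier.
pass_mark_ref = 0.20                  # hypothetical pass standard (logits)
pass_mark_new = pass_mark_ref + shift

print(f"linking constant: {shift:.2f} logits")
print(f"equated pass mark on new form: {pass_mark_new:.2f} logits")
```

Because the pass mark travels with the item calibrations rather than with judges' ratings, no Angoff panel is needed once the anchor items are calibrated, which is the source of the examiner-time saving noted in the conclusions.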