Babcock Ben, Hodge Kari J
The American Registry of Radiologic Technologists, Saint Paul, MN, USA.
NACE International, Houston, TX, USA.
Educ Psychol Meas. 2020 Jun;80(3):499-521. doi: 10.1177/0013164419878483. Epub 2019 Sep 30.
Equating and scaling in the context of small sample exams, such as credentialing exams for highly specialized professions, has received increased attention in recent research. Investigators have proposed a variety of both classical and Rasch-based approaches to the problem. This study attempts to extend past research by (1) directly comparing classical and Rasch techniques of equating exam scores when sample sizes are small (≤ 100 per exam form) and (2) attempting to pool multiple forms' worth of data to improve estimation in the Rasch framework. We simulated multiple years of a small-sample exam program by resampling from a larger certification exam program's real data. Results showed that combining multiple administrations' worth of data via the Rasch model can lead to more accurate equating compared to classical methods designed to work well in small samples. WINSTEPS-based Rasch methods that used multiple exam forms' data worked better than Bayesian Markov Chain Monte Carlo methods, as the prior distribution used to estimate the item difficulty parameters biased predicted scores when there were difficulty differences between exam forms.
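The core idea in aim (2), pooling several forms' data for Rasch estimation, is commonly done by concurrent calibration: responses from all forms are stacked into one sparse person-by-item matrix (with gaps for items a person never saw) and item difficulties are estimated on a single scale through the common items. The sketch below illustrates that idea under stated assumptions; it is not the authors' code. It uses a simple gradient-based joint maximum likelihood fit (WINSTEPS uses its own JMLE implementation), two hypothetical 50-item forms sharing 20 anchor items, 100 examinees per form, and a deliberate 0.3-logit ability difference between groups, mirroring the small-sample design described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(7)

def rasch_prob(theta, b):
    # P(correct response) under the Rasch model for ability theta, difficulty b
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def fit_rasch_jml(resp, n_iter=500, lr=0.5):
    """Pooled (concurrent) Rasch calibration via joint maximum likelihood.

    resp: persons x items response matrix; np.nan marks items a person
    was never administered (i.e., items unique to the other form).
    """
    mask = ~np.isnan(resp)
    x = np.where(mask, resp, 0.0)
    theta = np.zeros(resp.shape[0])   # person abilities
    b = np.zeros(resp.shape[1])       # item difficulties
    for _ in range(n_iter):
        p = rasch_prob(theta[:, None], b[None, :])
        resid = (x - p) * mask        # score residuals, only where administered
        theta += lr * resid.sum(axis=1) / mask.sum(axis=1)
        b -= lr * resid.sum(axis=0) / mask.sum(axis=0)
        b -= b.mean()                 # fix the scale: mean item difficulty = 0
    return theta, b

# Two 50-item forms sharing 20 anchor items (items 30-49), 100 examinees each.
n_items = 80
true_b = rng.normal(0.0, 1.0, n_items)
theta_a = rng.normal(0.0, 1.0, 100)   # Form A group
theta_b = rng.normal(0.3, 1.0, 100)   # Form B group is slightly more able

resp = np.full((200, n_items), np.nan)
resp[:100, :50] = rng.random((100, 50)) < rasch_prob(theta_a[:, None], true_b[None, :50])
resp[100:, 30:] = rng.random((100, 50)) < rasch_prob(theta_b[:, None], true_b[None, 30:])

# One concurrent run places both forms' items on a common difficulty scale,
# so scores on either form can be compared directly.
est_theta, est_b = fit_rasch_jml(resp)
recovery_r = np.corrcoef(true_b, est_b)[0, 1]
```

Because the anchor items tie the two forms together, the group ability difference is absorbed into `est_theta` rather than contaminating the item difficulties, which is the mechanism that lets the pooled Rasch approach equate forms of unequal difficulty; `recovery_r` (the correlation between true and estimated difficulties) gauges how well the small-sample calibration recovers the generating parameters.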