Laukaityte Inga, Wallin Gabriel, Wiberg Marie
Department of Applied Educational Science, Umeå University, Sweden.
School of Mathematical Sciences, Lancaster University, UK.
Appl Psychol Meas. 2025 Jul 30:01466216251363240. doi: 10.1177/01466216251363240.
Ensuring that test scores are fair and comparable across different test forms and different test-taker groups is a significant statistical challenge in educational testing. Methods to achieve score comparability, a process known as test score equating, often rely on including common test items or assuming that test-taker groups are similar in key characteristics. This study explores a novel approach that combines propensity scores, based on test takers' background covariates, with information from common items using kernel smoothing techniques for binary-scored test items. An empirical analysis using data from a high-stakes college admissions test evaluates the standard errors and differences in adjusted test scores. A simulation study examines how the number of test takers, the number of common items, and the correlation between covariates and test scores affect the method's performance. The findings demonstrate that integrating propensity scores with common item information reduces standard errors and bias more effectively than using either source alone. This suggests that balancing the groups on test takers' covariates enhances the fairness and accuracy of test score comparisons across different groups. The proposed method highlights the benefits of considering all the collected data to improve score comparability.
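To make the kernel smoothing step concrete, the following is a minimal sketch of generic Gaussian kernel equating, not the authors' implementation. It assumes the score probabilities for each form have already been estimated on a common target population, for example by post-stratifying on strata formed from common-item scores and propensity-score categories (one plausible way to combine the two sources, stated here as an assumption). All function names and the bandwidth value 0.6 are hypothetical; in practice the bandwidth is usually chosen by minimizing a penalty function.

```python
import math


def poststratified_probs(cond_probs, stratum_weights):
    """Score probabilities on a target population.

    cond_probs[k][j]   : P(score j | stratum k) in the group that took the form
    stratum_weights[k] : share of stratum k in the target population
    Strata are assumed to be built from common-item scores and/or
    propensity-score categories (an illustrative assumption).
    """
    n_scores = len(cond_probs[0])
    return [
        sum(w * row[j] for w, row in zip(stratum_weights, cond_probs))
        for j in range(n_scores)
    ]


def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    The shrinkage factor `a` keeps the mean and variance of the
    continuized distribution equal to those of the discrete one.
    """
    mu = sum(p * s for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * s - (1.0 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    )


def inverse_cdf(u, scores, probs, h, iterations=200):
    """Invert the continuized CDF by bisection (it is strictly increasing)."""
    lo = min(scores) - 10.0 * (h + 1.0)
    hi = max(scores) + 10.0 * (h + 1.0)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if continuized_cdf(mid, scores, probs, h) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h_x=0.6, h_y=0.6):
    """Equipercentile equating of form X score x to the form Y scale."""
    u = continuized_cdf(x, scores_x, probs_x, h_x)
    return inverse_cdf(u, scores_y, probs_y, h_y)


if __name__ == "__main__":
    scores = [0, 1, 2, 3, 4]
    probs_x = [0.1, 0.2, 0.4, 0.2, 0.1]
    probs_y = [0.05, 0.15, 0.40, 0.25, 0.15]  # illustrative values only
    for x in scores:
        print(x, round(kernel_equate(x, scores, probs_x, scores, probs_y), 3))
```

When the two forms have identical score distributions the equating function reduces to the identity, which gives a quick sanity check on any implementation of this kind.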