Adam E. Wyse
Renaissance, MN, USA.
Appl Psychol Meas. 2023 Nov;47(7-8):513-525. doi: 10.1177/01466216231209749. Epub 2023 Oct 19.
This study introduces two new statistics for measuring the score comparability of computerized adaptive tests (CATs), both based on comparing conditional standard errors of measurement (CSEMs) for examinees who achieved the same scale score. One statistic evaluates the score comparability of alternate CAT forms at individual scale scores, while the other evaluates the overall score comparability of alternate CAT forms. The effectiveness of the new statistics is illustrated using data from reading and math CATs for grades 3 through 8. Results suggest that both CATs demonstrated reasonably high levels of score comparability, that score comparability was lower at very high or very low scores where few students score, and that using random samples with fewer students per grade had little impact on score comparability. Results also suggest that overall score comparability was sometimes higher when calculated using only the bottom 20% of scorers rather than all students. Additional discussion of applying the statistics in different contexts is provided.