Wang Chun, Zheng Yi, Chang Hua-Hua
University of Minnesota at Twin-Cities, 75 East River Road, Elliott Hall N658, Minneapolis, MN, 55403, USA
Psychometrika. 2014 Jan;79(1):154-74. doi: 10.1007/s11336-013-9356-y. Epub 2013 Dec 10.
With the advent of web-based technology, online testing is becoming a mainstream mode in large-scale educational assessments. Most online tests are administered continuously in a testing window, which may pose test security problems because examinees who take the test earlier may share information with those who take the test later. Researchers have proposed various statistical indices to assess test security. One of the most often used is the average test overlap rate, which was further generalized to the item pooling index (Chang & Zhang, 2002, 2003). These indices, however, are all defined as means (that is, the expected proportion of common items among examinees), and they were originally proposed for computerized adaptive testing (CAT). Recently, multistage testing (MST) has become a popular alternative to CAT. The unique features of MST make it important to report not only the mean but also the standard deviation (SD) of the test overlap rate, as we advocate in this paper. The SD of the test overlap rate adds important information to the test security profile because, for the same mean, a large SD indicates that certain groups of examinees share more common items than other groups. In this study, we analytically derived the lower bounds of the SD under MST, with the results under CAT as a benchmark. It is shown that when the mean overlap rate is the same between MST and CAT, the SD of test overlap tends to be larger in MST. A simulation study was conducted to provide empirical evidence. We also compared the security of MST under the single-pool versus the multiple-pool designs; both analytical and simulation studies show that the non-overlapping multiple-pool design will slightly increase the security risk.
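The point that two designs can share a mean overlap rate yet differ in SD can be illustrated with a small sketch. This is not the paper's analytical derivation. It assumes that the overlap rate for a pair of examinees is the number of shared items divided by a fixed test length, and that the SD is the population SD taken over all examinee pairs. The function names and the two toy designs are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def overlap_rates(tests):
    """Return the overlap rate for every pair of examinees.

    Assumption: all tests have the same length, and the overlap rate of a
    pair is (number of shared items) / (test length).
    """
    length = len(tests[0])
    if any(len(t) != length for t in tests):
        raise ValueError("all tests must have the same length")
    return [len(set(a) & set(b)) / length for a, b in combinations(tests, 2)]


def overlap_summary(tests):
    """Return (mean, population SD) of the pairwise overlap rates."""
    rates = overlap_rates(tests)
    return mean(rates), pstdev(rates)


# Hypothetical toy designs: 4 examinees, 3 items each, 6-item pool.
# "clustered": examinees are routed to one of two fixed forms.
clustered = [{1, 2, 3}, {1, 2, 3}, {4, 5, 6}, {4, 5, 6}]
# "spread": every pair of examinees shares exactly one item.
spread = [{1, 2, 3}, {1, 4, 5}, {2, 4, 6}, {3, 5, 6}]

for name, design in (("clustered", clustered), ("spread", spread)):
    m, sd = overlap_summary(design)
    print(f"{name}: mean={m:.3f}, sd={sd:.3f}")
```

In this toy case both designs have a mean overlap rate of 1/3, but the clustered design has an SD of about 0.47 while the spread design has an SD of 0. The mean alone hides the fact that some pairs in the clustered design saw identical tests.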