German Institute for Economic Research, Berlin, Germany.
Leibniz Institute for Educational Trajectories, Bamberg, Germany.
Behav Res Methods. 2021 Jun;53(3):1202-1217. doi: 10.3758/s13428-020-01480-7. Epub 2020 Oct 1.
Educational large-scale studies typically adopt highly standardized settings to collect cognitive data on large samples of respondents. Rising costs and dwindling response rates in these studies necessitate exploring alternative assessment strategies such as unsupervised web-based testing. Before such assessment modes can be implemented on a broad scale, their impact on cognitive measurements needs to be quantified. Therefore, an experimental study was conducted on N = 17,473 university students from the German National Educational Panel Study. Respondents were randomly assigned to a supervised paper-based, a supervised computerized, or an unsupervised web-based mode to work on a test of scientific literacy. Mode-specific effects on selection bias, measurement bias, and predictive bias were examined. The results showed a higher response rate for web-based testing than for the supervised modes, without a pronounced mode-specific selection bias. Analyses of differential test functioning showed systematically higher test scores in paper-based testing, particularly among respondents of low to medium ability. Predictive bias for web-based testing was observed for one of four criteria of study-related success. Overall, the results indicate that unsupervised web-based testing is not strictly equivalent to the other assessment modes. However, the bias introduced by web-based testing was generally small. Thus, unsupervised web-based assessments seem to be a feasible option for cognitive large-scale studies in higher education.