IPN - Leibniz Institute for Science and Mathematics Education, Educational Measurement, Olshausenstraße 62, 24118, Kiel, Germany.
Technical University Berlin, Berlin, Germany.
Behav Res Methods. 2023 Apr;55(3):1392-1412. doi: 10.3758/s13428-022-01844-1. Epub 2022 Jun 1.
Early detection of risk of failure on interactive tasks holds great potential for better understanding how examinees differ in their initial behavior as well as for adaptively tailoring interactive tasks to examinees' competence levels. Drawing on procedures originating in shopper intent prediction on e-commerce platforms, we introduce and showcase a machine learning-based procedure that leverages early-window clickstream data for systematically investigating early predictability of behavioral outcomes on interactive tasks. We derive features related to the occurrence, frequency, sequentiality, and timing of performed actions from early-window clickstreams and use extreme gradient boosting for classification. Multiple measures are suggested to evaluate the quality and utility of early predictions. We illustrate the procedure by investigating early predictability of failure on two PIAAC 2012 Problem Solving in Technology-Rich Environments (PSTRE) tasks. We investigated early windows of varying size, defined both in terms of time and in terms of number of actions. We achieved good prediction performance at stages where examinees had, on average, at least two-thirds of their solution process ahead of them, and the vast majority of examinees who failed could potentially be flagged as at risk before completing the task. In-depth analyses revealed different features to be indicative of success and failure at different stages of the solution process, thereby highlighting the potential of the applied procedure for gaining a finer-grained understanding of the trajectories of behavioral patterns on interactive tasks.
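To make the feature derivation concrete, the following is a minimal sketch of how occurrence, frequency, sequentiality, and timing features might be computed from an early window cut by time or by number of actions. It is not the authors' implementation: the event format, the action labels, the function names (`early_window`, `extract_features`), the use of action bigrams for sequentiality, and the specific timing features are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# One log event: (seconds since task start, action label).
# This format and all action labels below are hypothetical.
Event = Tuple[float, str]


def early_window(
    events: List[Event],
    max_seconds: Optional[float] = None,
    max_actions: Optional[int] = None,
) -> List[Event]:
    """Keep only events inside the early window (by time and/or action count)."""
    window = sorted(events)  # order events by timestamp
    if max_seconds is not None:
        window = [e for e in window if e[0] <= max_seconds]
    if max_actions is not None:
        window = window[:max_actions]
    return window


def extract_features(window: List[Event]) -> Dict[str, float]:
    """Derive illustrative features from an early-window clickstream."""
    actions = [action for _, action in window]
    times = [t for t, _ in window]
    features: Dict[str, float] = {"n_actions": float(len(actions))}

    # Occurrence (was the action performed?) and frequency (how often?).
    for action, count in Counter(actions).items():
        features[f"occ:{action}"] = 1.0
        features[f"freq:{action}"] = float(count)

    # Sequentiality, approximated here by counts of consecutive action pairs.
    for (first, second), count in Counter(zip(actions, actions[1:])).items():
        features[f"seq:{first}>{second}"] = float(count)

    # Timing: delay before the first action and mean gap between actions.
    if times:
        features["time_to_first_action"] = times[0]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features["mean_gap"] = sum(gaps) / len(gaps) if gaps else 0.0
    return features


# Hypothetical log for one examinee; the window covers the first 60 seconds.
log = [(2.0, "start"), (5.0, "click_mail"), (9.0, "click_mail"),
       (30.0, "open_folder"), (75.0, "submit")]
features = extract_features(early_window(log, max_seconds=60.0))
```

Feature dictionaries of this kind, one per examinee, could then be turned into a matrix (missing keys filled with zero) and passed to a gradient boosting classifier, with task failure as the label.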