Zhou Todd, Jiao Hong
Winston Churchill High School, Potomac, MD, USA.
University of Maryland, College Park, USA.
Educ Psychol Meas. 2023 Aug;83(4):831-854. doi: 10.1177/00131644221117193. Epub 2022 Aug 13.
Cheating detection in large-scale assessment has received considerable attention in the extant literature. However, none of the previous studies in this line of research investigated the stacking ensemble machine learning algorithm for cheating detection. Furthermore, no study addressed the issue of class imbalance using resampling. This study explored the application of the stacking ensemble machine learning algorithm to analyze the item response, response time, and augmented data of test-takers to detect cheating behaviors. The performance of the stacking method was compared with that of two other ensemble methods (bagging and boosting) as well as six base non-ensemble machine learning algorithms. Issues related to class imbalance and input features were addressed. The study results indicated that stacking, resampling, and feature sets including augmented summary data generally performed better than their counterparts in cheating detection. Across all the study conditions, the best-performing approach was generally the stacking meta-model using discriminant analysis, built on the top two base models (Gradient Boosting and Random Forest), with item responses and the augmented summary statistics as input features and an under-sampling ratio of 10:1.
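The best-performing configuration described in the abstract (stacking Gradient Boosting and Random Forest under a discriminant-analysis meta-model, after under-sampling) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: it uses scikit-learn, substitutes synthetic data for the item-response and summary features, assumes linear discriminant analysis as the meta-model (the abstract does not say which variant was used), and reads the 10:1 ratio as majority-to-minority. The `undersample` helper is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split


def undersample(X, y, ratio=10, seed=0):
    """Hypothetical helper: randomly drop majority-class rows (label 0) so that
    at most `ratio` of them remain per minority-class row (label 1)."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_keep = min(len(majority), ratio * len(minority))
    kept = rng.choice(majority, size=n_keep, replace=False)
    idx = np.concatenate([minority, kept])
    rng.shuffle(idx)
    return X[idx], y[idx]


# Synthetic stand-in for real test-taker features (assumption): label 1 marks
# a flagged test-taker and is rare, mimicking the class imbalance.
X, y = make_classification(
    n_samples=3000, n_features=20, n_informative=8,
    weights=[0.97, 0.03], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Resample the training split only; the test split keeps its natural imbalance.
X_bal, y_bal = undersample(X_train, y_train, ratio=10)

# Base models feed cross-validated predicted probabilities to the meta-model.
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LinearDiscriminantAnalysis(),  # assumed LDA variant
    cv=5,
)
stack.fit(X_bal, y_bal)
pred = stack.predict(X_test)
```

Under-sampling is applied to the training split alone so that evaluation reflects the original class proportions. With real data, the synthetic `X` and `y` would be replaced by the item responses and augmented summary statistics.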