Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, UAE.
Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan.
Sci Rep. 2024 Jul 13;14(1):16200. doi: 10.1038/s41598-024-66894-1.
The COVID-19 pandemic has had a significant impact on students' academic performance. The effects of the pandemic have varied among students, but some general trends have emerged. One of the primary challenges for students during the pandemic has been the disruption of their study habits. Students who became accustomed to online learning routines may find it more challenging to perform well after returning to face-to-face learning. Therefore, assessing the potential risk factors associated with low student performance, and predicting such performance, is important for early intervention. Because students' performance data encompass diverse behaviors, standard machine learning methods struggle to extract insights that are useful for practical decision making and early intervention. This research therefore explores regularized ensemble learning methods for effectively analyzing students' performance data and reaching valid conclusions. To this end, three pruning strategies are implemented for the random forest method. These strategies are based on out-of-bag sampling, sub-sampling and sub-bagging. They discard trees that are adversely affected by unusual patterns in the students' data, forming forests of accurate and diverse trees. The methods are illustrated on an example data set collected from university students who studied online during the COVID-19 pandemic and are currently studying on campus in a face-to-face modality. The suggested methods outperform all the other methods considered in this paper in predicting students at risk of academic failure. Moreover, factors such as class attendance, student interaction, internet connectivity, and pre-requisite course(s) taken during the restrictions are identified as the most significant features.
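The abstract does not spell out the pruning procedures, so the following is only a minimal sketch of the general idea behind out-of-bag (OOB) pruning, not the authors' method. It assumes that each tree is scored by its accuracy on the observations left out of its bootstrap sample and that only a top fraction of trees is kept. For brevity, one-split decision stumps stand in for full trees, and diversity among trees is not measured. The function names, the `keep_frac` parameter, the two features (an attendance rate and a noise column), and the toy data are all hypothetical and synthetic.

```python
import random
from collections import Counter

def fit_stump(X, y, rng):
    """Fit a one-split tree on a random feature subset (stand-in for a full tree)."""
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # usual sqrt rule for feature subsampling
    best = None
    for j in rng.sample(range(n_features), k):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority class
        maj = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]

def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab

def oob_pruned_forest(X, y, n_trees=60, keep_frac=0.25, seed=0):
    """Grow n_trees bootstrap stumps and keep the top keep_frac by OOB accuracy."""
    rng = random.Random(seed)
    n = len(X)
    scored = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag rows
        if not oob:
            continue
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], rng)
        acc = sum(predict_stump(stump, X[i]) == y[i] for i in oob) / len(oob)
        scored.append((acc, stump))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    n_keep = max(1, int(keep_frac * len(scored)))
    return [stump for _, stump in scored[:n_keep]]

def predict_forest(forest, row):
    """Majority vote over the retained trees."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Synthetic toy data (NOT from the study): column 0 is a hypothetical
# attendance rate, column 1 is pure noise; label 1 means "at risk".
data_rng = random.Random(1)
X = [[i / 60, data_rng.random()] for i in range(60)]
y = [1 if row[0] < 0.6 else 0 for row in X]

forest = oob_pruned_forest(X, y)
```

In this toy setting, stumps that happen to split on the noise column tend to score poorly on their OOB rows and are discarded, which mirrors the stated goal of dropping trees that are adversely affected by unusual patterns. A sub-sampling or sub-bagging variant would change only how the held-out rows used for scoring are drawn.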