Ordoñez-Avila Ricardo, Meza Jaime, Ventura Sebastian
Departamento de Sistemas Computacionales, Universidad Técnica de Manabí, Portoviejo, Manabí, Ecuador.
Departamento de Informática y Análisis Numérico, Universidad de Córdoba, Córdoba, Córdoba, Spain.
PeerJ Comput Sci. 2025 May 5;11:e2855. doi: 10.7717/peerj-cs.2855. eCollection 2025.
Higher education institutions actively integrate information and communication technologies through learning management systems (LMS), which are crucial for online education. This study used data mining techniques to predict the autonomous scores of students in the online Law and Psychology programs at the Technical University of Manabí. The process comprised integration and selection of more than 16,000 records; preprocessing; transformation with RobustScaler; predictive modelling, including feature selection by recursive feature elimination with cross-validation (RFEcv) and hyperparameter tuning; and evaluation of the models using root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The feature selection suggested by RFEcv contributed to the performance of the models. The variables analyzed focused on download rate, homework submission rate, test performance rate, median daily accesses, median days of access per month, viewing of comments on teacher-reviewed assignments, length of the final exam, and not requiring the supplemental exam. Hyperparameter tuning after RFEcv further improved model performance. The models evaluated showed minimal differences in RMSE (range 0.5411 to 0.6025). Gradient boosting achieved the best performance on the Law program data (R² = 0.6693, MAE = 0.4041, RMSE = 0.5411) and on the Psychology program data (R² = 0.6418, MAE = 0.4232, RMSE = 0.6025), while extreme gradient boosting (XGBoost) performed best on the combined data set (R² = 0.6294, MAE = 0.4295, RMSE = 0.5985). Future research and implementations could incorporate autonomous score data through plugins and reports integrated into LMSs. This approach may provide indicators of interest for understanding and improving online learning from a personalized, real-time perspective.
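The workflow summarized in the abstract (robust scaling, RFEcv feature selection, hyperparameter tuning, and evaluation with RMSE, MAE, and the coefficient of determination) might be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic stand-ins for the eight LMS variables, and the parameter grid, fold counts, and train/test split are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFECV
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in data (assumption): 8 hypothetical LMS features,
# of which only the first three actually drive the target score.
rng = np.random.default_rng(0)
n_samples, n_features = 400, 8
X = rng.normal(size=(n_samples, n_features))
y = (2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2]
     + rng.normal(scale=0.5, size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale robustly, select features with RFE + cross-validation,
# then fit a gradient boosting regressor on the selected features.
pipeline = Pipeline([
    ("scale", RobustScaler()),
    ("select", RFECV(
        GradientBoostingRegressor(n_estimators=30, random_state=0),
        step=1,
        cv=3,
        scoring="neg_root_mean_squared_error",
    )),
    ("model", GradientBoostingRegressor(random_state=0)),
])

# Small illustrative grid (assumption); the study's search space is not
# given in the abstract.
param_grid = {
    "model__n_estimators": [100, 200],
    "model__learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    pipeline, param_grid, cv=3, scoring="neg_root_mean_squared_error"
)
search.fit(X_train, y_train)

# Evaluate on held-out data with the three metrics named in the abstract.
y_pred = search.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
mae = float(mean_absolute_error(y_test, y_pred))
r2 = float(r2_score(y_test, y_pred))
n_selected = int(search.best_estimator_.named_steps["select"].n_features_)

print(f"selected features: {n_selected}")
print(f"RMSE={rmse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")
```

Placing the scaler and the selector inside the pipeline means both are refit within each cross-validation fold, which avoids leaking held-out information into feature selection. An XGBoost regressor could be substituted for the final step in the same way if that library is installed.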