Evaluating ensemble models for fair and interpretable prediction in higher education using multimodal data.

Author information

Arévalo-Cordovilla Felipe Emiliano, Peña Marta

Affiliations

Faculty of Science and Engineering, Universidad Estatal de Milagro, Ciudadela Universitaria "Dr. Rómulo Minchala Murillo", km. 1.5 vía Milagro - Virgen de Fátima, Milagro, 091050, Ecuador.

Department of Mathematics and IOC Research Institute, Universitat Politècnica de Catalunya-BarcelonaTech, Diagonal 647, Barcelona, 08028, Spain.

Publication information

Sci Rep. 2025 Aug 11;15(1):29420. doi: 10.1038/s41598-025-15388-9.

DOI: 10.1038/s41598-025-15388-9

PMID: 40789907

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12339690/

Abstract

Early prediction of academic performance is vital for reducing attrition in online higher education. However, existing models often lack comprehensive data integration and comparison with state-of-the-art techniques. This study, which involved 2,225 engineering students at a public university in Ecuador, addressed these gaps. The objective was to develop a robust predictive framework that integrates Moodle interactions, academic history, and demographic data, with SMOTE applied for class balancing. The methodology involved a comparative evaluation of seven base learners, including traditional algorithms, Random Forest, and gradient boosting ensembles (XGBoost, LightGBM), followed by a final stacking model, all validated using 5-fold stratified cross-validation. While LightGBM emerged as the best-performing base model (Area Under the Curve (AUC) = 0.953, F1 = 0.950), the stacking ensemble (AUC = 0.835) did not offer a significant performance improvement and showed considerable instability. SHAP analysis confirmed that early grades were the most influential predictors across the top models. The final model demonstrated strong fairness across gender, ethnicity, and socioeconomic status (consistency = 0.907). These findings enable institutions to identify at-risk students using state-of-the-art, interpretable, and fair models, advancing learning analytics by validating key success predictors against contemporary benchmarks.
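Two techniques named in the abstract are easy to gloss over in prose. SMOTE balances classes by creating synthetic minority-class rows, each interpolated between an existing minority row and one of its nearest minority-class neighbours. The consistency score reported for fairness measures whether students with similar features receive similar predictions. The Python sketch below illustrates both using only the standard library. It is not the authors' code: the function names, the Euclidean distance, the values of k, and the toy data are hypothetical. Consistency is assumed to follow the common k-nearest-neighbour definition (Zemel et al., 2013), 1 - (1/N) Σ_i |ŷ_i - mean of ŷ over the k nearest neighbours of i|, and the study's own settings may differ.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: euclidean(points[i], points[j]))
    return others[:k]


def smote(X, y, minority=1, k=5, seed=0):
    """Binary SMOTE sketch: add synthetic minority rows until both classes are equal.

    Each synthetic row lies on the segment between a random minority row and
    one of its k nearest minority-class neighbours. Returns new (X, y) lists.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = sum(1 for label in y if label != minority) - len(minority_rows)
    k = min(k, len(minority_rows) - 1)  # cannot use more neighbours than exist
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        neighbour = minority_rows[rng.choice(k_nearest(minority_rows, i, k))]
        lam = rng.random()  # interpolation factor in [0, 1)
        X_new.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
        y_new.append(minority)
    return X_new, y_new


def consistency(X, y_pred, k=5):
    """kNN consistency: 1 - mean_i |y_pred[i] - mean(y_pred over i's k neighbours)|.

    1.0 means every student receives the same prediction as their most
    similar peers; lower values indicate individually inconsistent predictions.
    """
    total = 0.0
    for i in range(len(X)):
        nbrs = k_nearest(X, i, k)
        total += abs(y_pred[i] - sum(y_pred[j] for j in nbrs) / len(nbrs))
    return 1.0 - total / len(X)


if __name__ == "__main__":
    # Hypothetical toy features per student: (early grade, Moodle activity), scaled to [0, 1].
    X = [[0.9, 0.8], [0.85, 0.9], [0.8, 0.7], [0.95, 0.85], [0.2, 0.3], [0.3, 0.1]]
    y = [0, 0, 0, 0, 1, 1]  # 1 = hypothetical "at risk" (minority) class
    X_bal, y_bal = smote(X, y, minority=1, k=1)
    print(y_bal.count(0), y_bal.count(1))  # → 4 4
    y_pred = y  # pretend the model reproduces the labels exactly
    print(round(consistency(X, y_pred, k=2), 3))  # → 0.833
```

In the toy run, the two at-risk students each have one not-at-risk student among their two nearest neighbours, which pulls consistency below 1.0. In a cross-validated pipeline such as the 5-fold stratified scheme described in the abstract, oversampling would normally be applied only to the training folds so that synthetic rows never appear in an evaluation fold. The abstract does not say how the study handled this.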
