Liu Helai, Mao Mao, Li Xia, Gao Jia
China Conservatory of Music, Beijing, People's Republic of China.
University of Cambridge, Cambridge, United Kingdom.
PLoS One. 2025 Mar 31;20(3):e0317726. doi: 10.1371/journal.pone.0317726. eCollection 2025.
Student dropout is a significant social issue with extensive implications for individuals and society, including reduced employability and economic downturns, which in turn substantially affect sustainable social development. Identifying students at high risk of dropping out is a major challenge for sustainable education. While existing machine learning and deep learning models can effectively predict dropout risk, they often rely on real student data, raising ethical concerns and the risk of information leakage. Additionally, the poor interpretability of these models complicates their use in educational management, as it is difficult to justify identifying a student as high-risk based on an opaque model. To address these two issues, we introduce, for the first time, a modified Preprocessed Kernel Inducing Points data distillation technique (PP-KIPDD) specialized for distilling tabular structured datasets. We use PP-KIPDD to reconstruct new samples that serve as qualified training sets simulating the distribution of student information, thereby preventing leakage of private student information. This approach showed better performance and efficiency than traditional data synthesis techniques such as Conditional Generative Adversarial Networks. Furthermore, we strengthen the classifiers' credibility by enhancing model interpretability using SHAP (SHapley Additive exPlanations) values, and we elucidate the significance of the selected features from an educational management perspective. With features well explained from both quantitative and qualitative perspectives, our approach enhances the feasibility and reasonableness of dropout prediction using machine learning techniques.
We believe our approach represents a novel end-to-end framework for applying artificial intelligence to sustainable education management from the decision-maker's point of view, as it addresses privacy leakage and enhances model credibility for practical management use.
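To make the interpretability step concrete, the following is a minimal sketch of the Shapley attribution idea that SHAP values build on. It is not the authors' pipeline: the scoring function `toy_risk`, its three features, its coefficients, and the all-zero baseline are hypothetical, invented purely for illustration. The sketch enumerates every feature coalition exactly, which is only feasible for a handful of features; practical SHAP implementations rely on approximations and a background dataset instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical dropout-risk score; features and coefficients are made up."""
    failed_courses, tuition_overdue, absences = z
    return 0.1 * failed_courses + 0.3 * tuition_overdue + 0.02 * absences * failed_courses


student = [3, 1, 10]   # hypothetical student record
baseline = [0, 0, 0]   # hypothetical reference student
phi = shapley_values(toy_risk, student, baseline)
print(dict(zip(["failed_courses", "tuition_overdue", "absences"], phi)))
```

The attributions sum to the difference between the student's score and the baseline score. That additive property is what lets a per-student risk flag be broken down into per-feature contributions that a decision-maker can inspect.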