Da Ting
National Engineering Research Center of Cyberlearning and Intelligent Technology, Beijing Normal University, Beijing, China.
Sci Rep. 2025 Apr 4;15(1):11521. doi: 10.1038/s41598-025-89394-2.
A central task in educational research is to uncover the factors that drive a student's academic performance. Existing studies rely on carefully designed regressions, but choosing appropriate control variables remains difficult. Machine learning offers a way around this: the entire variable set can be inspected and filtered by different optimization schemes. This paper adopts a three-stage framework to analyze and discover potentially latent causal relationships in an open dataset from UCI. In the first stage, machine learning methods select candidate variables that are closely associated with student grades. In the second stage, a "post-double-selection" process selects the set of control variables. In the final stage, three case studies illustrate the effectiveness of the three-stage design. The pipeline suits situations where only minimal prior knowledge is available to address a potentially causal research question.
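The control-selection step can be illustrated with a minimal sketch of post-double-selection in its commonly described form. It is not the paper's code, and the abstract does not specify the estimator used. The sketch runs one Lasso of the outcome on the candidate controls and one Lasso of the variable of interest on the same controls, then fits ordinary least squares on the variable of interest plus the union of the selected controls. The helper names (`lasso_cd`, `post_double_selection`), the penalty `alpha=0.1`, and the synthetic data are all assumptions made for illustration. The sketch also assumes every control is numeric and non-constant.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent.

    Minimizes (1/2n)*||y - X b||^2 + alpha*||b||_1.
    Assumes the columns of X are standardized and y is centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # residual y - X @ beta (beta starts at 0)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def post_double_selection(y, d, X, alpha=0.1):
    """Return (estimated effect of d on y, indices of selected controls)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Selection 1: controls that predict the outcome
    sel_y = np.flatnonzero(lasso_cd(Xs, y - y.mean(), alpha))
    # Selection 2: controls that predict the variable of interest
    sel_d = np.flatnonzero(lasso_cd(Xs, d - d.mean(), alpha))
    controls = np.union1d(sel_y, sel_d)
    # Final step: plain OLS of y on intercept, d, and the union of controls
    Z = np.column_stack([np.ones(len(y)), d, X[:, controls]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1], controls

# Synthetic data (hypothetical, not the UCI dataset): true effect of d on y is 1.0
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
y = 1.0 * d + 2.0 * X[:, 0] + X[:, 2] + rng.normal(size=n)

effect, controls = post_double_selection(y, d, X)
print("estimated effect:", effect)
print("selected controls:", controls)
```

Taking the union of both selections guards against dropping a control that only weakly predicts the outcome but strongly predicts the variable of interest, which would otherwise bias the final estimate.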