Yu Chengfu, Kong Xiangxuan, Yu Weijie, Ni Xingcan, Chen Jing, Liao Xiaoyan
Department of Psychology/Research Center of Adolescent Psychology and Behavior, School of Education, Guangzhou University, Guangzhou, Guangdong, China.
School of Psychology, South China Normal University, Guangzhou, Guangdong, China.
Front Psychiatry. 2025 Aug 5;16:1648585. doi: 10.3389/fpsyt.2025.1648585. eCollection 2025.
Depression is highly prevalent among college students, and accurately identifying risk factors is essential for timely intervention. Given the limitations of traditional linear models in managing high-dimensional data, this study employed machine learning techniques to predict depressive symptoms.
Data were collected from 1,635 Chinese college students and included 38 sociodemographic, psychological, and social variables. Four machine learning algorithms (Random Forest, XGBoost, LightGBM, and Support Vector Machine) were evaluated.
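A comparison of this kind can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic (generated to match only the reported sample size and number of predictors), the 5-fold cross-validation scheme and all hyperparameters are assumptions, and only the two algorithms available in scikit-learn are shown. XGBoost and LightGBM classifiers from their own packages could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the study data (assumption): 1,635 rows,
# 38 predictors, and a binary "depressive symptoms" label.
X, y = make_classification(
    n_samples=1635, n_features=38, n_informative=8, random_state=0
)

# Candidate models; hyperparameters are illustrative, not the paper's.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # SVMs are scale-sensitive, so standardize features inside the pipeline.
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

auc_by_model = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    auc_by_model[name] = scores.mean()

for name, auc in auc_by_model.items():
    print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```

Scoring every model on the same folds keeps the comparison fair, because each algorithm is evaluated on identical held-out students.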
The Random Forest model achieved the best discriminative performance, with an AUC of 0.87 and an accuracy of 0.79. It identified sleep disturbance, perceived stress, experiential avoidance, and self-criticism as key predictors. SHapley Additive exPlanations (SHAP) analysis further showed that deteriorating sleep quality and heightened stress levels significantly increased the risk of depressive symptoms.
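The two reported metrics and a predictor ranking can be illustrated with the sketch below. It rests on several assumptions: the data are synthetic, the 75/25 train/test split is arbitrary, and permutation importance from scikit-learn is used as a dependency-light stand-in for SHAP. Permutation importance is a different technique (it measures the drop in AUC when one feature is shuffled) and does not give the per-person, signed contributions that SHAP provides.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in (assumption). With shuffle=False the 8 informative
# features are columns 0-7; the remaining 30 columns are pure noise.
X, y = make_classification(
    n_samples=1635, n_features=38, n_informative=8, n_redundant=0,
    shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# AUC uses predicted probabilities; accuracy uses hard class labels.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
acc = accuracy_score(y_test, model.predict(X_test))
print(f"AUC = {auc:.2f}, accuracy = {acc:.2f}")

# Permutation importance: mean drop in held-out AUC when a column is shuffled.
result = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=0
)
ranked = np.argsort(result.importances_mean)[::-1]
for idx in ranked[:4]:
    print(f"feature_{idx}: importance = {result.importances_mean[idx]:.4f}")
```

On real data the columns would carry the questionnaire variable names, so the ranked list would read as a list of predictors rather than column indices.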
These findings support the effectiveness of Random Forest in capturing complex interactions among predictors and offer actionable insights for targeted mental health interventions. Future studies should improve generalizability by incorporating more diverse samples and physiological biomarkers.