Fu Haojie, Zhang Mengmeng, Yang Shuran, Kang Chuanyuan, Liu Liang, Zhao Xudong
Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Siping Road, Shanghai, 200092, Shanghai, China.
Shanghai Institute of Intelligent Science and Technology, Tongji University, Siping Road, Shanghai, 200092, Shanghai, China.
BMC Public Health. 2025 Sep 1;25(1):2994. doi: 10.1186/s12889-025-24354-z.
Non-suicidal self-injury is a common risk behavior in adolescence but is often difficult to detect. This study employs interpretable machine learning techniques to develop a classification model for adolescent non-suicidal self-injury and elucidate pertinent factors. Employing diverse algorithms, a comprehensive analysis is conducted to discern critical risk and protective elements within a large dataset, evaluating their alignment with the Integrated Theoretical Model.
In partnership with educational institutions in eastern China, this research compiled data on behaviors and correlated factors through the administration of questionnaires, incorporating demographic information and seven validated scales. Analytical models were built using six machine learning techniques: K-Nearest Neighbors, Support Vector Machine, Logistic Regression, Light Gradient Boosting Machine, CatBoost, and eXtreme Gradient Boosting.
The analysis included a total of 2989 valid responses. Among the algorithms, CatBoost demonstrated superior performance, evidenced by an AUPRC of 0.736 and an AUC of 0.863. SHAP visualization highlighted 23 important items. Exploratory factor analysis identified seven factors, designated as Situational Anxiety, Depressive Symptoms, Positive Daily Functioning, Negative Self Esteem, Self-Appraisal of Behavior, Bullying and Reactive Aggression, and Interpersonal Problems and Self-Acceptance.
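For readers less familiar with the two reported metrics, the following is a minimal sketch of how an AUC and an AUPRC can be computed from true labels and predicted scores. It is not the authors' evaluation code: the function names `roc_auc` and `average_precision`, and the toy labels and scores, are illustrative assumptions. AUPRC is estimated here as average precision, one common estimator, and the sketch assumes distinct scores.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    positive/negative pairs in which the positive has the higher score."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Tied pairs count as half a correctly ranked pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as average precision: the mean of the precision
    values at each rank where a positive case is retrieved."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank cases from highest to lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Toy data (hypothetical, not from the study): 1 = behavior reported.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(f"AUC   = {roc_auc(y_true, scores):.3f}")
print(f"AUPRC = {average_precision(y_true, scores):.3f}")
```

A random classifier scores about 0.5 on AUC regardless of class balance, whereas its expected AUPRC is roughly the proportion of positive cases, which is why the two metrics are often reported together when the outcome is relatively uncommon.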
Leveraging multiple machine learning algorithms for a holistic item analysis, this research identifies critical risk and protective factors for non-suicidal self-injury, thus refining the Integrated Theoretical Model.