Chen Wei, Gao Yujing, Xiao Shiyin
School of Psychology, Guizhou Normal University, Guiyang, China.
Inner Mongolia Student Bullying Prevention Research Center, Tongliao, China.
Heliyon. 2024 Sep 14;10(18):e37723. doi: 10.1016/j.heliyon.2024.e37723. eCollection 2024 Sep 30.
The high prevalence of non-suicidal self-injury (NSSI) among adolescents is a global health issue. However, current prediction models for adolescent NSSI rely on a limited set of algorithms, resulting in biased predictions. This study therefore aimed to develop multiple machine learning models to improve prediction accuracy and reduce bias in predicting NSSI among Chinese adolescents.
A total of 4487 junior and senior high school students in China were recruited. Predictive models were built with multiple algorithms: logistic regression, decision tree, support vector machine, Naive Bayes, multi-layer perceptron, and K-nearest neighbors, together with ensemble learning algorithms such as random forest, bagging, AdaBoost, and stacking. Data processing techniques, including standardization and the synthetic minority oversampling technique (SMOTE), were employed to optimize the predictive models. The models were trained on 70% of the data, with the remaining 30% reserved for testing.
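The data processing steps described above (a 70/30 split, standardization, and synthetic minority oversampling) can be illustrated with a minimal standard-library Python sketch. The abstract does not state which software was used or the order in which the steps were applied, so everything here is an assumption: the function names (`train_test_split`, `fit_standardizer`, `standardize`, `smote`) are hypothetical, the data are randomly generated rather than taken from the study, and the oversampling is applied to the training portion only, which is a common convention and not something the abstract confirms.

```python
import math
import random

def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle sample indices and hold out `test_fraction` of them for testing."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    pick = lambda ids: ([rows[i] for i in ids], [labels[i] for i in ids])
    return pick(train_idx), pick(test_idx)

def fit_standardizer(rows):
    """Return per-feature means and standard deviations of the given rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant feature has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return means, stds

def standardize(rows, means, stds):
    """Rescale each feature to zero mean and unit variance using given statistics."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def smote(rows, labels, minority=1, k=5, seed=0):
    """Oversample the minority class of a binary problem until classes are balanced.

    Each synthetic point lies on the line segment between a minority sample
    and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    minority_rows = [r for r, y in zip(rows, labels) if y == minority]
    if len(minority_rows) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    n_needed = (len(rows) - len(minority_rows)) - len(minority_rows)
    new_rows = []
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_rows)
        neighbours = sorted((r for r in minority_rows if r is not base),
                            key=lambda r: math.dist(base, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        new_rows.append([b + gap * (o - b) for b, o in zip(base, other)])
    return rows + new_rows, labels + [minority] * len(new_rows)

# Synthetic illustration only: 200 samples, 2 features, 20% minority class.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(5, 2)] for _ in range(200)]
y = [1 if i % 5 == 0 else 0 for i in range(200)]

(train_X, train_y), (test_X, test_y) = train_test_split(X, y, test_fraction=0.3)
means, stds = fit_standardizer(train_X)       # statistics from training data only
train_X = standardize(train_X, means, stds)
test_X = standardize(test_X, means, stds)     # reuse the training statistics
bal_X, bal_y = smote(train_X, train_y)        # test set is left untouched
```

Fitting the standardizer and the oversampler on the training portion alone keeps information from the held-out 30% out of model fitting. Whether the study followed this order is not stated in the abstract.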
All ten prediction models performed well, with area under the receiver operating characteristic curve (AUC) scores above 0.700 in the test set. The stacking and random forest models achieved AUC scores of 0.904 and 0.898, respectively. The prediction performance of the Naive Bayes model was relatively poor. The five most important variables were resilience, bullying, suicidal ideation, internet addiction, and depression.
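For readers less familiar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from predicted scores. It uses the pairwise definition: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the four-sample example are illustrative assumptions, not the study's code or data.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)  # 0.75
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the context for reading the reported values of 0.904 and 0.898.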
Ensemble machine learning algorithms showed promising results in predicting NSSI among adolescents. Such algorithms are recommended for future NSSI research to improve predictive accuracy. Identifying the important features in NSSI prediction can help develop screening protocols and lay a foundation for clinical diagnosis and intervention in adolescent populations.