Dong Xing-Xuan, Liu Jian-Hua, Zhang Tian-Yang, Pan Chen-Wei, Zhao Chun-Hua, Wu Yi-Bo, Chen Dan-Dan
School of Public Health, Suzhou Medical College of Soochow University, Suzhou, China.
Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, China.
Psychiatry Investig. 2025 Mar;22(3):267-278. doi: 10.30773/pi.2024.0156. Epub 2025 Mar 18.
Machine learning (ML) has been reported to have better predictive capability than traditional statistical techniques. The aim of this study was to assess the efficacy of ML algorithms and logistic regression (LR) for predicting depressive symptoms during the COVID-19 pandemic.
Analyses were carried out in a national cross-sectional study involving 21,916 participants. The ML algorithms in this study included random forest (RF), support vector machine (SVM), neural network (NN), and gradient boosting machine (GBM) methods. The performance indices were sensitivity, specificity, accuracy, precision, F1-score, and area under the receiver operating characteristic curve (AUC).
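The performance indices listed above all derive from the binary confusion matrix and the ranking of predicted scores. The following is a minimal sketch of how they are conventionally computed, not the authors' code. The function names (`confusion_counts`, `classification_metrics`, `auc_score`), the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Threshold-dependent indices computed from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)   # recall, true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # positive predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc_score(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 1 = depressive symptoms present.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
```

Sensitivity, specificity, accuracy, precision, and F1-score depend on the chosen threshold, whereas AUC depends only on how the scores rank cases. A model can therefore lead on AUC without leading on the threshold-based indices, and vice versa.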
LR and NN achieved the highest AUCs. The risk of overfitting was negligible for most ML models, with RF the exception. GBM obtained the highest sensitivity, specificity, accuracy, precision, and F1-score. LR, NN, and GBM therefore ranked among the best models.
The LR model performed comparably to the ML models in predicting depressive symptoms and identifying potential risk factors, while also exhibiting a lower risk of overfitting.