Jin Weifeng, Chen Shuzi, Wang Mengxia, Lin Ping
Department of Medical Laboratory, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China.
Int J Gen Med. 2025 May 8;18:2461-2473. doi: 10.2147/IJGM.S524016. eCollection 2025.
To develop a clinical risk prediction model for depressive disorders using seven machine learning algorithms based on routine blood test indicators.
This retrospective study included 284 patients with depressive disorders and 214 healthy controls recruited between January and October 2024. Clinical data, including age, sex, and routine blood test results, were collected. The dataset was randomly divided into a training set (70%; n=348) and a test set (30%; n=150). Univariate logistic regression analysis (p<0.1) was first performed to identify potential predictors, followed by feature selection using the Boruta and LASSO algorithms. Seven machine learning algorithms were used to construct predictive models, and their performance was evaluated using AUC, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), precision, recall, and F1 score. A multivariable logistic regression model was then used to develop a nomogram, which was assessed for discrimination, calibration, and clinical utility.
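The threshold-based metrics listed above all derive from the same 2x2 confusion matrix. The following is a minimal sketch of that relationship, not the authors' code: the function names (`confusion_counts`, `classification_metrics`) and the toy labels are hypothetical and chosen only for illustration. Note that precision is the same quantity as PPV, and recall the same as sensitivity.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return the threshold-based metrics as a dict.

    Assumes every denominator is non-zero, i.e. both classes occur in
    y_true and both predicted labels occur in y_pred.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)   # identical to recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # identical to precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "precision": ppv,
        "recall": sensitivity,
        "f1": f1,
    }


# Hypothetical toy labels, not data from the study.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Because precision and recall duplicate PPV and sensitivity, the eight reported metrics carry six distinct quantities once AUC is included.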
Four significant predictors (alkaline phosphatase [AKP], serotonin, phenylalanine [Phe], and arginine [Arg]) were identified through univariate logistic regression combined with Boruta and LASSO feature selection. Among the seven algorithms, the random forest model achieved the highest AUC: 1.000 (95% CI: 1.000-1.000) in the training set and 0.958 (95% CI: 0.931-0.985) in the test set. However, owing to concerns about potential overfitting, the multivariable logistic regression model was selected as the final predictive model, and a nomogram was constructed based on it.
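To make the training-versus-test comparison concrete, AUC can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration only: the function name `pairwise_auc` and the toy scores are hypothetical, and the study's own software and confidence-interval method are not stated in this abstract.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 if the case scores higher than the control,
    0.5 if the two scores tie, and 0 otherwise.
    """
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy scores, not data from the study.
labels = [1, 1, 1, 0, 0, 0]
train_like_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # cases fully separated
test_like_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]   # one case ranked below a control

auc_separated = pairwise_auc(labels, train_like_scores)
auc_overlap = pairwise_auc(labels, test_like_scores)
```

Under this definition, an AUC of 1.000 means that every case outscored every control in that dataset. A model that reaches this on its training data but not on held-out data may have memorized the training samples, which is the usual reason such a gap is read as a sign of overfitting.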
This study successfully developed a clinically interpretable risk prediction model for depressive disorders by integrating machine learning algorithms and routine blood test indicators. The logistic regression model demonstrated robust performance across all metrics and holds potential as a reliable auxiliary tool for the diagnosis of depressive disorders.