Kim Sang Won, Chang Min Cheol
Medical Research Center, College of Medicine, Yeungnam University, Daegu, Republic of Korea.
Department of Rehabilitation Medicine, College of Medicine, Yeungnam University, Daegu, Republic of Korea.
Ann Palliat Med. 2023 Jul;12(4):748-756. doi: 10.21037/apm-23-78. Epub 2023 Jun 19.
Depression is a major public health concern, with an estimated 10.8% of adults experiencing depression. Depression can have a significant impact on an individual's quality of life, social function, and productivity. Early diagnosis of depression is important in preventing its progression. Several tools, such as the Patient Health Questionnaire-9 (PHQ-9) and Beck Depression Inventory, are used to screen patients for depression. We investigated the potential of machine learning in predicting the presence of depression using the results of a national survey.
We collected the data of 5,420 participants from the 2020 Korea National Health and Nutrition Examination Survey. The presence of depression was defined as a PHQ-9 score ≥5, and the output variable was dichotomized into presence of depression (PHQ-9 ≥5) and absence of depression (PHQ-9 <5). We used 20 variables related to sociodemographic characteristics, health behavior, and presence of chronic disease to develop three machine learning models [random forest, logistic regression, and deep neural network (DNN)]. The random forest model used 87 decision trees. The logistic regression model relates a linear combination of the input variables to the probability of the binary output. The DNN model used three layers with 16, 32, and 64 neurons, the Adam optimizer, and rectified linear unit (ReLU) activation. The included samples were randomly divided into a training set (70%) and a test set (30%).
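The labeling rule and the 70/30 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the fixed random seed, and the synthetic PHQ-9 scores are assumptions introduced here, and the abstract does not say whether the split was stratified.

```python
import random

PHQ9_CUTOFF = 5  # threshold stated in the abstract: PHQ-9 >= 5 means depression present


def label_depression(phq9_score):
    """Return 1 (depression present) if the PHQ-9 score is >= 5, else 0."""
    return 1 if phq9_score >= PHQ9_CUTOFF else 0


def split_train_test(records, test_fraction=0.3, seed=0):
    """Randomly split records into (train, test); seed is a hypothetical choice."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic PHQ-9 totals (valid range 0-27), only to exercise the functions;
# these are NOT survey data.
scores = [i % 28 for i in range(5420)]
labels = [label_depression(s) for s in scores]
train, test = split_train_test(list(zip(scores, labels)))
print(len(train), len(test))
```

With 5,420 records, a 30% test fraction corresponds to 1,626 test records and 3,794 training records.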
On the test dataset, the area under the curve (AUC) was 0.803 [95% confidence interval (CI), 0.776-0.829] for the random forest model, 0.812 (95% CI, 0.787-0.837) for the logistic regression model, and 0.805 (95% CI, 0.780-0.831) for the DNN model.
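For readers unfamiliar with the metric, the sketch below shows one way an AUC and an accompanying 95% CI can be computed. The AUC is calculated as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count as one half). The percentile bootstrap used for the interval is an assumption made here for illustration; the abstract does not state how the reported CIs were obtained, and the function names are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained only one class; AUC undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example with made-up predictions, not study data.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, y_score))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formulation would be preferable for large test sets.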
Our study demonstrated the potential of machine learning for developing models that predict the presence of depression based on various health-related data. Machine learning models can potentially overcome the limitations of traditional diagnostic methods for depression by incorporating a wide range of objective variables to identify patients with depression, thus reducing the subjectivity and potential diagnostic errors associated with a clinician's interpretation of observed symptoms. Further efforts are needed to increase the accuracy of machine learning models for detecting depression by utilizing more variables and data.