Hosseinzadeh Kasani Payam, Lee Jung Eun, Park Chihyun, Yun Cheol-Heui, Jang Jae-Won, Lee Sang-Ah
Department of Neurology, Kangwon National University Hospital, Chuncheon, Republic of Korea.
Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon, Republic of Korea.
Front Nutr. 2023 May 9;10:1165854. doi: 10.3389/fnut.2023.1165854. eCollection 2023.
Depression is a prevalent disorder worldwide, with potentially severe implications. It contributes significantly to an increased risk of diseases associated with multiple risk factors. Early and accurate diagnosis of depressive symptoms is a critical first step toward management, intervention, and prevention. Various nutritional and dietary compounds have been suggested to be involved in the onset, maintenance, and severity of depressive disorders. The association between nutritional risk factors and the occurrence of depression is difficult to characterize, and the use of supervised machine learning to assess the interplay of these markers remains to be fully explored.
This study aimed to determine the ability of machine learning-based decision support methods to identify the presence of depression using publicly available health data from the Korean National Health and Nutrition Examination Survey. Two exploration techniques, uniform manifold approximation and projection (UMAP) and Pearson correlation, were used for exploratory analysis of the datasets. A grid search with cross-validation was performed to tune the models' hyperparameters so as to maximize accuracy in classifying depression. Several performance measures, including accuracy, precision, recall, F1 score, the confusion matrix, the areas under the precision-recall and receiver operating characteristic curves, and calibration plots, were used to compare classifier performance. We further investigated feature importance using ELI5 visualizations, partial dependence plots, local interpretable model-agnostic explanations (LIME), and Shapley additive explanations (SHAP) to interpret predictions at both the population and individual levels.
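As an illustration of how several of the performance measures listed above relate to one another, the following minimal sketch derives accuracy, precision, recall, and F1 score from a binary confusion matrix. It is not the authors' code: the function names (`confusion_matrix`, `classification_metrics`), the `ConfusionMatrix` container, and the toy labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the study)."""
    tp: int  # depressed, predicted depressed
    fp: int  # not depressed, predicted depressed
    fn: int  # depressed, predicted not depressed
    tn: int  # not depressed, predicted not depressed


def confusion_matrix(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells from paired labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fp, fn, tn)


def classification_metrics(cm):
    """Compute accuracy, precision, recall, and F1; undefined ratios become 0.0."""
    total = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = (cm.tp + cm.tn) / total if total else 0.0
    precision = cm.tp / (cm.tp + cm.fp) if (cm.tp + cm.fp) else 0.0
    recall = cm.tp / (cm.tp + cm.fn) if (cm.tp + cm.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (invented for illustration): 1 = depression, 0 = no depression.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
metrics = classification_metrics(cm)
print(cm)
print(metrics)
```

On imbalanced data, accuracy alone can be misleading, which is one reason to report precision, recall, and F1 alongside it, as the abstract does.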
On the original dataset, the best results were an accuracy of 86.18% for XGBoost and an area under the curve of 84.96% for the random forest model. On the quantile-based dataset, the XGBoost algorithm achieved an accuracy of 86.02% and an area under the curve of 85.34%. The explainability analyses provided complementary views of the relative changes in feature values, allowing the importance of emergent depression risks to be identified.
The strength of our approach is the large sample size used to train a fine-tuned model. The machine learning-based analysis showed that the hyperparameter-tuned model achieved empirically higher accuracy in classifying patients with depressive disorder, as supported by the set of interpretability experiments, and may be an effective tool for disease control.