Improving Diagnosis of Depression With XGBOOST Machine Learning Model and a Large Biomarkers Dutch Dataset (n = 11,081).

Author information

Sharma Amita, Verbeke Willem J M I

Affiliations

Department of Operations Research & Quantitative Analysis, Institute of Agri-Business Management, Swami Keshwanand Rajasthan Agricultural University, Bikaner, India.

Erasmus University, Rotterdam, Netherlands.

Publication information

Front Big Data. 2020 Apr 30;3:15. doi: 10.3389/fdata.2020.00015. eCollection 2020.

DOI:10.3389/fdata.2020.00015

PMID:33693389

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7931945/

Abstract

Machine learning is on the rise, and healthcare is no exception. Within healthcare, mental health is receiving growing attention. The diagnosis of mental disorders is based on standardized patient interviews with a defined set of questions and scales, which is a time-consuming and costly process. Our objective was to apply a machine learning model and evaluate whether biomarker data have predictive power that can enhance the diagnosis of depression. In this paper, we explore the detection of depression cases in a dataset of 11,081 Dutch citizens. Most earlier studies used balanced datasets, in which the proportions of healthy and unhealthy cases are equal; in our study, the dataset contains only 570 cases of self-reported depression out of 11,081 (about 5.1%), which makes this a class-imbalanced classification problem. A machine learning model built on an imbalanced dataset gives predictions biased toward the majority class, so it tends to label a case as "no depression" even when it is a case of depression. We used different resampling strategies to address the class imbalance. We created multiple balanced samples using under-sampling, over-sampling, combined over-under sampling, and ROSE sampling, and then applied the "Extreme Gradient Boosting" (XGBoost) algorithm to each sample to separate mental illness cases from healthy cases. The balanced accuracy, precision, recall, and F1 score obtained with over-sampling and over-under sampling were all above 0.90.
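The imbalance argument in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: it uses only the two counts given above (570 positives out of 11,081), plain random over-sampling and under-sampling as stand-ins for the resampling step, and no classifier at all (XGBoost and ROSE are not reproduced here). The function names `random_oversample`, `random_undersample`, and `classification_metrics` are hypothetical helpers introduced for illustration.

```python
import random

N_TOTAL = 11_081     # sample size stated in the abstract
N_POSITIVE = 570     # self-reported depression cases stated in the abstract


def split_by_class(labels):
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return (pos, neg) if len(pos) < len(neg) else (neg, pos)


def random_oversample(labels, rng):
    """Indices where minority rows are redrawn with replacement until classes match."""
    minority, majority = split_by_class(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def random_undersample(labels, rng):
    """Indices where majority rows are dropped at random until classes match."""
    minority, majority = split_by_class(labels)
    return minority + rng.sample(majority, len(minority))


def classification_metrics(y_true, y_pred):
    """Accuracy, balanced accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Synthetic label vector with the class counts from the abstract (no real data).
labels = [1] * N_POSITIVE + [0] * (N_TOTAL - N_POSITIVE)

# A degenerate model that always predicts the majority class ("no depression").
always_negative = [0] * N_TOTAL
baseline = classification_metrics(labels, always_negative)

rng = random.Random(0)  # fixed seed so the resampled index lists are reproducible
over_labels = [labels[i] for i in random_oversample(labels, rng)]
under_labels = [labels[i] for i in random_undersample(labels, rng)]

print("baseline:", {k: round(v, 3) for k, v in baseline.items()})
print("over-sampled size / positives:", len(over_labels), sum(over_labels))
print("under-sampled size / positives:", len(under_labels), sum(under_labels))
```

The always-negative baseline reaches an accuracy of about 0.949 while its balanced accuracy is 0.5 and its recall is 0, which is why plain accuracy is a poor guide on this kind of dataset. In practice, resampling is usually applied to the training split only, so that evaluation reflects the original class proportions; whether the study did so is not stated in the abstract.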



