Ahmed Iftikhar, Brahmacharimayum Anushree, Ali Raja Hashim, Khan Talha Ali, Ahmad Muhammad Ovais
Department of Software Engineering, University of Europe for Applied Sciences, Potsdam, Germany.
Department of Mathematics and Computer Science, Karlstad University, Universitetsgatan 2, Karlstad, 65188, Sweden, 46 76-113 22 49.
JMIR Ment Health. 2025 Sep 11;12:e72038. doi: 10.2196/72038.
Depression is one of the most prevalent mental health disorders globally, affecting approximately 280 million people and frequently going undiagnosed or misdiagnosed. The growing ubiquity of wearable devices enables continuous monitoring of activity levels, providing a new avenue for data-driven detection and severity assessment of depression. However, existing machine learning models often exhibit lower performance when distinguishing overlapping subtypes of depression and frequently lack explainability, an essential component for clinical acceptance.
This study aimed to develop and evaluate an interpretable machine learning framework for detecting depression and classifying its severity using wearable-actigraphy data, while addressing common challenges such as imbalanced datasets and limited model transparency.
We used the Depresjon dataset and applied Adaptive Synthetic Sampling (ADASYN) to mitigate class imbalance. We extracted multiple statistical features (eg, power spectral density mean and autocorrelation) and demographic attributes (eg, age) from the raw activity data. Five machine learning algorithms (logistic regression, support vector machines, random forest, XGBoost, and neural networks) were assessed using accuracy, precision, recall, F1-score, specificity, and the Matthews correlation coefficient. We further used SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) to elucidate prediction drivers.
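As a rough illustration of the two named features, the sketch below computes a lag autocorrelation and the mean of a one-sided periodogram for a single activity series. It is not the authors' pipeline: the function names, the `lag` and `fs` parameters, the choice of a plain periodogram as the spectral estimator, and the sample values are all assumptions made for this example.

```python
import cmath
import math


def autocorrelation(x, lag=1):
    """Sample autocorrelation of x at the given lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0 or lag >= n:
        return 0.0  # constant series or lag too large: no usable estimate
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


def psd_mean(x, fs=1.0):
    """Mean of a one-sided periodogram PSD estimate of the mean-removed series.

    Assumption: a plain periodogram stands in for whatever spectral
    estimator the study actually used. fs is the sampling rate.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    psd = []
    for k in range(n // 2 + 1):
        # Direct DFT coefficient for frequency bin k (O(n^2) overall).
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power = abs(coeff) ** 2 / (fs * n)
        if 0 < k < n / 2:
            power *= 2  # one-sided spectrum: double all but DC and Nyquist
        psd.append(power)
    return sum(psd) / len(psd)


if __name__ == "__main__":
    # Hypothetical per-minute activity counts, for illustration only.
    activity = [0, 12, 40, 35, 8, 0, 0, 5, 30, 42, 10, 0]
    print("lag-1 autocorrelation:", autocorrelation(activity, lag=1))
    print("PSD mean:", psd_mean(activity, fs=1.0))
```

The direct DFT keeps the sketch dependency-free; for real recordings, which span days of per-minute samples, an FFT-based estimator would be the practical choice.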
XGBoost achieved the highest overall accuracy: 84.94% for binary classification and 85.91% for multiclass severity classification. SHAP and LIME identified power spectral density mean, age, and autocorrelation as the top predictors, highlighting the role of circadian disruption in depression.
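For readers less familiar with the evaluation metrics listed in the methods, the following minimal sketch shows how accuracy, precision, recall, specificity, F1-score, and the Matthews correlation coefficient follow from binary confusion counts. The function name, the label encoding (1 = depressed, 0 = nondepressed), and the toy labels are assumptions for illustration; they are not the study's data and do not reproduce its results.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute common binary classification metrics from label lists.

    Assumes labels are encoded as 1 (positive) and 0 (negative).
    Ratios with a zero denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(a, b):
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # also called sensitivity
    specificity = safe_div(tn, tn + fp)   # true negative rate
    f1 = safe_div(2 * precision * recall, precision + recall)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


if __name__ == "__main__":
    # Toy labels, for illustration only.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion counts, which is why it is often reported alongside accuracy when classes are imbalanced.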
Our interpretable framework reliably identifies depressed versus nondepressed individuals and differentiates mild from moderate depression. The inclusion of SHAP and LIME provides transparent, clinically meaningful insights, emphasizing the potential of explainable artificial intelligence to enhance early detection and intervention strategies in mental health care.