Department of Electronic and Computer Engineering, University of Limerick, Limerick, Ireland.
Health Research Institute, University of Limerick, Limerick, Ireland.
JMIR Ment Health. 2024 Jul 4;11:e52045. doi: 10.2196/52045.
Identifying individuals with depressive symptomatology (DS) promptly and effectively is of paramount importance for providing timely treatment. Machine learning models have shown promise in this area; however, studies often fall short in demonstrating the practical benefits of using these models and fail to provide tangible real-world applications.
This study aims to establish a novel methodology for identifying individuals likely to exhibit DS, identify the most influential features in a more explainable way via probabilistic measures, and propose tools that can be used in real-world applications.
The study used 3 data sets: PROACTIVE, the Brazilian National Health Survey (Pesquisa Nacional de Saúde [PNS]) 2013, and PNS 2019, comprising sociodemographic and health-related features. A Bayesian network was used for feature selection. Selected features were then used to train machine learning models to predict DS, operationalized as a score of ≥10 on the 9-item Patient Health Questionnaire. The study also analyzed the impact of varying sensitivity rates on the reduction of screening interviews compared to a random approach.
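The outcome definition above can be illustrated with a minimal sketch. It relies only on the standard structure of the 9-item Patient Health Questionnaire (9 items, each scored 0 to 3) and the cutoff of 10 stated in the text. The function names `phq9_total` and `has_depressive_symptomatology` are hypothetical and are not taken from the study's code.

```python
from typing import Sequence

PHQ9_CUTOFF = 10  # cutoff stated in the abstract (score of >= 10)


def phq9_total(item_scores: Sequence[int]) -> int:
    """Sum the 9 PHQ-9 item scores (each item is scored 0-3)."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 requires exactly 9 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-9 item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def has_depressive_symptomatology(item_scores: Sequence[int]) -> bool:
    """Binary label used as the prediction target: total score >= 10."""
    return phq9_total(item_scores) >= PHQ9_CUTOFF


# Hypothetical respondents, for illustration only.
below_cutoff = [1, 1, 1, 1, 1, 1, 1, 1, 1]  # total 9
at_cutoff = [2, 1, 1, 1, 1, 1, 1, 1, 1]     # total 10
print(has_depressive_symptomatology(below_cutoff))
print(has_depressive_symptomatology(at_cutoff))
```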
The methodology allows users to make an informed trade-off among sensitivity, specificity, and the reduction in the number of interviews. At thresholds of 0.444, 0.412, and 0.472, determined by maximizing the Youden index, the models achieved sensitivities of 0.717, 0.741, and 0.718 and specificities of 0.644, 0.737, and 0.766 for PROACTIVE, PNS 2013, and PNS 2019, respectively. The area under the receiver operating characteristic curve was 0.736, 0.801, and 0.809 for these 3 data sets, respectively. For the PROACTIVE data set, the most influential features were postural balance, shortness of breath, and self-perceived age (how old people feel they are). In the PNS 2013 data set, the most influential features were the ability to do usual activities, chest pain, sleep problems, and chronic back problems. The PNS 2019 data set shared 3 of these features with the PNS 2013 data set; the difference was that verbal abuse replaced chronic back problems. Note that the features contained in the PNS data sets differ from those in the PROACTIVE data set. An empirical analysis showed that using the proposed model could reduce the number of screening interviews by up to 52% while maintaining a sensitivity of 0.80.
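As an illustration of the threshold selection described above, the following minimal sketch picks the threshold that maximizes the Youden index, J = sensitivity + specificity - 1, over a set of predicted probabilities. The data are invented toy values, not study data, and all function names are hypothetical. The `interview_reduction` helper rests on an assumed definition that may differ from the one used in the study: a random approach must interview a fraction of the population equal to the target sensitivity, whereas the model interviews only the flagged fraction.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold is flagged positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if label == 1:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Candidate threshold with the largest Youden index J = sens + spec - 1."""
    candidates: List[float] = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(y_true, scores, t)) - 1.0,
    )


def interview_reduction(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> float:
    """Assumed definition: 1 - (fraction flagged) / (sensitivity reached)."""
    sens, _ = sensitivity_specificity(y_true, scores, threshold)
    flagged_fraction = sum(s >= threshold for s in scores) / len(scores)
    return 1.0 - flagged_fraction / sens


# Toy labels and predicted probabilities (illustrative only).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.65, 0.8, 0.9]

best = youden_threshold(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores, best)
print(best, sens, spec, interview_reduction(y_true, scores, best))
```

Sweeping the threshold instead of fixing it at the Youden optimum gives the trade-off curve among sensitivity, specificity, and interview reduction that the text refers to.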
This study developed a novel methodology for identifying individuals with DS, demonstrating the utility of using Bayesian networks to identify the most significant features. Moreover, this approach has the potential to substantially reduce the number of screening interviews while maintaining high sensitivity, thereby facilitating improved early identification and intervention strategies for individuals experiencing DS.