Explainable machine-learning algorithms to differentiate bipolar disorder from major depressive disorder using self-reported symptoms, vital signs, and blood-based markers.

Author affiliations

West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China.

Business School, Sichuan University, Chengdu, China.

Publication information

Comput Methods Programs Biomed. 2023 Oct;240:107723. doi: 10.1016/j.cmpb.2023.107723. Epub 2023 Jul 17.

Abstract

BACKGROUND AND OBJECTIVE

Because they share genetic risk factors and present with similar neuropsychological symptoms, bipolar disorder (BD) and major depressive disorder (MDD) are easily mistaken for each other, and misdiagnosis is associated with ineffective treatment and worse outcomes. We aimed to develop a machine learning (ML)-based diagnostic system, built on electronic medical records (EMR) data, that mimics the clinical reasoning of physicians to differentiate MDD from BD (especially BD depressive episodes) in patients about to be admitted to a hospital and, hence, to reduce the misdiagnosis of BD as MDD on admission. In addition, we examined to what extent our ML model could be made interpretable by quantifying and visualizing the features that drive its predictions.

METHODS

We identified 16,311 patients admitted to a hospital in western China between 2009 and 2018 with a recorded main diagnosis of MDD or BD. For each of two cohorts (MDD vs. BD, and MDD vs. BD depressive episodes), we established three sub-cohorts with different combinations of features. Four ML algorithms (logistic regression, extreme gradient boosting (XGBoost), random forest, and support vector machine) and four train-test splits were used to train and validate the diagnostic models. Explainability methods (SHAP and Break Down) were used to analyze the contribution of each feature at both the population level and the individual level, covering feature importance, feature interaction, and the effect of each feature on the prediction for a specific subject.
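
To make the attribution step concrete, here is a minimal sketch; it is not the authors' pipeline. It assumes a hypothetical three-feature toy model (`predict`, with made-up coefficients and feature names chosen only for illustration) and a single all-zero reference patient in place of a background dataset, which simplifies what the SHAP and Break Down libraries actually do. The sketch shows how the two ideas relate: a Break Down-style attribution walks through the features in one fixed order, while Shapley values average those sequential contributions over every possible order.

```python
from itertools import permutations
from math import exp


def predict(x):
    """Hypothetical toy risk model with one interaction term (not the paper's model)."""
    z = (-1.0
         + 1.5 * x["mood_up"]
         + 0.8 * x["bad_sleep"]
         + 0.6 * x["mood_up"] * x["pulse_high"])
    return 1.0 / (1.0 + exp(-z))


def break_down(instance, baseline, order):
    """Switch features from the baseline value to the instance value, one at a
    time in the given order, and record how much the prediction moves each time."""
    current = dict(baseline)
    previous = predict(current)
    contributions = {}
    for name in order:
        current[name] = instance[name]
        value = predict(current)
        contributions[name] = value - previous
        previous = value
    return contributions


def shapley(instance, baseline):
    """Exact Shapley values: the average sequential contribution over all orders.
    Feasible only for a handful of features; real tools approximate this."""
    names = list(instance)
    orders = list(permutations(names))
    totals = {name: 0.0 for name in names}
    for order in orders:
        for name, delta in break_down(instance, baseline, order).items():
            totals[name] += delta
    return {name: total / len(orders) for name, total in totals.items()}


# Assumed example patient and reference point (illustrative values only).
instance = {"mood_up": 1, "bad_sleep": 1, "pulse_high": 1}
baseline = {"mood_up": 0, "bad_sleep": 0, "pulse_high": 0}

phi = shapley(instance, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
print("sum of attributions:", sum(phi.values()))
print("prediction shift:   ", predict(instance) - predict(baseline))
```

Because the toy model contains an interaction between `mood_up` and `pulse_high`, the Break Down contribution of `pulse_high` depends on whether `mood_up` has already been switched on. That order dependence is one way a feature interaction shows up, and averaging over orders is how Shapley values resolve it. In either case the attributions sum to the difference between the patient's prediction and the baseline prediction.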

RESULTS

The XGBoost algorithm provided the best test performance (AUC: 0.838 (0.810-0.867), PPV: 0.810, and NPV: 0.834) for separating patients with BD from those with MDD. Core predictors included symptoms (mood-up, exciting, bad sleep, loss of interest, talking, mood-down, provoke), along with age, job, myocardial enzyme markers (creatine kinase, hydroxybutyrate dehydrogenase), a diabetes-associated marker (glucose), a bone function marker (alkaline phosphatase), a non-enzymatic antioxidant (uric acid), markers of immune function and inflammation (white blood cell count, lymphocyte count, basophil percentage, monocyte count), a cardiovascular function marker (low density lipoprotein), a renal marker (total protein), a liver biochemistry marker (indirect bilirubin), and vital signs such as pulse. For separating patients with BD depressive episodes from those with MDD, the test AUC was 0.777 (0.732-0.822), with a PPV of 0.576 and an NPV of 0.899. Additional validation with models built after removing self-reported symptoms from the feature set showed a test AUC of 0.701 (0.666-0.736) for differentiating BD from MDD and an AUC of 0.564 (0.515-0.614) for distinguishing patients in BD depressive episodes from patients with MDD. Validation on the datasets that retained patients with comorbidity showed an AUC of 0.826 (0.806-0.846).
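
For readers less familiar with the reported metrics, the sketch below shows how AUC, PPV, and NPV can be computed from predicted scores. It rests on several assumptions: BD is coded as the positive class (1) and MDD as 0, the decision threshold is 0.5, and the labels and scores are invented for illustration. None of these details are stated in the abstract, and the numbers are not study data.

```python
def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def ppv_npv(labels, scores, threshold=0.5):
    """Positive and negative predictive value at a given score threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    # Guard against an empty predicted class; NaN signals "undefined".
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Invented toy data: 1 = BD (assumed positive class), 0 = MDD.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

toy_auc = auc(labels, scores)
toy_ppv, toy_npv = ppv_npv(labels, scores)
print(f"AUC={toy_auc:.3f}  PPV={toy_ppv:.3f}  NPV={toy_npv:.3f}")
```

AUC is threshold-free and does not depend on how common BD is in the cohort, whereas PPV and NPV depend on both the threshold and the class balance, so a model can have a low PPV alongside a high NPV when the positive class is the minority.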

CONCLUSION

The diagnostic system accurately identified patients with BD in various clinical scenarios, and the differences in peripheral marker patterns between BD and MDD could enrich our understanding of the potential pathophysiological mechanisms underlying the two disorders.
