Section of Cardiology, Baylor College of Medicine, 1 Baylor Plaza, Houston, TX, 77030, USA.
Icahn School of Medicine at Mount Sinai, The Zena and Michael A. Wiener Cardiovascular Institute, Mount Sinai Heart, New York, NY, USA.
Sci Rep. 2021 Apr 26;11(1):8992. doi: 10.1038/s41598-021-88172-0.
Machine learning (ML) and deep learning (DL) can successfully predict high-prevalence events in very large databases (big data), but the value of these methods for risk prediction in smaller cohorts with uncommon diseases and infrequent events is uncertain. The clinical course of spontaneous coronary artery dissection (SCAD) is variable, and no reliable methods are available to predict mortality. Hypothesizing that ML and DL techniques could improve the identification of patients at risk, we applied a deep neural network to information available in electronic health records (EHR) to predict in-hospital mortality in patients with SCAD. We extracted patient data from the EHR of a large urban health system and applied several ML and DL models using candidate clinical variables potentially associated with mortality. We partitioned the data into training and evaluation sets with cross-validation. We estimated model performance based on the area under the receiver operating characteristic curve (AUC) and balanced accuracy. As a sensitivity analysis, we examined results limited to cases with complete clinical information. We identified 375 patients with SCAD, among whom mortality during the index hospitalization was 11.5%. The best-performing DL algorithm identified in-hospital mortality with an AUC of 0.98 (95% CI 0.97-0.99), significantly higher than that of the other ML models (P < 0.0001). Among the ML models, the AUC for mortality prediction ranged from 0.50 with the random forest method (95% CI 0.41-0.58) to 0.95 with the AdaBoost model (95% CI 0.93-0.96), with intermediate performance for logistic regression, decision tree, support vector machine, K-nearest neighbors, and extreme gradient boosting. A deep neural network model was associated with higher predictive accuracy and discriminative power than logistic regression or ML models for identifying patients with acute coronary syndrome (ACS) due to SCAD who are prone to early mortality.
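The two performance measures named in the abstract, AUC and balanced accuracy, can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc` and `balanced_accuracy` are hypothetical, the AUC is computed by direct pairwise comparison rather than by any particular library, and the class counts (43 deaths among 375 patients) are an assumption obtained by rounding the reported 11.5% mortality. The example shows why balanced accuracy is informative when events are infrequent: a model that predicts survival for everyone reaches a plain accuracy of about 0.885 but a balanced accuracy of only 0.5.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on deaths) and specificity (recall on survivors)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("balanced accuracy needs both classes present")
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


# Assumed class counts: 43 in-hospital deaths (label 1) among 375 patients,
# rounded from the reported 11.5% mortality. Not taken from the study data.
y_true = [1] * 43 + [0] * 332

# A trivial model that predicts survival for every patient.
y_pred_all_survive = [0] * 375

plain_accuracy = sum(
    1 for y, p in zip(y_true, y_pred_all_survive) if y == p
) / len(y_true)
bal_accuracy = balanced_accuracy(y_true, y_pred_all_survive)

print(f"plain accuracy:    {plain_accuracy:.3f}")
print(f"balanced accuracy: {bal_accuracy:.3f}")

# Small toy check of the AUC function on four made-up risk scores.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"toy AUC:           {toy_auc:.2f}")
```

In a cross-validated setting, these functions would be applied to the held-out fold of each split and the results averaged; how the study pooled folds or derived its confidence intervals is not stated in the abstract.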