Department of Imaging Science and Innovation, Geisinger, Danville, Pennsylvania.
Department of Biomedical Engineering, University of Kentucky, Lexington, Kentucky.
JACC Cardiovasc Imaging. 2019 Apr;12(4):681-689. doi: 10.1016/j.jcmg.2018.04.026. Epub 2018 Jun 13.
The goal of this study was to use machine learning to more accurately predict survival after echocardiography.
Prediction of patient outcomes (e.g., survival) following echocardiography is based primarily on ejection fraction (EF) and comorbidities. However, additional echocardiography-derived measurements, combined with clinical electronic health record data, may carry significant predictive information.
Mortality was studied in 171,510 unselected patients who underwent 331,317 echocardiograms in a large regional health system. The authors compared the predictive performance of nonlinear machine learning models with that of linear logistic regression models using 3 different sets of inputs: 1) clinical variables, including 90 cardiovascular-relevant International Classification of Diseases, Tenth Revision (ICD-10) codes, age, sex, height, weight, heart rate, blood pressures, low-density lipoprotein, high-density lipoprotein, and smoking; 2) clinical variables plus physician-reported EF; and 3) clinical variables and EF plus 57 additional echocardiographic measurements. Missing data were imputed using multivariate imputation by chained equations (MICE). Models were compared with each other and with baseline clinical scoring systems using the mean area under the curve (AUC) over 10 cross-validation folds and across 10 survival durations (6 to 60 months).
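The evaluation protocol described above (mean AUC over 10 cross-validation folds at each of 10 survival durations) could be sketched roughly as follows. This is a minimal illustration and not the authors' code. Several things are assumptions made for the sketch: the function names, the evenly spaced durations (6, 12, ..., 60 months), the toy `fit_mean_difference` model standing in for random forest or logistic regression, the synthetic cohort, and the simplification that every patient has follow-up through the longest duration (so censoring is ignored). Imputation is omitted entirely.

```python
import math
import random
from statistics import mean

def auc(scores, labels):
    """Rank-based AUC: chance that a random death outscores a random survivor."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count 0.5
    return wins / (len(pos) * len(neg))

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean_difference(X, y):
    """Toy stand-in model (assumption): project onto mean(deaths) - mean(survivors)."""
    def col_mean(cls, j):
        return mean(x[j] for x, lab in zip(X, y) if lab == cls)
    w = [col_mean(1, j) - col_mean(0, j) for j in range(len(X[0]))]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def mean_auc_by_duration(X, months_to_death, fit, durations, k=10):
    """Return {duration: mean AUC over k folds}.

    months_to_death[i] is None if patient i was alive at the end of follow-up.
    """
    folds = kfold_indices(len(X), k)
    result = {}
    for d in durations:
        # Binary outcome for this horizon: died within d months or not.
        labels = [1 if (t is not None and t <= d) else 0 for t in months_to_death]
        fold_aucs = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [labels[i] for i in train_idx])
            scores = [model(X[i]) for i in test_idx]
            fold_aucs.append(auc(scores, [labels[i] for i in test_idx]))
        result[d] = mean(fold_aucs)
    return result

def make_synthetic_cohort(n, seed=1):
    """Synthetic data for illustration only; unrelated to the study cohort."""
    rng = random.Random(seed)
    X, months = [], []
    for _ in range(n):
        risk = rng.gauss(0, 1)
        X.append([risk + rng.gauss(0, 0.5), rng.gauss(0, 1)])  # signal, pure noise
        t = rng.expovariate(math.exp(risk) / 40)  # higher risk -> earlier death
        months.append(t if t <= 60 else None)
    return X, months

if __name__ == "__main__":
    X, months = make_synthetic_cohort(500)
    for d, value in mean_auc_by_duration(X, months, fit_mean_difference,
                                         range(6, 61, 6)).items():
        print(f"{d:2d} months: mean AUC = {value:.3f}")
```

Refitting the model inside each fold keeps the held-out patients out of training, which is what makes the averaged AUC an estimate of out-of-sample performance.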
Machine learning models achieved significantly higher prediction accuracy (all AUC >0.82) than common clinical risk scores (AUC = 0.61 to 0.79), with the nonlinear random forest models outperforming logistic regression (p < 0.01). The random forest model including all echocardiographic measurements yielded the highest prediction accuracy (p < 0.01 across all models and survival durations). Only 10 variables were needed to achieve 96% of the maximum prediction accuracy, and 6 of these variables were derived from echocardiography. Tricuspid regurgitation velocity was more predictive of survival than left ventricular ejection fraction (LVEF). In a subset of studies with complete data for the top 10 variables, imputation with MICE reduced predictive accuracy only slightly (difference in AUC of 0.003) compared with the original data.
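The "10 variables for 96% of the maximum" result amounts to finding the smallest number of top-ranked variables whose model reaches a given fraction of the best AUC. A minimal sketch of that arithmetic follows, assuming the fraction is taken as a simple ratio of AUCs (the abstract does not say how it was computed); the function name and its input list are hypothetical.

```python
def smallest_sufficient_k(auc_by_k, fraction=0.96):
    """Smallest number of top-ranked variables reaching `fraction` of the best AUC.

    auc_by_k[i] is the AUC of the model built on the top (i + 1) variables.
    """
    if not auc_by_k:
        raise ValueError("auc_by_k must not be empty")
    target = fraction * max(auc_by_k)
    for k, value in enumerate(auc_by_k, start=1):
        if value >= target:
            return k
```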
Machine learning can exploit large combinations of disparate input variables to predict survival after echocardiography with accuracy superior to that of common clinical risk scores and logistic regression.