Department of Translational Data Science and Informatics (A.E.U-C., L.J., J.M.P., S.R., J.A.R., C.W.G., C.M.H., B.K.F., R.C.), Geisinger, Danville, PA.
Heart and Vascular Center, Evangelical Hospital, Lewisburg, PA (J.M.P.).
Circulation. 2022 Jul 5;146(1):36-47. doi: 10.1161/CIRCULATIONAHA.121.057869. Epub 2022 May 9.
Timely diagnosis of structural heart disease improves patient outcomes, yet many patients remain undiagnosed. Although population screening with echocardiography is impractical, ECG-based prediction models can help target high-risk patients. We developed a novel ECG-based machine learning approach to predict multiple structural heart conditions, hypothesizing that a composite model would yield a higher prevalence and higher positive predictive values, and so support meaningful recommendations for echocardiography.
Using 2 232 130 ECGs linked to electronic health records and echocardiography reports from 484 765 adults between 1984 and 2021, we trained machine learning models to predict the presence or absence of any of 7 echocardiography-confirmed diseases within 1 year. This composite label comprised moderate or severe valvular disease (aortic or mitral stenosis or regurgitation, or tricuspid regurgitation), reduced ejection fraction <50%, and interventricular septal thickness >15 mm. We tested various combinations of input features (demographics, laboratory values, structured ECG data, ECG traces) and evaluated model performance in 3 ways: 5-fold cross-validation; multisite validation, with training on 1 site and testing on 10 independent sites; and simulated retrospective deployment, with training on pre-2010 data and deployment in 2010.
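The composite label described above can be read as a logical OR over 7 component findings. The following is a minimal sketch of that reading, not the authors' implementation: the `EchoFindings` record, its field names, and the severity strings are hypothetical, and the 1-year window between ECG and echocardiogram is not modeled.

```python
from dataclasses import dataclass

# Severity grades counted as disease, per the "moderate or severe" criterion.
POSITIVE_GRADES = {"moderate", "severe"}


@dataclass
class EchoFindings:
    """Hypothetical echocardiography record; field names are illustrative."""
    aortic_stenosis: str = "none"
    aortic_regurgitation: str = "none"
    mitral_stenosis: str = "none"
    mitral_regurgitation: str = "none"
    tricuspid_regurgitation: str = "none"
    ejection_fraction_pct: float = 60.0
    septal_thickness_mm: float = 10.0


def component_labels(echo: EchoFindings) -> dict:
    """Return one boolean per component disease (7 in total)."""
    valves = (
        "aortic_stenosis",
        "aortic_regurgitation",
        "mitral_stenosis",
        "mitral_regurgitation",
        "tricuspid_regurgitation",
    )
    labels = {name: getattr(echo, name) in POSITIVE_GRADES for name in valves}
    # Thresholds taken from the text: ejection fraction <50%, septum >15 mm.
    labels["reduced_ejection_fraction"] = echo.ejection_fraction_pct < 50.0
    labels["septal_thickening"] = echo.septal_thickness_mm > 15.0
    return labels


def composite_label(echo: EchoFindings) -> bool:
    """Positive if any one of the component diseases is present."""
    return any(component_labels(echo).values())
```

Because the composite is positive whenever any component is positive, its prevalence is at least as high as that of the most common component, which is the property the hypothesis relies on.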
Our composite rECHOmmend model used age, sex, and ECG traces and had a 0.91 area under the receiver operating characteristic curve and a 42% positive predictive value at 90% sensitivity, with a composite label prevalence of 17.9%. Individual disease models had areas under the receiver operating characteristic curve from 0.86 to 0.93 and lower positive predictive values, from 1% to 31%. Areas under the receiver operating characteristic curve for models using different input features ranged from 0.80 to 0.93, increasing with additional features. Multisite validation showed similar results to cross-validation, with an aggregate area under the receiver operating characteristic curve of 0.91 across our independent test set of 10 clinical sites after training on a separate site. In our simulated retrospective deployment, among patients without preexisting structural heart disease who had an ECG acquired in 2010, 11% were classified as high risk, and 41% of those high-risk patients (4.5% of total patients) developed true echocardiography-confirmed disease within 1 year.
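The link between prevalence and positive predictive value, and the 4.5% figure, both follow from simple arithmetic. The sketch below uses the standard Bayes relation for positive predictive value; the function names are ours, and the specificity of 0.75 in the usage lines is an illustrative value, not one reported here.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def confirmed_fraction(flagged_rate: float, ppv_value: float) -> float:
    """Share of all patients who are both flagged and truly diseased."""
    return flagged_rate * ppv_value


# Same test characteristics, different label prevalence (specificity assumed).
ppv_composite = ppv(sensitivity=0.90, specificity=0.75, prevalence=0.179)
ppv_rare = ppv(sensitivity=0.90, specificity=0.75, prevalence=0.02)
print(f"PPV at 17.9% prevalence: {ppv_composite:.2f}")
print(f"PPV at 2% prevalence:    {ppv_rare:.2f}")

# Deployment arithmetic from the text: 11% flagged, 41% of those confirmed.
print(f"Flagged and confirmed: {confirmed_fraction(0.11, 0.41):.3f}")
```

At fixed sensitivity and specificity, the rarer label gives the lower positive predictive value, which is consistent with the single-disease models trailing the composite model. The last line reproduces the reported 4.5% as 0.11 × 0.41 ≈ 0.045.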
An ECG-based machine learning model using a composite end point can identify a population at high risk of undiagnosed, clinically significant structural heart disease. It outperforms single-disease models and improves practical utility through higher positive predictive values. This approach can facilitate targeted screening with echocardiography to reduce underdiagnosis of structural heart disease.