Somani Sulaiman S, Honarvar Hossein, Narula Sukrit, Landi Isotta, Lee Shawn, Khachatoorian Yeraz, Rehmani Arsalan, Kim Andrew, De Freitas Jessica K, Teng Shelly, Jaladanki Suraj, Kumar Arvind, Russak Adam, Zhao Shan P, Freeman Robert, Levin Matthew A, Nadkarni Girish N, Kagen Alexander C, Argulian Edgar, Glicksberg Benjamin S
The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 770 Lexington Ave, 15th Fl, New York, NY, 10065, USA.
Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, 20 Copeland Ave, Hamilton, ON L8L 2X2, Canada.
Eur Heart J Digit Health. 2021 Nov 25;3(1):56-66. doi: 10.1093/ehjdh/ztab101. eCollection 2022 Mar.
Clinical scoring systems for pulmonary embolism (PE) screening have low specificity and contribute to computed tomography pulmonary angiogram (CTPA) overuse. We assessed whether deep learning models using an existing and routinely collected data modality, electrocardiogram (ECG) waveforms, can increase specificity for PE detection.
We created a retrospective cohort of 21 183 patients at moderate to high suspicion of PE and linked 23 793 CTPAs (10.0% PE-positive) to 320 746 ECGs and encounter-level clinical data (demographics, comorbidities, vital signs, and labs). We developed three machine learning models to predict PE likelihood: an ECG model using only ECG waveform data, an EHR model using tabular clinical data, and a Fusion model integrating clinical data with an embedded representation of the ECG waveform. The Fusion model [area under the receiver-operating characteristic curve (AUROC) 0.81 ± 0.01] outperformed both the ECG model (AUROC 0.59 ± 0.01) and the EHR model (AUROC 0.65 ± 0.01). On a sample of 100 patients from the test set, the Fusion model also achieved higher specificity (0.18) and better discrimination (AUROC 0.84 ± 0.01) than four commonly evaluated clinical scores: Wells' Criteria, Revised Geneva Score, Pulmonary Embolism Rule-Out Criteria, and 4-Level Pulmonary Embolism Clinical Probability Score (AUROC 0.50-0.58, specificity 0.00-0.05). The model remained superior to these scores in feature sensitivity analyses (AUROC 0.66-0.84) and performed comparably across sex (AUROC 0.81) and racial/ethnic (AUROC 0.77-0.84) subgroups.
Synergistic deep learning of ECG waveforms with traditional clinical variables can increase the specificity of PE detection in patients with at least moderate suspicion of PE.
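The two metrics reported above, AUROC and specificity, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the abstract does not state the operating point at which specificity was measured.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """TN / (TN + FP), calling a case positive when score >= threshold."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one negative case")
    true_negatives = sum(1 for s in neg if s < threshold)
    return true_negatives / len(neg)


# Toy data (hypothetical, not from the study): 1 = PE-positive CTPA.
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

print(f"AUROC: {auroc(labels, scores):.3f}")
# The 0.5 threshold is an arbitrary assumption for this example.
print(f"Specificity at 0.5: {specificity(labels, scores, 0.5):.2f}")
```

In a screening setting the threshold is typically set low enough to keep sensitivity high, which is why specificity can be small even when AUROC is reasonable.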