Riek Nathan T, Gokhale Tanmay A, Martin-Gill Christian, Kraevsky-Philips Karina, Zègre-Hemsey Jessica K, Saba Samir, Callaway Clifton W, Akcakaya Murat, Al-Zaiti Salah S
Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA.
Division of Cardiology, University of Pittsburgh, Pittsburgh, PA, USA; University of Pittsburgh Medical Center (UPMC), Pittsburgh, PA, USA.
J Electrocardiol. 2024 Nov-Dec;87:153792. doi: 10.1016/j.jelectrocard.2024.153792. Epub 2024 Sep 2.
Deep learning (DL) models offer improved performance in electrocardiogram (ECG)-based classification over rule-based methods. However, for widespread adoption by clinicians, explainability methods, like saliency maps, are essential.
On a subset of 100 ECGs from patients with chest pain, we generated saliency maps using a previously validated convolutional neural network for occlusion myocardial infarction (OMI) classification. Three clinicians reviewed ECG-saliency map dyads, first assessing the likelihood of OMI from standard ECGs and then evaluating clinical relevance and helpfulness of the saliency maps, as well as their confidence in the model's predictions. Questions were answered on a Likert scale ranging from +3 (most useful/relevant) to -3 (least useful/relevant).
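The abstract does not state which saliency technique was applied to the convolutional network. As an illustration only, the sketch below uses occlusion sensitivity, a swapped-in and commonly used alternative: a window of the signal is masked, and the resulting drop in model score is attributed to the masked samples. The `toy_score` function, the window size, and the 16-sample single-lead signal are hypothetical stand-ins, not the study's model or data.

```python
def occlusion_saliency(signal, score_fn, window=4, baseline=0.0):
    """Per-sample saliency: average drop in score over every window that masks the sample."""
    reference = score_fn(signal)
    total_drop = [0.0] * len(signal)
    coverage = [0] * len(signal)
    for start in range(len(signal) - window + 1):
        masked = list(signal)
        masked[start:start + window] = [baseline] * window  # occlude one window
        drop = reference - score_fn(masked)
        for i in range(start, start + window):
            total_drop[i] += drop
            coverage[i] += 1
    return [d / c if c else 0.0 for d, c in zip(total_drop, coverage)]


def toy_score(signal):
    """Hypothetical stand-in for a trained classifier: mean amplitude of samples 8..11."""
    return sum(signal[8:12]) / 4.0


# Hypothetical single-lead trace with an elevated segment at samples 8..11.
signal = [0.0] * 8 + [1.0] * 4 + [0.0] * 4
saliency = occlusion_saliency(signal, toy_score)
peak = max(range(len(saliency)), key=saliency.__getitem__)
print(peak, [round(s, 3) for s in saliency])
```

In this toy case the highest saliency falls inside the elevated segment, which is the kind of agreement between highlighted region and clinically meaningful region that reviewers would look for in an ECG-saliency map dyad.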
The adjudicated accuracy of the three clinicians was comparable to that of the DL model in terms of area under the receiver operating characteristic curve (AUC) and F1 score (AUC 0.855 vs. 0.872; F1 score 0.789 vs. 0.747). On average, clinicians rated the saliency maps as slightly clinically relevant (0.96 ± 0.92) and slightly helpful (0.66 ± 0.98) in identifying or ruling out OMI, but reported higher confidence in the model's predictions (1.71 ± 0.56). Clinicians noted that leads I and aVL were often emphasized, even when obvious ST changes were present in other leads.
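For readers less familiar with the reported metrics, the following minimal sketch shows how AUC, F1 score, and a mean ± SD Likert summary can be computed. All data here are invented toy values, not study data, and the function names are illustrative. The use of the sample standard deviation is an assumption, since the abstract does not say which form was used.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def likert_summary(ratings):
    """Mean and sample SD of ratings on the -3..+3 scale described above."""
    if any(r < -3 or r > 3 for r in ratings):
        raise ValueError("ratings must lie between -3 and +3")
    return mean(ratings), stdev(ratings)  # sample SD is an assumption


# Toy values for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0, 1]                  # 1 = OMI, 0 = no OMI
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1, 0.2, 0.7]  # hypothetical model probabilities
preds = [1 if s >= 0.5 else 0 for s in scores]     # hypothetical 0.5 threshold

print(roc_auc(labels, scores))              # 0.9375
print(f1_score(labels, preds))              # 0.75
print(likert_summary([2, 1, 0, 1, -1, 3]))
```

AUC is threshold-free while F1 depends on the chosen decision threshold, which is one reason two raters can rank in opposite order on the two metrics.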
In this clinical usability study, clinicians deemed saliency maps somewhat helpful in enhancing explainability of DL-based ECG models. The spatial convolutional layers across the 12 leads in these models appear to contribute to the discrepancy between ECG segments considered most relevant by clinicians and segments that drove DL model predictions.