Department of Emergency Medicine, University of Utah, Salt Lake City, UT, USA.
University of Utah School of Medicine, Salt Lake City, UT, USA.
J Ultrasound Med. 2022 Dec;41(12):3003-3012. doi: 10.1002/jum.16007. Epub 2022 May 12.
To test whether a deep learning (DL) model trained on echocardiography images could accurately segment the left ventricle (LV) and predict ejection fraction (EF) on apical 4-chamber images acquired by point-of-care ultrasound (POCUS).
We created a dataset of 333 videos from cardiac POCUS exams acquired in the emergency department. For each video we derived two ground-truth labels: first, we segmented the LV from one image frame; second, we classified the EF as normal, reduced, or severely reduced. We then classified each video's quality as optimal, adequate, or inadequate. With this dataset we tested the accuracy of automated LV segmentation and EF classification by EchoNet-Dynamic, the best-in-class echocardiography-trained DL model.
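Segmentation accuracy of this kind is typically scored with the Dice similarity coefficient, which compares the predicted LV mask with the ground-truth mask for the labeled frame. The following is a minimal sketch, not the authors' code: the function name `dice_coefficient`, the nested-list mask format, and the toy masks are all assumptions made for illustration.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal size.

    Masks are given as nested lists of 0/1 values (a hypothetical format
    chosen for this sketch).
    """
    pred = [bool(p) for row in pred_mask for p in row]
    true = [bool(t) for row in true_mask for t in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")

    intersection = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    # Assumption: two empty masks are treated as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 3x4 masks, invented for illustration only.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
score = dice_coefficient(predicted, ground_truth)
print(f"Dice = {score:.2f}")
```

A Dice value of 1 means the two masks overlap exactly and 0 means they share no pixels; a dataset-level figure is then the mean of the per-video scores.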
The mean Dice similarity coefficient for LV segmentation was 0.72 (N = 333; 95% CI 0.70-0.74). Cohen's kappa coefficient for agreement between predicted and ground-truth EF classification was 0.16 (N = 333). The area under the receiver operating characteristic curve for the diagnosis of heart failure was 0.74 (N = 333). Model performance improved with video quality for LV segmentation and for the diagnosis of heart failure, but did not change with video quality for EF classification. For all tasks the model was less accurate than the published benchmarks for EchoNet-Dynamic.
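Cohen's kappa corrects the raw agreement between predicted and ground-truth EF classes for the agreement expected by chance, which is why a low value such as 0.16 can coexist with a fair number of matching labels. The sketch below computes unweighted kappa; whether the study used an unweighted or weighted variant is not stated here, so that choice is an assumption, as are the function name `cohens_kappa`, the class label strings, and the toy data.

```python
from collections import Counter


def cohens_kappa(truth, predicted):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(truth)
    # Observed agreement: fraction of items given the same label.
    observed = sum(t == p for t, p in zip(truth, predicted)) / n

    # Chance agreement: product of the marginal label frequencies, summed
    # over classes (Counter returns 0 for labels absent from one side).
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and returning 1.0 is a convention assumed for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels, invented for illustration only.
truth = ["normal", "normal", "reduced", "reduced", "severe", "severe"]
predicted = ["normal", "reduced", "reduced", "reduced", "severe", "normal"]
kappa = cohens_kappa(truth, predicted)
print(f"kappa = {kappa:.2f}")
```

In this toy case four of six labels match (observed agreement 0.67) while chance agreement is 0.33, so kappa is lower than the raw agreement suggests.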
Performance of a DL model trained on formal echocardiography worsened when challenged with images captured during resuscitations. DL models intended for assessing bedside ultrasound should be trained on datasets composed of POCUS images. Such datasets have yet to be made publicly available.