Kim Sekeun, Park Hyung-Bok, Jeon Jaeik, Arsanjani Reza, Heo Ran, Lee Sang-Eun, Moon Inki, Yoo Sun Kook, Chang Hyuk-Jae
CONNECT-AI Research Center, Yonsei University College of Medicine, Seoul, South Korea.
Graduate Program of Biomedical Engineering, Yonsei University College of Medicine, Seoul, South Korea.
Int J Cardiovasc Imaging. 2022 May;38(5):1047-1059. doi: 10.1007/s10554-021-02482-y. Epub 2022 Feb 13.
We aimed to compare the segmentation performance of currently prominent deep learning (DL) algorithms against ground-truth segmentations and to validate the reproducibility of manually created ground-truth annotations of the four cardiac chambers on 2D echocardiography. Recently emerged DL-based, fully automated chamber segmentation and function assessment methods have shown great potential for aiding image acquisition, quantification, and diagnostic suggestion. However, the performance of current DL algorithms has not previously been compared with each other. In addition, the reproducibility of the ground-truth annotations on which these algorithms are based has not yet been fully validated. We retrospectively enrolled 500 consecutive patients who underwent transthoracic echocardiography (TTE) from December 2019 to December 2020. Simple U-net, Res-U-net, and Dense-U-net algorithms were compared for segmentation performance, and clinical indices such as left atrial volume (LAV), left ventricular end-diastolic volume (LVEDV), left ventricular end-systolic volume (LVESV), LV mass, and ejection fraction (EF) were evaluated. Inter- and intra-observer variability analysis was performed by two expert sonographers on a randomly selected echocardiographic view in 100 patients (apical 2-chamber, apical 4-chamber, and parasternal short-axis [PSAX] views). The overall performance of all DL methods was excellent [average Dice similarity coefficient (DSC) 0.91 to 0.95 and average intersection over union (IOU) 0.83 to 0.90], with the exception of LV wall area on the PSAX view (average DSC 0.83, IOU 0.72). In addition, there was no significant difference in clinical indices between ground-truth and automated DL measurements. In the variability analysis, the overall intra-observer reproducibility was excellent: LAV (ICC = 0.995), LVEDV (ICC = 0.996), LVESV (ICC = 0.997), LV mass (ICC = 0.991), and EF (ICC = 0.984).
The inter-observer reproducibility was slightly lower than the intra-observer agreement: LAV (ICC = 0.976), LVEDV (ICC = 0.982), LVESV (ICC = 0.970), LV mass (ICC = 0.971), and EF (ICC = 0.899). The three currently prominent DL-based fully automated methods can reliably perform four-chamber segmentation and quantification of clinical indices. Furthermore, we were able to validate the four cardiac chamber ground-truth annotation and demonstrate overall excellent reproducibility, although some degree of inter-observer variability remained.
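For readers unfamiliar with the two overlap metrics reported above, the following is a minimal sketch of how DSC and IOU are conventionally computed for one pair of binary masks. It is not the authors' evaluation code. The function name `dice_and_iou`, the nested-list mask representation, and the toy 8x8 masks are assumptions made for illustration only.

```python
def dice_and_iou(pred, truth):
    """Return (DSC, IOU) for two binary masks of identical shape.

    pred, truth: 2D sequences (e.g. lists of lists) of 0/1 or bool values.
    This helper is hypothetical and is not taken from the study.
    """
    # Flatten both masks to 1D lists of booleans.
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(a and b for a, b in zip(p, t))
    union = sum(a or b for a, b in zip(p, t))
    total = sum(p) + sum(t)

    # Convention assumed here: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0, 1.0

    dice = 2.0 * intersection / total
    iou = intersection / union
    return dice, iou


# Toy example: two 4x4 squares on an 8x8 grid, shifted by one row.
truth = [[1 if 2 <= r < 6 and 2 <= c < 6 else 0 for c in range(8)] for r in range(8)]
pred = [[1 if 3 <= r < 7 and 2 <= c < 6 else 0 for c in range(8)] for r in range(8)]

dice, iou = dice_and_iou(pred, truth)
print(f"DSC = {dice:.2f}, IOU = {iou:.2f}")
```

For a single mask pair the two metrics are linked by DSC = 2·IOU / (1 + IOU), which is why the reported DSC range of 0.91 to 0.95 sits alongside an IOU range of roughly 0.83 to 0.90. Averages taken over many cases need not satisfy the identity exactly.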