Department of Medical Genetics and Developmental Biology, School of Basic Medical Sciences, Capital Medical University, Beijing, 10069, China.
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15232 USA; Harvard Medical School, Harvard University, Boston, MA, 02215 USA.
Med Image Anal. 2021 Apr;69:101942. doi: 10.1016/j.media.2020.101942. Epub 2020 Dec 26.
Congenital heart disease (CHD) is the most common birth defect and the leading cause of neonatal death in China. Clinical diagnosis can be based on selected 2D key-frames from five views. Because multi-view data are rarely available, most methods rely on single-view analysis, which is insufficient. This study proposes a practical end-to-end framework for automatic analysis of multi-view echocardiograms. We collected five-view echocardiogram video records from 1308 subjects (normal controls, ventricular septal defect (VSD) patients and atrial septal defect (ASD) patients), with both disease labels and standard-view key-frame labels. Multi-channel networks based on depthwise separable convolution are adopted to substantially reduce the number of network parameters. We also address the class imbalance problem by augmenting the positive training samples. Our 2D key-frame model achieves an accuracy of 95.4% in binary classification (CHD versus negative) and 92.3% in 3-class classification (negative, VSD or ASD). To further reduce the work of key-frame selection in real-world use, we propose an adaptive soft attention scheme that operates directly on the raw video data. Four kinds of neural aggregation methods are systematically investigated to fuse the information from an arbitrary number of frames in a video. Moreover, with a view detection module, the system can work without view records. On a collected 2D video test set, our video-based model achieves an accuracy of 93.9% (binary classification) and 92.1% (3-class classification) without requiring key-frame selection or view annotation at test time. A detailed ablation study and an interpretability analysis are provided. The presented model has high diagnostic rates for VSD and ASD and could potentially be applied in clinical practice in the future. The short-term automated machine learning process can partially replace and facilitate the long-term professional training of primary care doctors, improving the primary diagnosis rate of CHD in China and laying the foundation for early diagnosis and timely treatment of children with CHD.
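The parameter saving that the abstract attributes to depthwise separable convolution can be illustrated with simple arithmetic. The sketch below is not the authors' network: the layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) and the function names are hypothetical, chosen only to show how the weight counts compare when bias terms are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution (bias omitted).

    One k x k filter per input channel (depthwise step), followed by
    a 1 x 1 convolution that mixes channels (pointwise step).
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
c_in, c_out, k = 64, 128, 3

std = standard_conv_params(c_in, c_out, k)        # 9 * 64 * 128 = 73728
sep = depthwise_separable_params(c_in, c_out, k)  # 576 + 8192 = 8768
ratio = sep / std                                 # equals 1/c_out + 1/k**2

print(std, sep, round(ratio, 4))
```

For these assumed sizes the separable layer needs roughly 12% of the weights of the standard layer; the ratio is 1/c_out + 1/k^2, so the saving grows with the number of output channels and the kernel size.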
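The idea of fusing an arbitrary number of frames with soft attention can be sketched as follows. The abstract does not specify the adaptive scheme or the four aggregation methods, so this is a generic softmax attention pooling and only an assumption about what such an aggregator might look like. The function name `soft_attention_pool`, the feature dimension of 8 and the random `query` vector are all hypothetical; in a trained model the query would be learned and the frame features would come from a convolutional backbone.

```python
import numpy as np


def soft_attention_pool(frame_features, query):
    """Fuse (T, D) per-frame features into a single (D,) vector.

    frame_features: array of shape (T, D), one row per video frame.
    query: array of shape (D,); a stand-in for a learned attention vector.
    Returns the pooled feature and the per-frame attention weights.
    """
    scores = frame_features @ query      # (T,) relevance score per frame
    scores = scores - scores.max()       # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()    # softmax over the T frames
    pooled = weights @ frame_features    # (D,) weighted average of frames
    return pooled, weights


rng = np.random.default_rng(0)
query = rng.normal(size=8)               # hypothetical attention vector

# Two hypothetical videos with different frame counts.
short_video = rng.normal(size=(5, 8))
long_video = rng.normal(size=(40, 8))

pooled_short, w_short = soft_attention_pool(short_video, query)
pooled_long, w_long = soft_attention_pool(long_video, query)

print(pooled_short.shape, pooled_long.shape)
```

Because the weights are normalized over however many frames are present, videos of any length map to a fixed-size vector, and the result does not depend on frame order. Whether the published model shares either property is not stated in the abstract.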