IEEE Trans Med Imaging. 2021 Jun;40(6):1591-1602. doi: 10.1109/TMI.2021.3059956. Epub 2021 Jun 1.
Recently, automatic diagnostic approaches have been widely used to classify ocular diseases. Most of these approaches are based on a single imaging modality (e.g., fundus photography or optical coherence tomography (OCT)), which usually reflects the oculopathy only to a certain extent and neglects the modality-specific information found in different imaging modalities. This paper proposes a novel modality-specific attention network (MSAN) for multi-modal retinal image classification, which can effectively utilize the modality-specific diagnostic features from fundus and OCT images. The MSAN comprises two attention modules that extract the modality-specific features from fundus and OCT images, respectively. Specifically, for the fundus image, ophthalmologists need to observe local and global pathologies at multiple scales (e.g., from microaneurysms at the micrometer level and the optic disc at the millimeter level to blood vessels spanning the whole eye). Therefore, we propose a multi-scale attention module to extract both the local and global features from fundus images. Moreover, OCT images contain large background regions, which are meaningless for diagnosis. Thus, a region-guided attention module is proposed to encode the retinal layer-related features and ignore the background in OCT images. Finally, we fuse the modality-specific features to form a multi-modal feature and train the multi-modal retinal image classification network. The fusion of modality-specific features allows the model to combine the advantages of the fundus and OCT modalities for a more accurate diagnosis. Experimental results on a clinically acquired multi-modal retinal image (fundus and OCT) dataset demonstrate that our MSAN outperforms other well-known single-modal and multi-modal retinal image classification methods.
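The data flow described in the abstract (multi-scale features from the fundus image, background-free features from the OCT image, then fusion) can be illustrated with a toy sketch. This is not the authors' MSAN: the learned attention modules are replaced here by fixed stand-ins, namely multi-scale average pooling and mask-weighted pooling, and fusion is assumed to be plain concatenation. The function names, the block sizes, and the hand-written region mask are all hypothetical.

```python
from typing import List

Grid = List[List[float]]


def multi_scale_pool(features: Grid, block_sizes: List[int]) -> List[float]:
    """Average-pool over non-overlapping blocks at each size.

    A large block size yields global context, a small one yields local
    detail. Stand-in for a learned multi-scale attention module.
    """
    height, width = len(features), len(features[0])
    pooled: List[float] = []
    for size in block_sizes:
        for top in range(0, height, size):
            for left in range(0, width, size):
                block = [
                    features[r][c]
                    for r in range(top, min(top + size, height))
                    for c in range(left, min(left + size, width))
                ]
                pooled.append(sum(block) / len(block))
    return pooled


def region_guided_pool(features: Grid, region_mask: Grid) -> float:
    """Mask-weighted average; positions with mask 0 (background) are ignored.

    Stand-in for a learned region-guided attention module. The mask is
    assumed to be given, e.g. by a retinal layer segmentation.
    """
    weighted_sum = 0.0
    mask_total = 0.0
    for feature_row, mask_row in zip(features, region_mask):
        for value, weight in zip(feature_row, mask_row):
            weighted_sum += value * weight
            mask_total += weight
    # Guard against an all-background mask.
    return weighted_sum / mask_total if mask_total > 0 else 0.0


def fuse(fundus_features: List[float], oct_features: List[float]) -> List[float]:
    """Assumed fusion: concatenate the two modality-specific vectors."""
    return list(fundus_features) + list(oct_features)


if __name__ == "__main__":
    fundus_map = [
        [1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0],
        [13.0, 14.0, 15.0, 16.0],
    ]
    # Top and bottom rows play the role of OCT background.
    oct_map = [[9.0, 9.0, 9.0], [2.0, 4.0, 6.0], [9.0, 9.0, 9.0]]
    retina_mask = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]

    fundus_vector = multi_scale_pool(fundus_map, block_sizes=[4, 2])
    oct_vector = [region_guided_pool(oct_map, retina_mask)]
    print(fuse(fundus_vector, oct_vector))
```

In a trained network the pooling weights and the region mask would be produced by learned layers, and the fused vector would feed a classifier; the sketch only shows how each modality contributes its own kind of feature before fusion.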