School of Artificial Intelligence, Chongqing University of Technology, Chongqing, 401135, China.
College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China.
Sci Rep. 2023 Jan 2;13(1):76. doi: 10.1038/s41598-022-27358-6.
Early detection of lesions is of great significance for treating fundus diseases. Fundus photography is an effective and convenient screening technique by which common fundus diseases can be detected. In this study, we use color fundus images to distinguish among multiple fundus diseases. Existing research on fundus disease classification has achieved some success with deep learning, but models built solely on deep convolutional neural network (CNN) architectures have limited global modeling ability and leave considerable room for improvement in evaluation metrics; moreover, the simultaneous diagnosis of multiple fundus diseases remains a major challenge. Therefore, given that a self-attention (SA) model with a global receptive field may offer robust global-level feature modeling ability, we propose MBSaNet, a multistage fundus image classification model that combines a CNN with the SA mechanism. The convolution blocks extract local information from the fundus image, and the SA module further captures the complex relationships between different spatial positions, so that one or more fundus diseases can be detected directly from a retinal fundus image. For the initial stage of feature extraction, we propose a multiscale feature fusion stem, which applies convolutional kernels of different sizes to the input image and fuses the resulting low-level features to improve recognition accuracy. Training and testing were performed on the ODIR-5k dataset. The experimental results show that MBSaNet achieves state-of-the-art performance with fewer parameters. The wide range of diseases covered and the varied fundus image acquisition conditions confirm the applicability of MBSaNet.
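To make the described architecture concrete, the following is a minimal sketch of the design outlined in the abstract: a multiscale feature-fusion stem, a convolution block for local features, and a self-attention module over spatial positions, followed by a multi-label head. The kernel sizes, channel widths, head count, and the 8-class output are illustrative assumptions, not the published MBSaNet configuration.

```python
# Hedged sketch of a CNN + self-attention fundus classifier with a multiscale stem.
# All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiScaleStem(nn.Module):
    """Extracts low-level features with kernels of different sizes and fuses them."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch // 4, k, stride=2, padding=k // 2)
            for k in (3, 5, 7)  # assumed kernel scales
        ])
        self.fuse = nn.Conv2d(3 * (out_ch // 4), out_ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class SABlock(nn.Module):
    """Self-attention over spatial positions to model global relationships."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        seq = self.norm(seq + out)              # residual + norm
        return seq.transpose(1, 2).reshape(b, c, h, w)

class HybridClassifier(nn.Module):
    """Stem -> conv block (local) -> SA block (global) -> multi-label head."""
    def __init__(self, num_classes=8):
        super().__init__()
        self.stem = MultiScaleStem()
        self.conv_block = nn.Sequential(
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
        )
        self.sa = SABlock(128)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.sa(self.conv_block(self.stem(x)))
        x = x.mean(dim=(2, 3))   # global average pooling
        return self.head(x)      # apply sigmoid for multi-label prediction

logits = HybridClassifier()(torch.randn(2, 3, 224, 224))  # shape (2, 8)
```

In this sketch the multi-label head mirrors the abstract's goal of detecting one or more diseases per image; thresholded sigmoid outputs would yield the final label set.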