Qian Jin, Tao Lei, Gong Changhao, Xu Jun, Luo Yuemei
Jiangsu Key Laboratory of Intelligent Medical Image Computing (IMIC), School of Artificial Intelligence, Nanjing University of Information Science and Technology, 210044 Nanjing, China.
Biomed Opt Express. 2025 May 13;16(6):2312-2326. doi: 10.1364/BOE.558731. eCollection 2025 Jun 1.
Deep neural networks have recently become a major focus of research on classifying retinal diseases in optical coherence tomography (OCT) images. However, traditional deep neural networks treat images as grid or sequential structures, which limits their flexibility in capturing irregular and complex objects and leads to suboptimal performance in practical applications. To address this issue, we propose a novel visual neural network model with a pyramid structure, called pyramid vision graph convolutional networks (PVGCN). The model strengthens the correlations between structures by dividing an image into multiple nodes and connecting each node to its nearest nodes. It consists of two core components: 1) the vision graph block and 2) the pyramid structure. The vision graph block, composed of a grapher block and a feed-forward network (FFN), divides the image into multiple regions, treats them as nodes, and represents the image as graph data on which graph convolution operates. A graph built from these nodes can capture relationships between nodes without positional restrictions, and so better represents the irregular structure of retinal tissue. The FFN module mitigates the over-smoothing that arises in the grapher stage, enabling more accurate classification. The pyramid structure decomposes OCT images into a series of sub-images at different scales and integrates the features from these scales into a comprehensive representation of the hierarchical structure of the retina. Because integrating multi-scale features can replace the extraction of higher-dimensional features in a large model, this structure significantly reduces the number of parameters. We conducted extensive experiments on two different datasets. The proposed PVGCN achieved accuracies of 0.9954 and 0.9787 on the two datasets, respectively, surpassing existing methods. In the experiments, the model also demonstrated recognition capabilities comparable to those of human experts, effectively identifying retinal diseases in OCT images.
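The idea of representing an image as a graph of region nodes can be illustrated with a minimal NumPy sketch. It is not the PVGCN implementation: the function names (`image_to_nodes`, `knn_graph`, `graph_conv`), the patch size, the value of `k`, the choice of feature-space distance for "nearest", and the mean-of-neighbours aggregation are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def image_to_nodes(image, patch):
    """Split an (H, W) image into non-overlapping patches; each flattened patch is one node."""
    h, w = image.shape
    rows, cols = h // patch, w // patch
    trimmed = image[:rows * patch, :cols * patch]
    # (rows, patch, cols, patch) -> (rows, cols, patch, patch)
    blocks = trimmed.reshape(rows, patch, cols, patch).swapaxes(1, 2)
    return blocks.reshape(rows * cols, patch * patch)


def knn_graph(nodes, k):
    """Return an (N, k) array holding the indices of each node's k nearest neighbours.

    Assumption: "nearest" is measured by squared Euclidean distance between node features.
    """
    sq = np.sum(nodes ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * nodes @ nodes.T
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbour
    return np.argsort(dist, axis=1)[:, :k]


def graph_conv(nodes, neighbours, weight):
    """One illustrative aggregation step (an assumed operator, not the paper's).

    Each node is concatenated with the mean of its neighbours, then passed
    through a linear map and a ReLU.
    """
    aggregated = nodes[neighbours].mean(axis=1)             # (N, D)
    mixed = np.concatenate([nodes, aggregated], axis=1) @ weight
    return np.maximum(mixed, 0.0)


rng = np.random.default_rng(0)
image = rng.random((64, 64))                 # stand-in for a grayscale OCT B-scan
nodes = image_to_nodes(image, patch=16)      # 16 nodes, 256 features each
neighbours = knn_graph(nodes, k=4)           # each node linked to its 4 nearest nodes
weight = rng.standard_normal((2 * nodes.shape[1], 32)) * 0.05
features = graph_conv(nodes, neighbours, weight)
print(nodes.shape, neighbours.shape, features.shape)
```

Because neighbours are chosen by feature similarity rather than by grid position, two distant patches that show similar tissue can be linked directly, which is the property the abstract attributes to the graph representation.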