Zhen Baochen, Qi Yongbin, Tang Zizhen, Liu Chaoyong, Zhao Shilin, Yu Yansuo, Liu Qiang
Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing, 102617, China.
School of Mechanical Engineering, Beijing Institute of Petrochemical Technology, Beijing, 102617, China.
J Imaging Inform Med. 2025 Apr 29. doi: 10.1007/s10278-025-01513-7.
Age-related macular degeneration (AMD) is a prevalent retinal degenerative disease among the elderly and a major cause of irreversible vision loss worldwide. Although color fundus photography (CFP) and optical coherence tomography (OCT) are widely used for AMD diagnosis, information from a single modality is inadequate to fully capture the complex pathological features of AMD. To address this limitation, this study proposes a multi-modal deep learning framework that fine-tunes pre-trained single-modal retinal models for efficient use in multi-modal AMD categorization. Specifically, two independent vision transformer models extract features from CFP and OCT images, and deep canonical correlation analysis (DCCA) then performs nonlinear mapping and fusion of the two modalities' features, maximizing cross-modal feature correlation. Moreover, to reduce the computational cost of multi-modal integration, we introduce low-rank adaptation (LoRA), which decomposes the parameter updates into low-rank matrices and outperforms full fine-tuning while training only about 0.49% of the parameters. Experimental results on the public MMC-AMD dataset validate the framework's effectiveness. The proposed model achieves an overall F1-score of 0.948, AUC-ROC of 0.991, and accuracy of 0.949, significantly outperforming existing single-modal and multi-modal baseline models, and excelling in particular at recognizing complex pathological categories.
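The DCCA fusion objective described above can be illustrated with its linear core: canonical correlation analysis computed on a mini-batch of features from the two branches, whose negative serves as the training loss. The sketch below is a minimal NumPy illustration of that computation, not the paper's implementation; the function name, the regularization constant `eps`, and the batch shapes are assumptions for the example.

```python
import numpy as np

def cca_correlation(H1, H2, eps=1e-4):
    """Sum of canonical correlations between two feature batches.

    H1, H2: (n_samples, dim) features from the CFP and OCT branches.
    DCCA trains both branch networks to maximize this quantity
    (equivalently, to minimize its negative as a loss).
    """
    n = H1.shape[0]
    # Center each view.
    H1c = H1 - H1.mean(axis=0)
    H2c = H2 - H2.mean(axis=0)
    # Regularized covariance and cross-covariance estimates.
    S11 = H1c.T @ H1c / (n - 1) + eps * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + eps * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # Canonical correlations are the singular values of the
    # whitened cross-covariance T = S11^{-1/2} S12 S22^{-1/2}.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False).sum()
```

When one view is an invertible linear transform of the other, every canonical correlation approaches 1, so the sum approaches the feature dimension; for independent views it is much smaller, which is what drives the two branches toward correlated representations.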
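The LoRA mechanism mentioned above can be sketched in a few lines: the frozen weight matrix W is augmented with a trainable low-rank product B·A, so only the factors are updated. This is a generic illustration under assumed shapes (a 768-dimensional projection, rank 4), not the paper's configuration; the ~0.49% trainable fraction reported in the abstract is over the whole model, whereas the fraction below is per adapted matrix.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Forward pass through a frozen weight W with a LoRA update.

    W: frozen pre-trained weights, shape (d_out, d_in).
    A: (r, d_in) and B: (d_out, r) -- the only trainable parameters.
    The adapted weight is W + (alpha / r) * B @ A.
    """
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T

d_in = d_out = 768                    # e.g. a ViT-Base projection (assumed)
r = 4                                 # low rank
rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))    # stands in for frozen weights
A = rng.normal(0, 0.01, size=(r, d_in))
B = np.zeros((d_out, r))              # B starts at zero, so the adapted
                                      # model initially equals the frozen one

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4%}")  # ~1% here
```

Because B is initialized to zero, the adapted layer reproduces the frozen layer exactly at the start of fine-tuning, and the trainable-parameter count scales as 2·r·d rather than d², which is what makes the multi-modal fine-tuning cheap.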