Zhang Yuhan, Ma Xiao, Huang Kun, Li Mingchao, Heng Pheng-Ann
IEEE Trans Med Imaging. 2024 Aug;43(8):2960-2969. doi: 10.1109/TMI.2024.3383827. Epub 2024 Aug 1.
Diabetic retinopathy (DR) is a serious ocular condition that requires effective monitoring and treatment by ophthalmologists. However, constructing a reliable DR grading model remains a challenging and costly task, heavily reliant on high-quality training sets and adequate hardware resources. In this paper, we investigate, via prompt learning, the transferability of knowledge from large-scale pre-trained models (LPMs) to fundus images in order to construct a DR grading model efficiently. Unlike full fine-tuning, which updates all parameters of an LPM, prompt learning introduces only a minimal number of additional learnable parameters while achieving performance competitive with full fine-tuning. Inspired by visual prompt tuning, we propose Semantic-oriented Visual Prompt Learning (SVPL) to enhance semantic perception and better extract task-specific knowledge from LPMs, without any additional annotations. Specifically, SVPL assigns a group of learnable prompts to each DR level to fit its complex pathological manifestations, and then aligns each prompt group to a task-specific semantic space via a contrastive group alignment (CGA) module. We also propose a plug-and-play adapter module, Hierarchical Semantic Delivery (HSD), which enables the semantic transition of prompt groups from shallow to deep layers, facilitating efficient knowledge mining and model convergence. Our extensive experiments on three public DR grading datasets demonstrate that SVPL achieves superior results compared to other transfer-tuning and DR grading methods. Further analysis suggests that the generalized knowledge encoded in LPMs is advantageous for constructing a DR grading model on fundus images.
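To make the core idea concrete, the following is a minimal, hedged sketch of per-grade visual prompt groups combined with a contrastive group-alignment loss. It is not the authors' implementation: the stand-in backbone (a small nn.TransformerEncoder in place of a frozen LPM), the dimensions, the mean-pooling of group tokens, the names (PromptedEncoder, contrastive_group_alignment, PROMPTS_PER_GROUP), and the omission of the HSD adapter are all assumptions made only to keep the example short and runnable.

```python
# Illustrative sketch (NOT the paper's code): per-grade learnable prompt groups
# prepended to a frozen backbone, trained with an InfoNCE-style group alignment loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_GRADES = 5         # DR grades 0-4 (assumed)
PROMPTS_PER_GROUP = 4  # learnable prompt tokens per grade (assumed)
EMBED_DIM = 256        # token dimension of the stand-in backbone (assumed)


class PromptedEncoder(nn.Module):
    """Frozen transformer backbone with learnable per-grade prompt groups."""

    def __init__(self, num_patches: int = 196):
        super().__init__()
        # Stand-in for a frozen large-scale pre-trained ViT backbone.
        layer = nn.TransformerEncoderLayer(d_model=EMBED_DIM, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.backbone.parameters():
            p.requires_grad = False  # only the prompts are tuned

        # One group of learnable prompt tokens per DR grade.
        self.prompts = nn.Parameter(
            torch.randn(NUM_GRADES, PROMPTS_PER_GROUP, EMBED_DIM) * 0.02)
        self.num_patches = num_patches

    def forward(self, patch_tokens: torch.Tensor):
        # patch_tokens: (B, num_patches, EMBED_DIM), e.g. from a patch embedding.
        b = patch_tokens.size(0)
        prompt_tokens = self.prompts.reshape(1, -1, EMBED_DIM).expand(b, -1, -1)
        x = self.backbone(torch.cat([prompt_tokens, patch_tokens], dim=1))

        # Split outputs back into per-group prompt features and image features.
        n_prompt = NUM_GRADES * PROMPTS_PER_GROUP
        group_feats = x[:, :n_prompt].reshape(b, NUM_GRADES, PROMPTS_PER_GROUP, EMBED_DIM)
        group_feats = group_feats.mean(dim=2)      # (B, NUM_GRADES, EMBED_DIM)
        image_feat = x[:, n_prompt:].mean(dim=1)   # (B, EMBED_DIM) global pooling
        return image_feat, group_feats


def contrastive_group_alignment(image_feat, group_feats, labels, tau: float = 0.07):
    """InfoNCE-style loss: the prompt group of the true grade should align with
    the image feature; the remaining groups act as negatives."""
    image_feat = F.normalize(image_feat, dim=-1).unsqueeze(1)  # (B, 1, D)
    group_feats = F.normalize(group_feats, dim=-1)             # (B, G, D)
    logits = (image_feat * group_feats).sum(-1) / tau          # (B, G)
    return F.cross_entropy(logits, labels), logits


if __name__ == "__main__":
    model = PromptedEncoder()
    patches = torch.randn(2, 196, EMBED_DIM)  # dummy patch embeddings
    labels = torch.tensor([0, 3])             # dummy DR grades
    img_feat, grp_feats = model(patches)
    loss, logits = contrastive_group_alignment(img_feat, grp_feats, labels)
    print(loss.item(), logits.argmax(dim=1))  # prediction = best-aligned prompt group
```

In this sketch, inference reduces to picking the prompt group whose pooled representation best matches the image feature, which mirrors the abstract's idea of aligning each prompt group to a grade-specific semantic space; the paper's hierarchical delivery of group semantics across layers is not reproduced here.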