University of Adelaide, Australia.
RIKEN AIP, Japan; RCAST, The University of Tokyo, Japan.
Med Image Anal. 2024 Dec;98:103304. doi: 10.1016/j.media.2024.103304. Epub 2024 Aug 17.
Masked Image Modelling (MIM), a form of self-supervised learning, has garnered significant success in computer vision by improving image representations using unannotated data. Traditional MIM methods typically mask patches sampled at random across the image. However, this random masking strategy may not be well suited to medical imaging, which possesses characteristics distinct from natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), a novel approach and, to our knowledge, the first to employ radiological reports to guide the masking and restoration of informative image regions, encouraging the network to learn stronger semantic representations from medical images. We introduce two mutually complementary masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) terms specific to radiology reports to identify symptom clues mapped to MeSH terms (e.g., cardiac, edema, vascular, pulmonary) and to guide mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images under the masks produced by the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks, covering multi-label/multi-class image classification, pneumothorax segmentation, and medical image-report analysis, demonstrate that MedIM with report-guided masking achieves competitive performance.
Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Codes are available at https://github.com/YtongXie/MedIM.
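The abstract does not give implementation details of the report-guided masking. A minimal illustrative sketch of the general idea (all names and the attention input are hypothetical assumptions, not the paper's actual pipeline): given salience scores linking report keywords to image patches, mask the most-attended patches so the model must reconstruct the clinically informative regions.

```python
import numpy as np

def report_guided_mask(attn, mask_ratio=0.5):
    """Given keyword-to-patch attention scores of shape
    (num_keywords, num_patches), mask the patches most
    attended by the report keywords."""
    patch_scores = attn.max(axis=0)                 # salience per patch
    n_mask = int(round(mask_ratio * attn.shape[1])) # number of patches to mask
    top_patches = np.argsort(patch_scores)[::-1][:n_mask]
    mask = np.zeros(attn.shape[1], dtype=bool)
    mask[top_patches] = True                        # True = patch is masked
    return mask

# Toy example: 4 keywords (e.g. cardiac, edema, vascular, pulmonary)
# attending over a 14x14 = 196 patch grid.
rng = np.random.default_rng(0)
attn = rng.random((4, 196))
mask = report_guided_mask(attn, mask_ratio=0.5)
print(mask.sum())  # 98 patches masked
```

In a real pipeline the attention scores would come from a text encoder over the report cross-attending to image patch embeddings; this sketch only shows how such scores could be turned into a binary patch mask.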