Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
Pengcheng Laboratory, Shenzhen, China.
Nat Commun. 2024 Sep 2;15(1):7620. doi: 10.1038/s41467-024-51749-0.
Multi-modal vision-language foundation models have recently gained significant attention in the medical field. While these models offer great opportunities, they still face crucial challenges: computer-aided diagnosis requires fine-grained knowledge understanding, and real-world clinical applications often provide very limited or even no task-specific labeled data. In this study, we present MaCo, a masked contrastive chest X-ray foundation model that tackles these challenges. MaCo uses masked contrastive learning to achieve fine-grained image understanding and zero-shot learning simultaneously for a variety of medical imaging tasks. It introduces a correlation weighting mechanism that adjusts the correlation between masked chest X-ray image patches and their corresponding reports, thereby enhancing the model's representation learning capabilities. To evaluate MaCo, we conducted extensive experiments on 6 well-known open-source X-ray datasets. The experimental results demonstrate the superiority of MaCo over 10 state-of-the-art approaches across tasks such as classification, segmentation, detection, and phrase grounding. These findings highlight the significant potential of MaCo in advancing a wide range of medical image analysis tasks.
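The abstract describes masked contrastive learning with a correlation weighting between masked image patches and reports, but gives no formulas. The following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes a symmetric InfoNCE-style image-text loss, mean pooling over visible patches, and a hypothetical per-patch importance vector (`patch_importance`) that turns each mask into a pair weight. All function names, the temperature value, and the weighting rule are illustrative assumptions.

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean array; True marks a patch that stays visible."""
    num_keep = max(1, int(round(num_patches * (1.0 - mask_ratio))))
    keep = np.zeros(num_patches, dtype=bool)
    keep[rng.choice(num_patches, size=num_keep, replace=False)] = True
    return keep

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def weighted_contrastive_loss(img_emb, txt_emb, pair_weights, temperature=0.07):
    """Symmetric image-text contrastive loss with one weight per pair.

    Assumption: the weighting rule is hypothetical; the source does not
    specify how the correlation weights enter the loss.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature            # (batch, batch) similarities
    idx = np.arange(len(img))                     # matching pairs on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx]
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx]
    per_pair = 0.5 * (loss_i2t + loss_t2i)
    weights = pair_weights / pair_weights.sum()   # normalize weights to sum to 1
    return float((weights * per_pair).sum())

# Toy usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
batch, num_patches, dim = 4, 16, 8
patch_feats = rng.normal(size=(batch, num_patches, dim))  # image patch features
txt_emb = rng.normal(size=(batch, dim))                   # report embeddings
patch_importance = np.ones(num_patches)  # hypothetical; would be learned in practice

masks = np.stack([random_patch_mask(num_patches, 0.5, rng) for _ in range(batch)])
# Mean-pool only the visible patches into one embedding per image.
img_emb = (patch_feats * masks[..., None]).sum(axis=1) / masks.sum(axis=1, keepdims=True)
# Each pair is weighted by the total importance of its visible patches.
pair_weights = (masks * patch_importance).sum(axis=1)

print(weighted_contrastive_loss(img_emb, txt_emb, pair_weights))
```

With a uniform `patch_importance`, every pair receives the same weight and the loss reduces to a plain symmetric contrastive loss. A non-uniform vector would let pairs whose visible patches carry more report-relevant content contribute more.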