Xie Yutong, Zhang Jianpeng, Xia Yong, Wu Qi
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):10021-10035. doi: 10.1109/TPAMI.2024.3436105. Epub 2024 Nov 6.
Self-supervised learning (SSL) opens up huge opportunities for medical image analysis, a field well known for its scarcity of annotations. However, aggregating massive numbers of (unlabeled) 3D medical images such as computed tomography (CT) scans remains challenging due to high imaging costs and privacy restrictions. In our pilot study, we advocated bringing in a wealth of 2D images, such as X-rays, to compensate for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework called UniMiSS. In particular, we designed a pyramid U-shaped medical Transformer (MiT) as the backbone, enabling UniMiSS to perform SSL with both 2D and 3D images. UniMiSS surpasses current 3D-specific SSL methods in effectiveness and versatility, excelling in various downstream tasks and overcoming the limitations of dimensionality. However, the initial version did not fully explore the anatomical correlations between 2D and 3D images due to the absence of paired multi-modal patient data. In this extension, we introduce UniMiSS+, which leverages digitally reconstructed radiograph (DRR) technology to simulate X-rays from CT volumes, providing access to paired data. Benefiting from these pairs, we introduce an additional pairwise constraint to boost cross-modality correlation learning, which can also serve as a cross-dimension regularizer to further improve the representations. We conduct extensive experiments on multiple 3D/2D medical image analysis tasks, including segmentation and classification. The results show that UniMiSS+ achieves promising performance on various downstream tasks, not only outperforming ImageNet pre-training and other advanced SSL counterparts but also improving on its predecessor, UniMiSS pre-training.
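The two ideas the abstract names — simulating paired X-rays from CT via DRR, and a pairwise constraint between the 2D and 3D representations of the same patient — can be sketched in a minimal form. This is an illustrative NumPy sketch, not the paper's implementation: the simplified Beer-Lambert projection, the HU-to-attenuation scaling, and the cosine-based consistency loss are all assumptions for illustration (real DRR pipelines use ray casting with calibrated geometry, and the actual pairwise constraint is defined on learned MiT embeddings).

```python
import numpy as np

def simulate_drr(ct_volume, axis=1, mu_scale=0.02):
    """Toy DRR: integrate attenuation along one axis of a CT volume and
    apply Beer-Lambert attenuation to get a synthetic X-ray-like image.
    ct_volume is in Hounsfield units (HU); mu_scale is a hypothetical
    HU-to-attenuation factor chosen for illustration only."""
    # Shift HU so air (-1000 HU) maps to zero attenuation, clip negatives.
    mu = np.clip(ct_volume + 1000.0, 0.0, None) * mu_scale / 1000.0
    line_integral = mu.sum(axis=axis)      # path integral along the ray axis
    return 1.0 - np.exp(-line_integral)    # detector response in [0, 1]

def pairwise_consistency(z2d, z3d, eps=1e-8):
    """Toy pairwise constraint: 1 - cosine similarity between the 2D (DRR)
    embedding and the 3D (CT) embedding of the same patient; minimizing it
    pulls the paired cross-dimension representations together."""
    num = float(np.dot(z2d, z3d))
    den = float(np.linalg.norm(z2d) * np.linalg.norm(z3d)) + eps
    return 1.0 - num / den
```

For example, a volume filled with air (-1000 HU) projects to an all-zero DRR, while identical 2D/3D embeddings give a consistency loss of (numerically) zero.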