Guo Zuwei, Islam Nahid Ul, Gotway Michael B, Liang Jianming
Arizona State University, Tempe, AZ 85281, USA.
Mayo Clinic, Scottsdale, AZ 85259, USA.
Med Image Anal. 2024 Jul;95:103159. doi: 10.1016/j.media.2024.103159. Epub 2024 Apr 16.
We have developed a United framework that integrates three self-supervised learning (SSL) ingredients (discriminative, restorative, and adversarial learning), enabling collaborative learning among them and yielding three transferable components: a discriminative encoder, a restorative decoder, and an adversary encoder. To leverage this collaboration, we redesigned nine prominent self-supervised methods, including Rotation, Jigsaw, Rubik's Cube, Deep Clustering, TransVW, MoCo, BYOL, PCRL, and Swin UNETR, and augmented each with its missing components in a United framework for 3D medical imaging. However, such a United framework increases model complexity, making 3D pretraining difficult. To overcome this difficulty, we propose stepwise incremental pretraining, a strategy that unifies the pretraining: a discriminative encoder is first trained via discriminative learning; the pretrained discriminative encoder is then attached to a restorative decoder, forming a skip-connected encoder-decoder, for further joint discriminative and restorative learning; finally, the pretrained encoder-decoder is associated with an adversary encoder for full discriminative, restorative, and adversarial learning. Our extensive experiments demonstrate that stepwise incremental pretraining stabilizes the pretraining of United models, yielding significant performance gains and annotation-cost reductions via transfer learning on six target tasks, ranging from classification to segmentation, across diseases, organs, datasets, and modalities. This improvement is attributed to the synergy of the three SSL ingredients in our United framework, unleashed through stepwise incremental pretraining. Our code and pretrained models are available at GitHub.com/JLiangLab/StepwisePretraining.
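The three-stage schedule described in the abstract can be sketched as follows. This is a minimal illustrative sketch of the staging logic only; the function name, component names, and data structures are assumptions for illustration, not the authors' actual implementation (which is available at the repository above).

```python
def stepwise_schedule():
    """Sketch of stepwise incremental pretraining: at each stage, a new
    component is attached and its learning objective is added, while the
    previously pretrained components carry over (illustrative only)."""
    return [
        # Stage 1: train the discriminative encoder alone.
        ({"discriminative_encoder"},
         {"discriminative"}),
        # Stage 2: attach a restorative decoder to the pretrained encoder,
        # forming a skip-connected encoder-decoder; train both objectives jointly.
        ({"discriminative_encoder", "restorative_decoder"},
         {"discriminative", "restorative"}),
        # Stage 3: associate an adversary encoder with the encoder-decoder
        # for full discriminative, restorative, and adversarial learning.
        ({"discriminative_encoder", "restorative_decoder", "adversary_encoder"},
         {"discriminative", "restorative", "adversarial"}),
    ]

for components, objectives in stepwise_schedule():
    print(sorted(components), "->", sorted(objectives))
```

Each stage strictly extends the previous one, which is what keeps the growing model complexity tractable: earlier components enter each new stage already pretrained rather than from random initialization.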