IEEE Trans Med Imaging. 2020 Jul;39(7):2531-2540. doi: 10.1109/TMI.2020.2973595. Epub 2020 Feb 12.
Recent advances in deep learning for medical image segmentation demonstrate expert-level accuracy. However, applying these models in clinically realistic environments can result in poor generalization and decreased accuracy, mainly due to the domain shift across different hospitals, scanner vendors, imaging protocols, and patient populations. Common transfer learning and domain adaptation techniques have been proposed to address this bottleneck. However, these solutions require data (and annotations) from the target domain to retrain the model, and are therefore restrictive in practice for widespread model deployment. Ideally, we wish to have a trained (locked) model that works uniformly well across unseen domains without further training. In this paper, we propose a deep stacked transformation approach for domain generalization. Specifically, a series of n stacked transformations is applied to each image during network training. The underlying assumption is that the "expected" domain shift for a specific medical imaging modality can be simulated by applying extensive data augmentation on a single source domain, and consequently, a deep model trained on the augmented "big" data (BigAug) can generalize well to unseen domains. We exploit four surprisingly effective, but previously understudied, image-based characteristics for data augmentation to overcome the domain generalization problem. We train and evaluate the BigAug model (with n = 9 transformations) on three different 3D segmentation tasks (prostate gland, left atrium, left ventricle) covering two medical imaging modalities (MRI and ultrasound) and involving eight publicly available challenge datasets. The results show that when training on relatively small datasets (n = 10-32 volumes, depending on the size of the available datasets) from a single source domain: (i) BigAug models degrade by an average of 11% (Dice score change) from the source to an unseen domain, substantially better than conventional augmentation (degrading 39%) and a CycleGAN-based domain adaptation method (degrading 25%); (ii) BigAug outperforms "shallower" stacked transforms (i.e., those with fewer transforms) on unseen domains and shows a modest improvement over conventional augmentation on the source domain; (iii) after training with BigAug on one source domain, performance on an unseen domain is similar to that of a model trained from scratch on that domain using the same number of training samples. When training on large datasets (n = 465 volumes) with BigAug, (iv) performance on unseen domains reaches that of state-of-the-art fully supervised models trained and tested on their own source domains. These findings establish a strong benchmark for the study of domain generalization in medical imaging and can inform the design of highly robust deep segmentation models for clinical deployment.
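For illustration, a minimal sketch of how such stacked transformations might be composed is given below. The abstract does not enumerate the nine transforms or their parameter ranges, so the transform set, probability p, ranges, and helper names (big_aug, random_apply) here are hypothetical assumptions rather than the authors' implementation; the sketch only shows the stacking pattern, in which each candidate transform fires independently during training.

import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def random_apply(fn, image, p, rng):
    # Apply transform `fn` to `image` with probability `p` (assumed scheme).
    return fn(image, rng) if rng.random() < p else image

def add_noise(img, rng):
    # Additive Gaussian noise; the 0.05 scale is an illustrative choice.
    return img + rng.normal(0.0, 0.05 * (img.std() + 1e-8), img.shape)

def blur(img, rng):
    # Random Gaussian blurring to simulate lower image quality.
    return gaussian_filter(img, sigma=rng.uniform(0.25, 1.5))

def adjust_brightness(img, rng):
    # Random additive intensity shift (appearance change).
    return img + rng.uniform(-0.1, 0.1) * (img.max() - img.min())

def adjust_contrast(img, rng):
    # Random contrast scaling about the mean intensity.
    mean = img.mean()
    return (img - mean) * rng.uniform(0.7, 1.3) + mean

def random_rotation(img, rng):
    # In-plane rotation of a (depth, height, width) volume. For segmentation,
    # spatial transforms must be applied jointly to the image and its label
    # map (with order=0 interpolation for labels).
    return rotate(img, angle=rng.uniform(-20.0, 20.0),
                  axes=(1, 2), reshape=False, order=1)

# Illustrative transform list; the paper stacks n = 9 transformations.
TRANSFORMS = [add_noise, blur, adjust_brightness, adjust_contrast,
              random_rotation]

def big_aug(image, p=0.5, rng=None):
    # Stack all transforms in sequence, each fired independently with
    # probability p, producing a different augmented view every call.
    rng = rng or np.random.default_rng()
    for t in TRANSFORMS:
        image = random_apply(t, image, p, rng)
    return image

In a training loop, big_aug would be called on each volume (and, for the spatial transforms, on its segmentation mask) every epoch, so the network rarely sees the same augmented sample twice; this is the mechanism by which extensive single-source augmentation is assumed to cover the expected domain shift.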