Department of Medical Biophysics, University of Toronto, Toronto, Canada; Hurvitz Brain Sciences Research Program, Sunnybrook Research Institute, Toronto, Canada; Physical Sciences, Sunnybrook Research Institute, Toronto, Canada.
Hurvitz Brain Sciences Research Program, Sunnybrook Research Institute, Toronto, Canada; Physical Sciences, Sunnybrook Research Institute, Toronto, Canada; Canadian Partnership for Stroke Recovery, Heart and Stroke Foundation, Toronto, Canada.
Neuroimage. 2023 Sep;278:120289. doi: 10.1016/j.neuroimage.2023.120289. Epub 2023 Jul 24.
Deep artificial neural networks (DNNs) have moved to the forefront of medical image analysis due to their success in classification, segmentation, and detection challenges. A principal challenge in the large-scale deployment of DNNs for neuroimage analysis is the potential for site-to-site shifts in signal-to-noise ratio, contrast, resolution, and artifact prevalence arising from variations in scanners and acquisition protocols. DNNs are notoriously susceptible to such distribution shifts in computer vision. Currently, there are no benchmarking platforms or frameworks for assessing the robustness of new and existing models to specific distribution shifts in MRI, and accessible multi-site benchmarking datasets remain scarce or task-specific. To address these limitations, we propose ROOD-MRI: a novel platform for benchmarking the Robustness of DNNs to Out-Of-Distribution (OOD) data, corruptions, and artifacts in MRI. This flexible platform provides modules for generating benchmarking datasets using transforms that model distribution shifts in MRI, implementations of newly derived benchmarking metrics for image segmentation, and examples of applying the methodology to new models and tasks. We apply our methodology to hippocampus, ventricle, and white matter hyperintensity segmentation in several large studies, providing the hippocampus dataset as a publicly available benchmark. By evaluating modern DNNs on these datasets, we demonstrate that they are highly susceptible to distribution shifts and corruptions in MRI. We show that while data augmentation strategies can substantially improve robustness to OOD data in anatomical segmentation tasks, modern DNNs trained with augmentation still lack robustness in more challenging lesion-based segmentation tasks. Finally, we benchmark U-Nets and vision transformers, finding susceptibility to particular classes of transforms across architectures.
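The dataset-generation module described above can be sketched as follows. This is a minimal illustration, not the ROOD-MRI implementation: the transform names, severity scaling factors, and `make_benchmark` helper are all assumptions chosen to show the general pattern of applying severity-parameterized corruptions (here, additive Gaussian noise and a smooth multiplicative bias field, two common MRI corruption models) to a clean volume.

```python
import numpy as np

def gaussian_noise(vol, severity, rng):
    # Additive Gaussian noise; sigma grows with severity level (1-5).
    # The 0.02 scale factor is an illustrative choice, not a ROOD-MRI value.
    sigma = 0.02 * severity * float(vol.max())
    return vol + rng.normal(0.0, sigma, vol.shape)

def bias_field(vol, severity, rng):
    # Smooth multiplicative intensity inhomogeneity, modeled here as a
    # low-order polynomial field over normalized voxel coordinates.
    grids = np.meshgrid(*(np.linspace(-1.0, 1.0, s) for s in vol.shape),
                        indexing="ij")
    field = 1.0 + 0.05 * severity * sum(g ** 2 for g in grids)
    return vol * field

def make_benchmark(vol, transforms, severities=(1, 2, 3, 4, 5), seed=0):
    # Build {transform name: {severity: corrupted copy}} from one clean volume.
    rng = np.random.default_rng(seed)
    return {name: {s: fn(vol, s, rng) for s in severities}
            for name, fn in transforms.items()}
```

A benchmarking run would then evaluate a trained segmentation model on every corrupted copy and compare against its clean-data performance, severity level by severity level.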
The presented open-source platform enables the generation of new benchmarking datasets and comparisons across models, supporting the study of model designs that improve robustness to OOD data and corruptions in MRI.
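One way such cross-model comparisons can be summarized is with a degradation-style robustness score. The sketch below is an assumption for illustration, not the paper's derived metrics: it computes the Dice similarity coefficient on binary masks and the mean relative drop in Dice across severity levels, so that 0 indicates full robustness and 1 indicates complete failure under corruption.

```python
import numpy as np

def dice(pred, gt):
    # Dice similarity coefficient between two binary segmentation masks.
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * inter / denom if denom else 1.0

def relative_degradation(clean_dice, corrupted_dices):
    # Mean relative drop in Dice across severity levels, in [0, 1]:
    # 0 = no degradation under corruption, 1 = total failure.
    drops = [(clean_dice - d) / clean_dice for d in corrupted_dices]
    return float(np.mean(drops))
```

Two models can then be compared by their degradation scores per corruption type, independent of their absolute clean-data Dice.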