Wu Jiaqi, Peng Wei, Li Binxu, Zhang Yu, Pohl Kilian M
Stanford University, Stanford, CA 94305.
Lehigh University, Bethlehem, PA 18015.
Med Image Comput Comput Assist Interv. 2024 Oct;15010:297-307. doi: 10.1007/978-3-031-72117-5_28. Epub 2024 Oct 3.
Deep learning models that generate structural brain MRIs have the potential to significantly accelerate discovery in neuroscience studies. However, their use has been limited in part by the way their quality is evaluated. Most evaluations of generative models focus on metrics originally designed for natural images (such as the structural similarity index and the Fréchet inception distance). As we show in a comparison of 6 state-of-the-art generative models trained and tested on over 3000 MRIs, these metrics are sensitive to the experimental setup and inadequately assess how well brain MRIs capture macrostructural properties of brain regions (i.e., anatomical plausibility). This shortcoming of the metrics results in inconclusive findings even when qualitative differences between the outputs of models are evident. We therefore propose a framework for evaluating models that generate brain MRIs, which requires uniform processing of the real MRIs, standardizing the implementation of the models, and automatically segmenting the MRIs generated by the models. The segmentations are used to quantify the plausibility of the anatomy displayed in the MRIs. To ensure meaningful quantification, it is crucial that the segmentations are highly reliable. Our framework rigorously checks this reliability, a step often overlooked by prior work. For only 3 of the 6 generative models did at least 95% of the generated MRIs have highly reliable segmentations. More importantly, the assessment of each model by our framework is in line with qualitative assessments, reinforcing the validity of our approach. The code of this framework is available via https://github.com/jiaqiw01/MRIAnatEval.git.
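The reliability check and the segmentation-based comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The per-scan reliability score, its threshold, and the use of Cohen's d on regional volumes are assumptions made here for illustration; only the 95% criterion comes from the abstract.

```python
from math import sqrt
from statistics import mean, variance


def fraction_reliable(scores, threshold):
    """Fraction of scans whose (hypothetical) reliability score meets the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def passes_reliability_check(scores, threshold, min_fraction=0.95):
    """True if at least `min_fraction` of the scans have reliable segmentations.

    The 0.95 default mirrors the 95% criterion mentioned in the abstract;
    the score and threshold themselves are assumptions of this sketch.
    """
    return fraction_reliable(scores, threshold) >= min_fraction


def cohens_d(real_volumes, generated_volumes):
    """Effect size (Cohen's d) between real and generated volumes of one brain region.

    An assumed plausibility measure: values near 0 mean the generated
    volumes are distributed similarly to the real ones.
    """
    n_real, n_gen = len(real_volumes), len(generated_volumes)
    pooled_var = (
        (n_real - 1) * variance(real_volumes)
        + (n_gen - 1) * variance(generated_volumes)
    ) / (n_real + n_gen - 2)
    return (mean(generated_volumes) - mean(real_volumes)) / sqrt(pooled_var)
```

In such a setup, a model would first have to pass `passes_reliability_check` before its regional volumes are compared with those of the real MRIs, since effect sizes computed from unreliable segmentations would not be meaningful.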