Loos Vincent, Pardasani Rohit, Awasthi Navchetan
University of Amsterdam, Faculty of Science, Mathematics and Computer Science, Informatics Institute, Amsterdam, The Netherlands.
General Electric Healthcare, Bengaluru, Karnataka, India.
J Med Imaging (Bellingham). 2024 Sep;11(5):054004. doi: 10.1117/1.JMI.11.5.054004. Epub 2024 Oct 29.
Medical image segmentation is a critical task in healthcare applications, and U-Nets have shown promising results in this domain. We investigate an understudied aspect of these models, the receptive field (RF) size, and its impact on the U-Net and attention U-Net architectures used for medical image segmentation.
We explore several critical elements, including the relationship among RF size, characteristics of the region of interest, and model performance, as well as the trade-off between RF size and computational cost for U-Net and attention U-Net methods across different datasets. We also propose a mathematical notation for the theoretical receptive field (TRF) of a given layer in a network, and we propose two new metrics: the effective receptive field (ERF) rate, which quantifies the fraction of significantly contributing pixels within the ERF relative to the TRF area, and the object rate, which assesses the size of the segmentation object relative to the TRF size.
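The abstract does not spell out the exact formulas for the two metrics, so the following is only a minimal sketch under stated assumptions, not the authors' definitions. It assumes the ERF is estimated from a gradient map (the gradient of one output pixel with respect to each input pixel, computed elsewhere, for example by backpropagation), that a pixel "contributes significantly" when its absolute gradient reaches a hypothetical fraction `rel_threshold` of the maximum, that the TRF is a square of side `trf_side` pixels (not clipped to the image bounds), and that the object rate is the foreground area divided by the TRF area. The names `erf_rate` and `object_rate` are illustrative.

```python
from typing import Sequence


def erf_rate(grad_map: Sequence[Sequence[float]], trf_side: int,
             rel_threshold: float = 0.05) -> float:
    """Fraction of the TRF area covered by significantly contributing pixels.

    ASSUMPTION: a pixel is significant if |gradient| >= rel_threshold * max |gradient|.
    ASSUMPTION: the TRF area is trf_side**2 (no clipping to the image size).
    """
    flat = [abs(v) for row in grad_map for v in row]
    peak = max(flat)
    if peak == 0:
        return 0.0  # no pixel influences the output at all
    significant = sum(1 for v in flat if v >= rel_threshold * peak)
    return significant / (trf_side * trf_side)


def object_rate(mask: Sequence[Sequence[int]], trf_side: int) -> float:
    """ASSUMPTION: number of foreground pixels divided by the TRF area."""
    area = sum(1 for row in mask for v in row if v)
    return area / (trf_side * trf_side)
```

With this reading, an ERF rate near 1 means almost the whole TRF influences the output, and an object rate above 1 means the object is larger than the region a single output pixel can "see".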
The results show that an optimal TRF size exists that balances capturing wider global context against maintaining computational efficiency, thereby optimizing model performance. Interestingly, a distinct correlation was observed between data complexity and the required TRF size: segmentation based solely on contrast reached peak performance even with smaller TRFs, whereas more complex segmentation tasks required larger TRFs. Attention U-Net models consistently outperformed their U-Net counterparts, highlighting the value of attention mechanisms regardless of TRF size.
These insights provide a valuable resource for developing more efficient U-Net-based architectures for medical imaging and pave the way for future exploration of other segmentation architectures. We also developed a tool that calculates the TRF of a U-Net (or attention U-Net) model and suggests an appropriate TRF size for a given model and dataset.
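The TRF of a U-Net output pixel can be approximated with the standard layer-by-layer receptive-field recursion: a k×k convolution adds (k − 1)·j pixels, where j (the "jump") is the product of all strides so far. This is a minimal sketch, not the authors' notation or tool. It assumes same-padded k×k convolutions (`convs_per_block` of them at every resolution level), p×p max pooling with stride p, upsampling by nearest neighbour or a p×p stride-p transposed convolution (neither of which enlarges the RF), and that the deepest encoder, bottleneck, decoder path determines the TRF because the skip-connection paths are shallower. The function name `unet_trf` is hypothetical.

```python
def unet_trf(depth: int, convs_per_block: int = 2, kernel: int = 3, pool: int = 2) -> int:
    """Side length (in input pixels) of the theoretical receptive field of one output pixel.

    depth: number of pooling (downsampling) steps in the encoder.
    """
    rf, jump = 1, 1
    # Encoder: a conv block at each resolution, followed by pooling.
    for _ in range(depth):
        rf += convs_per_block * (kernel - 1) * jump
        rf += (pool - 1) * jump          # pool x pool window, stride = pool
        jump *= pool
    # Bottleneck: conv block at the coarsest resolution.
    rf += convs_per_block * (kernel - 1) * jump
    # Decoder: upsampling (ASSUMED not to enlarge the RF), then a conv block.
    for _ in range(depth):
        jump //= pool
        rf += convs_per_block * (kernel - 1) * jump
    return rf
```

For example, with four pooling steps and two 3×3 convolutions per block, `unet_trf(4)` returns 200 pixels under these assumptions; for p = 2 the loop reduces to the closed form 1 + (2^D − 1)(c(k − 1) + 1) + c(k − 1)(2^(D+1) − 1), with D = `depth` and c = `convs_per_block`.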