Remedios Lucas W, Liu Han, Remedios Samuel W, Zuo Lianrui, Saunders Adam M, Bao Shunxing, Huo Yuankai, Powers Alvin C, Virostko John, Landman Bennett A
Vanderbilt University, Department of Computer Science, Nashville, Tennessee, United States.
Johns Hopkins University, Department of Computer Science, Baltimore, Maryland, United States.
J Med Imaging (Bellingham). 2025 Mar;12(2):024008. doi: 10.1117/1.JMI.12.2.024008. Epub 2025 Apr 26.
Combining different types of medical imaging data through multimodal fusion promises better segmentation of anatomical structures such as the pancreas. Strategic implementation of multimodal fusion could improve our ability to study diseases such as diabetes. However, where to perform fusion in deep learning models remains an open question. It is unclear whether there is a single best location to fuse information when analyzing pairs of imperfectly aligned images, or whether the optimal fusion location depends on the specific model being used. Two main challenges in using multiple imaging modalities to study the pancreas are that (1) the pancreas and surrounding abdominal anatomy are deformable, making it difficult to consistently align the images, and (2) breathing during image acquisition further complicates alignment between multimodal images. Even with state-of-the-art deformable image registration techniques designed specifically to align abdominal images, multimodal images of the abdomen are often not perfectly aligned. We examine how the choice of fusion point, ranging from early in the image processing pipeline to later stages, impacts pancreas segmentation on imperfectly registered multimodal magnetic resonance (MR) images.
Our dataset consists of 353 pairs of T2-weighted (T2w) and T1-weighted (T1w) abdominal MR images from 163 subjects, with accompanying pancreas segmentation labels drawn mainly from the T2w images. Because the T2w images were acquired in an interleaved manner across two breath holds and the T1w images in a single breath hold, three different breath holds affected the alignment of each image pair. We used deeds, a state-of-the-art deformable abdominal image registration method, to align the image pairs. We then trained a collection of basic UNets with different fusion points, spanning early to late layers in the model, to assess how the fusion location influenced segmentation performance on imperfectly aligned images. To investigate whether the performance differences at key fusion points generalize to other architectures, we extended our experiments to nnUNet.
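The movable-fusion-point design can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch encoder, not the authors' architecture: the channel counts, block depth, and omission of the decoder and segmentation head are simplifying assumptions. The parameter fuse_at selects where the T1w branch's features are concatenated into the T2w stream, with fuse_at=0 reducing to early fusion of the raw images.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3x3 convolutions with ReLU: a basic UNet encoder block.
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class FusionEncoder(nn.Module):
    # Two encoder branches (T2w, T1w) merged by channel concatenation at
    # block index fuse_at; fuse_at=0 is early fusion of the raw images,
    # larger values push the fusion point deeper into the encoder.
    def __init__(self, channels=(16, 32, 64, 128), fuse_at=2):
        super().__init__()
        self.pool = nn.MaxPool3d(2)
        self.t2w, self.t1w = nn.ModuleList(), nn.ModuleList()
        in_ch = 1
        for out_ch in channels[:fuse_at]:          # modality-specific blocks
            self.t2w.append(conv_block(in_ch, out_ch))
            self.t1w.append(conv_block(in_ch, out_ch))
            in_ch = out_ch
        self.trunk = nn.ModuleList()
        in_ch *= 2                                 # channels double at fusion
        for out_ch in channels[fuse_at:]:          # shared post-fusion blocks
            self.trunk.append(conv_block(in_ch, out_ch))
            in_ch = out_ch

    def forward(self, t2w, t1w):
        a, b = t2w, t1w
        for blk_a, blk_b in zip(self.t2w, self.t1w):
            a, b = self.pool(blk_a(a)), self.pool(blk_b(b))
        x = torch.cat([a, b], dim=1)               # the fusion point
        for blk in self.trunk:
            x = self.pool(blk(x))
        return x

enc = FusionEncoder(fuse_at=2)                     # fuse mid-encoder
feats = enc(torch.randn(1, 1, 32, 64, 64), torch.randn(1, 1, 32, 64, 64))

Sweeping fuse_at from 0 to len(channels) yields the family of early-through-late fusion models compared in the experiments.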
The single-modality T2w baseline with the basic UNet had a median Dice score of 0.766, whereas the same baseline with nnUNet achieved 0.824. For each fusion approach, we analyzed performance differences using Dice residuals, computed by subtracting the baseline score from the fusion score for each data point. For the basic UNet, the best approach was early/mid fusion in the middle of the encoder, yielding a small positive median Dice residual over the baseline. For nnUNet, the best approach was early fusion through naïve image concatenation before the model, again yielding a small positive median Dice residual over the baseline. After Bonferroni correction, the Dice score distributions for these best fusion approaches differed significantly from the baseline under the paired Wilcoxon signed-rank test.
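The statistical comparison follows a standard recipe that can be sketched in a few lines of Python with NumPy and SciPy. The Dice values and the two variant names below are synthetic placeholders for illustration, not the paper's reported numbers; in practice the per-scan scores would come from a Dice function applied to predicted and reference masks.

import numpy as np
from scipy.stats import wilcoxon

def dice(pred, truth):
    # Dice similarity coefficient between two binary masks.
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

rng = np.random.default_rng(0)
baseline = rng.uniform(0.6, 0.9, size=353)             # placeholder scores
fusion = {
    "early_fusion": baseline + rng.normal(0.01, 0.02, 353),
    "mid_encoder_fusion": baseline + rng.normal(0.02, 0.02, 353),
}

alpha = 0.05 / len(fusion)        # Bonferroni correction over comparisons
for name, scores in fusion.items():
    residuals = scores - baseline                      # per-scan Dice residual
    stat, p = wilcoxon(scores, baseline)               # paired signed-rank test
    print(f"{name}: median residual {np.median(residuals):+.3f}, "
          f"p={p:.2e}, significant={p < alpha}")

The paired signed-rank test is appropriate here because each fusion model and the baseline are evaluated on the same scans, so the per-scan residuals form matched pairs.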
Fusion in specific blocks can improve performance, but the best blocks for fusion are model-specific, and the gains are small. In imperfectly registered datasets, fusion is a nuanced problem, and the art of design remains vital for uncovering potential insights. Future innovation is needed to better address fusion in cases of imperfect alignment of abdominal image pairs. The code associated with this project is available at https://github.com/MASILab/influence_of_fusion_on_pancreas_segmentation.