Zhang Pan, Chen Ming, Gao Meng
College of Information, Shanghai Ocean University, No. 999 Hucheng Ring Road, Shanghai 201306, China.
Sensors (Basel). 2024 Apr 12;24(8):2473. doi: 10.3390/s24082473.
Leveraging data from multiple modalities to enhance semantic segmentation is a well-regarded approach, and recent efforts have incorporated an array of modalities, including depth and thermal imaging. Nevertheless, effectively fusing cross-modal interactions remains a challenge, given the distinct characteristics of each modality. In this work, we introduce the semantic guidance fusion network (SGFN), a cross-modal fusion network adept at integrating a diverse set of modalities. In particular, the SGFN features a semantic guidance module (SGM) engineered to boost bi-modal feature extraction; it encompasses a learnable semantic guidance convolution (SGC) designed to merge intensity and gradient information from different modalities. Comprehensive experiments on the NYU Depth V2, SUN-RGBD, Cityscapes, MFNet, and ZJU datasets demonstrate both the superior performance and the generalization ability of the SGFN compared with current leading models. Moreover, on the DELIVER dataset, our efficient bi-modal SGFN achieves a mIoU comparable to that of the previously leading model, CMNeXt.
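To make the SGC idea concrete: the abstract states that a learnable convolution merges intensity and gradient data from two modalities. The snippet below is a toy numpy sketch of that notion, not the paper's implementation; the Sobel gradient operator, the per-channel weight vector standing in for a learned 1×1 kernel, and the function names are all our own illustrative assumptions.

```python
import numpy as np

def sobel_gradient(img):
    """Approximate gradient magnitude of a 2D intensity map via Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy)

def bimodal_fuse(mod_a, mod_b, weights):
    """Toy bi-modal fusion: stack intensity and gradient maps from both
    modalities, then mix them with a per-channel weight vector -- a crude
    stand-in for the learnable kernel of the SGC described in the abstract."""
    channels = np.stack([mod_a, sobel_gradient(mod_a),
                         mod_b, sobel_gradient(mod_b)])   # shape (4, H, W)
    return np.tensordot(weights, channels, axes=1)        # fused (H, W) map
```

In the actual network the mixing weights would be learned end-to-end; here `weights` is simply supplied by the caller to show how intensity and gradient channels from two modalities can be combined into one feature map.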