Liu Jinhang, Du Yuhe, Wang Jing, Tang Xing
School of Computer Science, Hubei University of Technology, Wuhan 430070, China.
Key Laboratory of Green Intelligent Computing Network in Hubei Province, Wuhan 430068, China.
Sensors (Basel). 2025 Aug 29;25(17):5357. doi: 10.3390/s25175357.
In semantic segmentation, large kernels and atrous (dilated) convolution have been used to enlarge the receptive field, allowing models to achieve competitive performance with fewer parameters. However, because the kernel size is fixed, networks built on large convolutional kernels cannot adaptively capture multi-scale features and fail to make effective use of global contextual information. To address this issue, we combine atrous convolution with large-kernel convolution, using different dilation rates to compensate for the single-scale receptive field of a large kernel. We also employ a dynamic selection mechanism that adaptively highlights the most important spatial features based on global information. In addition, to improve the model's ability to fit the true label distribution, we propose a Multi-Scale Contextual Noise Transfer Matrix (NTM), which uses high-order consistency information from neighborhood representations to estimate the NTM and correct the supervision signal, thereby improving the model's generalization. Extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K show that the proposed network, LKNTNet, achieves a new state-of-the-art balance between speed and accuracy: it reaches 80.05% mIoU on Cityscapes at an inference speed of 80.7 FPS and 42.7% mIoU on ADE20K at 143.6 FPS.
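Applying one large kernel at several dilation rates changes how far it reaches without adding weights: a k-tap kernel with dilation rate d spans d(k-1)+1 input samples. The sketch below, in plain Python, checks that formula against the impulse response of a 1D dilated convolution and computes the receptive field of two stride-1 convolutions applied in sequence. The kernel size 7, the dilation rates (1, 2, 3), the stacked (5, 1), (7, 3) configuration, and all function names are illustrative assumptions, not LKNTNet's actual configuration. It covers only the receptive-field arithmetic, not the paper's dynamic selection mechanism.

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Number of input samples spanned by a k-tap kernel with dilation rate d."""
    if k < 1 or d < 1:
        raise ValueError("kernel size and dilation rate must be >= 1")
    return d * (k - 1) + 1


def stacked_receptive_field(layers) -> int:
    """Receptive field of stride-1 convolutions applied one after another.

    layers: sequence of (kernel_size, dilation_rate) pairs.
    """
    rf = 1
    for k, d in layers:
        rf += effective_kernel_size(k, d) - 1
    return rf


def dilated_conv1d(x, w, d):
    """Zero-padded 'same' 1D cross-correlation with dilation d (odd-length w)."""
    c = len(w) // 2  # index of the kernel centre
    n = len(x)
    return [
        sum(wj * x[i + (j - c) * d]
            for j, wj in enumerate(w)
            if 0 <= i + (j - c) * d < n)  # taps outside the signal read zero
        for i in range(n)
    ]


def impulse_span(w, d, n=101):
    """Measure the span empirically from the response to a unit impulse."""
    x = [0.0] * n
    x[n // 2] = 1.0
    y = dilated_conv1d(x, w, d)
    nonzero = [i for i, v in enumerate(y) if v != 0.0]
    return nonzero[-1] - nonzero[0] + 1


if __name__ == "__main__":
    k = 7                   # hypothetical large-kernel size
    for d in (1, 2, 3):     # hypothetical dilation rates, one per branch
        print(d, effective_kernel_size(k, d), impulse_span([1.0] * k, d))
    # prints 1 7 7, then 2 13 13, then 3 19 19
    print(stacked_receptive_field([(5, 1), (7, 3)]))  # prints 23
```

Each unit increase in the dilation rate widens a branch by k-1 samples at no parameter cost, which is how several rates can yield several receptive-field sizes from a single kernel shape.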
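The abstract does not detail how the estimated NTM corrects the supervision signal. A common way to use a noise transition matrix T, where T[i][j] is the probability that clean class i is observed as noisy class j, is forward loss correction: the model's clean-class probabilities p are mapped to noisy-label probabilities q[j] = Σ_i p[i]·T[i][j], and the cross-entropy is computed against the observed (possibly noisy) annotation. The plain-Python sketch below shows only that correction step, applied per pixel. T is supplied by hand, the paper's multi-scale, neighborhood-based estimation of T is not reproduced, and all function names and values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def noisy_posterior(clean_probs, T):
    """q[j] = sum_i p[i] * T[i][j], where T[i][j] = P(noisy label j | clean label i)."""
    num_classes = len(clean_probs)
    for row in T:
        if abs(sum(row) - 1.0) > 1e-6:
            raise ValueError("each row of T must sum to 1")
    return [sum(clean_probs[i] * T[i][j] for i in range(num_classes))
            for j in range(num_classes)]


def forward_corrected_nll(logits, noisy_label, T):
    """Cross-entropy of the observed (noisy) label under the corrected posterior."""
    q = noisy_posterior(softmax(logits), T)
    return -math.log(q[noisy_label])


def mean_pixel_loss(logit_map, label_map, T):
    """Average the corrected loss over all pixels of a (tiny) segmentation map."""
    losses = [forward_corrected_nll(logits, label, T)
              for logit_row, label_row in zip(logit_map, label_map)
              for logits, label in zip(logit_row, label_row)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical 2-class transition matrix (rows: clean class, cols: noisy class).
    T = [[0.8, 0.2],
         [0.3, 0.7]]
    logit_map = [[[2.0, 0.0], [0.0, 1.0]]]  # 1 x 2 "image", 2 class scores per pixel
    label_map = [[0, 1]]                    # observed (possibly noisy) labels
    print(mean_pixel_loss(logit_map, label_map, T))
```

With T equal to the identity matrix, the corrected loss reduces to ordinary cross-entropy, so T only changes training when it encodes non-zero confusion between classes.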