A Large Kernel Convolutional Neural Network with a Noise Transfer Mechanism for Real-Time Semantic Segmentation.

Authors

Liu Jinhang, Du Yuhe, Wang Jing, Tang Xing

Affiliations

School of Computer Science, Hubei University of Technology, Wuhan 430070, China.

Key Laboratory of Green Intelligent Computing Network in Hubei Province, Wuhan 430068, China.

Publication information

Sensors (Basel). 2025 Aug 29;25(17):5357. doi: 10.3390/s25175357.

Abstract

In semantic segmentation, large kernels and atrous (dilated) convolution have been used to enlarge the receptive field, enabling models to achieve competitive performance with fewer parameters. However, because their kernel size is fixed, networks built on large convolutional kernels are limited in adaptively capturing multi-scale features and fail to exploit global contextual information effectively. To address this issue, we combine atrous convolution with large kernel convolution, using different dilation rates to compensate for the single-scale receptive field of large kernels. At the same time, we employ a dynamic selection mechanism that adaptively highlights the most important spatial features based on global information. Additionally, to enhance the model's ability to fit the true label distribution, we propose a Multi-Scale Contextual Noise Transfer Matrix (NTM), which uses high-order consistency information from neighborhood representations to estimate the NTM and correct the supervision signals, thereby improving the model's generalization capability. Extensive experiments on Cityscapes, ADE20K, and COCO-Stuff-10K demonstrate that the proposed network, LKNTNet, achieves a new state-of-the-art balance between speed and accuracy. Specifically, LKNTNet achieves 80.05% mIoU on Cityscapes at an inference speed of 80.7 FPS and 42.7% mIoU on ADE20K at an inference speed of 143.6 FPS.
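
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical 7x7 kernel and a hand-written 2-class transition matrix, and it applies the standard forward correction for label noise, whereas the paper estimates its NTM from neighborhood representations. The function names `effective_kernel_size` and `forward_correct` are made up for illustration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered along one axis by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def forward_correct(clean_probs, transition):
    """Map predicted clean-class probabilities to noisy-label probabilities.

    transition[i][j] is the assumed probability that a pixel whose true
    class is i carries the annotated label j (each row sums to 1).
    """
    num_classes = len(transition)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


# Hypothetical 7x7 kernel at three dilation rates (values chosen for illustration).
spans = {d: effective_kernel_size(7, d) for d in (1, 2, 3)}
print(spans)  # {1: 7, 2: 13, 3: 19}

# Hypothetical transition matrix: class 0 is mislabeled 10% of the time,
# class 1 is mislabeled 20% of the time.
T = [[0.9, 0.1],
     [0.2, 0.8]]
noisy = forward_correct([0.7, 0.3], T)
print(noisy)  # approximately [0.69, 0.31]
```

The first function shows why mixing dilation rates yields several receptive field sizes from one kernel size. The second shows how a transition matrix lets a loss compare predictions against noisy annotations rather than assuming the annotations are exact.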
