Zhou Wujie, Xiao Yuxiang, Qiang Fangfang, Dong Xiena, Xu Caie, Yu Lu
School of Information & Electronic Engineering, Zhejiang University of Science & Technology, Hangzhou 310023, China; School of Computer Science and Engineering, Nanyang Technological University, Singapore 308232, Singapore.
School of Information & Electronic Engineering, Zhejiang University of Science & Technology, Hangzhou 310023, China.
Neural Netw. 2025 Aug;188:107438. doi: 10.1016/j.neunet.2025.107438. Epub 2025 Mar 25.
Recent advances in deep learning for semantic segmentation have introduced dynamic segmentation methods, in contrast to the static segmentation methods represented by fully convolutional networks. Dynamic prediction methods replace static classifiers with learnable class embeddings to achieve global semantic awareness. Although dynamic methods excel in accuracy, learning and inferring class embeddings usually incurs a heavy computational burden. To address this challenge, we propose an affinity-enhanced semantic segmentation framework that synergistically combines the strengths of static and dynamic methods. Specifically, our approach uses semantic features to obtain preliminary static segmentation results and constructs a binary affinity matrix that explicitly encodes pixel-wise category relationships. This affinity matrix serves as a dynamic classification kernel, integrating global context awareness with static features and achieving performance comparable to purely dynamic approaches at a substantially reduced computational overhead. Furthermore, we introduce a novel feature-to-category mapping refinement technique, which performs feature knowledge transfer by learning a linear transformation between the semantic feature space and the segmentation probability space, improving accuracy without increasing model complexity. Extensive experiments demonstrate that the proposed method achieves the best performance on the widely used NYUv2 and SUN-RGBD datasets, and its effectiveness across different scenes is further verified on the outdoor scene dataset CamVid.
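The abstract gives no equations, so the following is only a minimal NumPy sketch, under our own assumptions, of two ideas it describes: a binary affinity matrix built from preliminary static predictions and used as a classification kernel, and a linear map from feature space to probability space. All names (`binary_affinity`, `class_embeddings`, `affinity_context`, `dynamic_logits`, `fit_linear_mapping`) and the additive fusion in the demo are hypothetical. The least-squares fit stands in for whatever training objective the paper actually uses. This is not the authors' implementation.

```python
import numpy as np


def one_hot(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Convert integer labels of shape (N,) to a one-hot matrix Y of shape (N, K)."""
    return np.eye(num_classes)[labels]


def binary_affinity(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Dense binary affinity: A[i, j] = 1 if pixels i and j share a predicted class.

    Equivalent to Y @ Y.T for one-hot Y; shape (N, N), so memory is quadratic in N.
    """
    y = one_hot(labels, num_classes)
    return y @ y.T


def class_embeddings(feats: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Mean feature of the pixels assigned to each class, shape (K, C).

    Classes with no assigned pixels get an all-zero embedding.
    """
    y = one_hot(labels, num_classes)
    counts = np.maximum(y.sum(axis=0), 1.0)  # avoid division by zero for empty classes
    return (y.T @ feats) / counts[:, None]


def affinity_context(feats: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Row-normalised A @ F, computed without materialising A.

    Because A = Y Y^T, (A @ F) / rowsum(A) equals Y @ class_embeddings,
    i.e. every pixel receives the mean feature of its predicted class.
    """
    return one_hot(labels, num_classes) @ class_embeddings(feats, labels, num_classes)


def dynamic_logits(feats: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Score every pixel (N, C) against the affinity-derived class embeddings -> (N, K)."""
    return feats @ class_embeddings(feats, labels, num_classes).T


def fit_linear_mapping(feats: np.ndarray, probs: np.ndarray) -> np.ndarray:
    """Least-squares linear map T of shape (C, K) such that feats @ T approximates probs."""
    mapping, *_ = np.linalg.lstsq(feats, probs, rcond=None)
    return mapping


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_pixels, n_channels, n_classes = 64, 16, 5
    feats = rng.standard_normal((n_pixels, n_channels))      # flattened H*W pixel features
    static_w = rng.standard_normal((n_channels, n_classes))  # hypothetical static 1x1 classifier
    static_logits = feats @ static_w
    labels = static_logits.argmax(axis=1)                    # preliminary static segmentation

    # Hypothetical fusion of static and affinity-kernel scores.
    fused = static_logits + dynamic_logits(feats, labels, n_classes)
    probs = np.exp(fused - fused.max(axis=1, keepdims=True))  # stable softmax
    probs /= probs.sum(axis=1, keepdims=True)

    mapping = fit_linear_mapping(feats, probs)
    print("fused logits:", fused.shape, "mapping:", mapping.shape)
```

One design point the sketch illustrates: since the binary affinity factorises as A = YYᵀ, aggregating context through it costs O(NKC) instead of O(N²C) when the N×N matrix is never built. This is one plausible reason an affinity-based kernel can be cheaper than attention-style class-embedding updates, though the paper's exact formulation may differ.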