Xu Dan, Alameda-Pineda Xavier, Ouyang Wanli, Ricci Elisa, Wang Xiaogang, Sebe Nicu
IEEE Trans Pattern Anal Mach Intell. 2022 May;44(5):2673-2688. doi: 10.1109/TPAMI.2020.3043781. Epub 2022 Apr 1.
Multi-scale representations deeply learned via convolutional neural networks have proven highly important for various pixel-level prediction problems. In this paper we present a novel approach that advances the state of the art on pixel-level prediction in a fundamental aspect, i.e., structured multi-scale feature learning and fusion. In contrast to previous works that directly consider multi-scale feature maps obtained from the inner layers of a primary CNN architecture and simply fuse the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner. To further improve the learning capacity of the network structure, we propose to exploit feature-dependent conditional kernels within the deep probabilistic framework. Extensive experiments are conducted on four publicly available datasets (i.e., BSDS500, NYUD-V2, KITTI and Pascal-Context) and on three challenging pixel-wise prediction problems involving both discrete and continuous labels (i.e., monocular depth estimation, object contour prediction and semantic segmentation). Quantitative and qualitative results demonstrate the effectiveness of the proposed latent AG-CRF model and the overall probabilistic graph attention network with feature-conditional kernels for structured feature learning and pixel-wise prediction.
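The contrast the abstract draws between simple fusion (weighted averaging or concatenation) and attention-gated fusion can be illustrated with a minimal sketch. This is not the AG-CRF model or its inference procedure. The function names, the toy feature maps, and the choice of a sigmoid over the per-pixel dot product as the gate are all assumptions made for illustration. Feature maps are assumed to be already resized to a common resolution and stored as lists of per-pixel channel vectors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_average(feature_maps, weights):
    """Baseline fusion: per-pixel weighted average of same-size feature maps."""
    total = sum(weights)
    n_pixels = len(feature_maps[0])
    n_channels = len(feature_maps[0][0])
    return [
        [
            sum(w * fmap[p][c] for fmap, w in zip(feature_maps, weights)) / total
            for c in range(n_channels)
        ]
        for p in range(n_pixels)
    ]


def concatenate(feature_maps):
    """Baseline fusion: per-pixel concatenation along the channel axis."""
    n_pixels = len(feature_maps[0])
    return [[v for fmap in feature_maps for v in fmap[p]] for p in range(n_pixels)]


def gated_fusion(target, sources):
    """Toy attention-gated fusion (hypothetical gate, not the paper's update).

    Each source scale sends its features to the target scale as a message,
    scaled by a per-pixel gate in (0, 1). Here the gate is the sigmoid of the
    dot product between target and source features at that pixel.
    """
    fused = []
    for p, t in enumerate(target):
        out = list(t)
        for src in sources:
            s = src[p]
            gate = sigmoid(sum(a * b for a, b in zip(t, s)))
            out = [o + gate * v for o, v in zip(out, s)]
        fused.append(out)
    return fused


# Two toy "scales", each with 2 pixels and 2 channels.
fine = [[1.0, 0.0], [0.0, 1.0]]
coarse = [[1.0, 0.0], [0.0, -1.0]]

avg = weighted_average([fine, coarse], [1.0, 1.0])
cat = concatenate([fine, coarse])
gated = gated_fusion(fine, [coarse])
```

In this toy setup the two scales agree at the first pixel and disagree at the second. Averaging cancels the second pixel's features entirely, while the gate passes the coarse message strongly where the scales agree and attenuates it where they conflict, which is the intuition behind gating the information flow between scales rather than fusing with fixed weights.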